Re: [HACKERS] Changeset Extraction v7.6.1

2014-06-01 Thread Euler Taveira
On 01-06-2014 02:57, Andres Freund wrote:
 On 2014-06-01 00:50:58 -0500, Jim Nasby wrote:
 On 5/31/14, 9:11 AM, Andres Freund wrote:
 On 2014-02-21 15:14:15 -0600, Jim Nasby wrote:
 On 2/17/14, 7:31 PM, Robert Haas wrote:
 But do you really want to keep that snapshot around long enough to
 copy the entire database?  I bet you don't: if the database is big,
 holding back xmin for long enough to copy the whole thing isn't likely
 to be fun.

 I can confirm that this would be epic fail, at least for londiste. It 
 takes about 3 weeks for a new copy of a ~2TB database. There's no way 
 that'd work with one snapshot. (Granted, copy performance in londiste is 
 rather lackluster, but still...)

 I'd marked this email as todo:
 If you have such a huge database you can, with logical decoding at
 least, use a basebackup using pg_basebackup or pg_start/stop_backup()
 and roll forwards from that... That'll hopefully make such huge copies
 much faster.
 
 Just keep in mind that one of the use cases for logical replication is 
 upgrades.
 
 Should still be fine. Make a physical copy; pg_upgrade; catchup via
 logical rep.
 
Bear in mind that it is not an option if you want to copy *part* of the
database(s) (unless you have space available and want to do the cleanup
after the upgrade). In the near future, a (new) tool could (a) copy the
schema, (b) accumulate modifications while copying data, (c) copy whole
tables, and (d) apply the modifications for the selected
table(s)/schema(s). Such a tool could even be an alternative to pg_upgrade.
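
A minimal sketch of those steps, assuming a logical slot is used to
accumulate the modifications (all object names are illustrative):

-- (b) create a slot first, so changes accumulate while we copy:
SELECT * FROM pg_create_logical_replication_slot('copy_slot', 'some_plugin');
-- (a) and (c): copy the schema and the selected tables, e.g. with
--     pg_dump --schema-only, then per-table COPY ... TO STDOUT.
-- (d) drain and apply the accumulated changes for the selected tables:
SELECT * FROM pg_logical_slot_get_changes('copy_slot', NULL, NULL);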


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training




Re: [HACKERS] Changeset Extraction v7.6.1

2014-06-01 Thread Jim Nasby

On 6/1/14, 10:49 AM, Euler Taveira wrote:

On 01-06-2014 02:57, Andres Freund wrote:

On 2014-06-01 00:50:58 -0500, Jim Nasby wrote:

On 5/31/14, 9:11 AM, Andres Freund wrote:

On 2014-02-21 15:14:15 -0600, Jim Nasby wrote:

On 2/17/14, 7:31 PM, Robert Haas wrote:

But do you really want to keep that snapshot around long enough to
copy the entire database?  I bet you don't: if the database is big,
holding back xmin for long enough to copy the whole thing isn't likely
to be fun.


I can confirm that this would be epic fail, at least for londiste. It takes 
about 3 weeks for a new copy of a ~2TB database. There's no way that'd work 
with one snapshot. (Granted, copy performance in londiste is rather lackluster, 
but still...)


I'd marked this email as todo:
If you have such a huge database you can, with logical decoding at
least, use a basebackup using pg_basebackup or pg_start/stop_backup()
and roll forwards from that... That'll hopefully make such huge copies
much faster.



Just keep in mind that one of the use cases for logical replication is upgrades.


Should still be fine. Make a physical copy; pg_upgrade; catchup via
logical rep.


Bear in mind that it is not an option if you want to copy *part* of the
database(s) (unless you have space available and want to do the cleanup
after the upgrade). In the near future, a (new) tool could (a) copy the
schema, (b) accumulate modifications while copying data, (c) copy whole
tables, and (d) apply the modifications for the selected
table(s)/schema(s). Such a tool could even be an alternative to pg_upgrade.


There are also things that pg_upgrade doesn't handle, so it's not always an
option.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Changeset Extraction v7.6.1

2014-05-31 Thread Andres Freund
On 2014-02-21 15:14:15 -0600, Jim Nasby wrote:
 On 2/17/14, 7:31 PM, Robert Haas wrote:
 But do you really want to keep that snapshot around long enough to
 copy the entire database?  I bet you don't: if the database is big,
 holding back xmin for long enough to copy the whole thing isn't likely
 to be fun.
 
 I can confirm that this would be epic fail, at least for londiste. It takes 
 about 3 weeks for a new copy of a ~2TB database. There's no way that'd work 
 with one snapshot. (Granted, copy performance in londiste is rather 
 lackluster, but still...)

I'd marked this email as todo:
If you have such a huge database you can, with logical decoding at
least, use a basebackup using pg_basebackup or pg_start/stop_backup()
and roll forwards from that... That'll hopefully make such huge copies
much faster.
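
For concreteness, a sketch of that approach (assuming the exclusive
low-level backup API of this era; the label and the copy step are
illustrative):

SELECT pg_start_backup('initial copy for logical replica');
-- ... copy the data directory to the new node with rsync/tar ...
SELECT pg_stop_backup();
-- then let the copy replay WAL to a known point, and stream further
-- changes through a logical slot instead of holding one snapshot open.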

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Changeset Extraction v7.6.1

2014-05-31 Thread Jim Nasby

On 5/31/14, 9:11 AM, Andres Freund wrote:

On 2014-02-21 15:14:15 -0600, Jim Nasby wrote:

On 2/17/14, 7:31 PM, Robert Haas wrote:

But do you really want to keep that snapshot around long enough to
copy the entire database?  I bet you don't: if the database is big,
holding back xmin for long enough to copy the whole thing isn't likely
to be fun.


I can confirm that this would be epic fail, at least for londiste. It takes 
about 3 weeks for a new copy of a ~2TB database. There's no way that'd work 
with one snapshot. (Granted, copy performance in londiste is rather lackluster, 
but still...)


I'd marked this email as todo:
If you have such a huge database you can, with logical decoding at
least, use a basebackup using pg_basebackup or pg_start/stop_backup()
and roll forwards from that... That'll hopefully make such huge copies
much faster.


Just keep in mind that one of the use cases for logical replication is upgrades.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Changeset Extraction v7.6.1

2014-05-31 Thread Andres Freund
On 2014-06-01 00:50:58 -0500, Jim Nasby wrote:
 On 5/31/14, 9:11 AM, Andres Freund wrote:
 On 2014-02-21 15:14:15 -0600, Jim Nasby wrote:
 On 2/17/14, 7:31 PM, Robert Haas wrote:
 But do you really want to keep that snapshot around long enough to
 copy the entire database?  I bet you don't: if the database is big,
 holding back xmin for long enough to copy the whole thing isn't likely
 to be fun.
 
 I can confirm that this would be epic fail, at least for londiste. It takes 
 about 3 weeks for a new copy of a ~2TB database. There's no way that'd work 
 with one snapshot. (Granted, copy performance in londiste is rather 
 lackluster, but still...)
 
 I'd marked this email as todo:
 If you have such a huge database you can, with logical decoding at
 least, use a basebackup using pg_basebackup or pg_start/stop_backup()
 and roll forwards from that... That'll hopefully make such huge copies
 much faster.

 Just keep in mind that one of the use cases for logical replication is 
 upgrades.

Should still be fine. Make a physical copy; pg_upgrade; catchup via
logical rep.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-27 Thread Andres Freund
On 2014-02-24 12:50:03 -0500, Robert Haas wrote:
 On Mon, Feb 24, 2014 at 9:48 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2014-02-15 17:29:04 -0500, Robert Haas wrote:
  On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund and...@2ndquadrant.com 
  wrote:
 
  +   /*
  +* XXX: It's impolite to ignore our argument and keep decoding 
  until the
  +* current position.
  +*/
 
  Eh, what?
 
  So, the background here is that I was thinking of allowing a limit on
  the number of returned rows to be specified. For the SQL interface that
  sounds like a good idea. I am just not so sure anymore that allowing an
  LSN to be specified as a limit is sufficient. Maybe simply allow limiting
  the number of changes, and check every time a transaction has been replayed?
 
 The last idea there seems pretty sound, but ...
 
  It's all trivial codewise, I am just wondering about the interface most
  users would want.
 
 ...I can't swear it meets this criterion.

So, it's now:
CREATE OR REPLACE FUNCTION pg_logical_slot_get_changes(
    IN slotname name, IN upto_lsn pg_lsn, IN upto_nchanges int,
    VARIADIC options text[] DEFAULT '{}',
    OUT location pg_lsn, OUT xid xid, OUT data text)
RETURNS SETOF RECORD
LANGUAGE INTERNAL
VOLATILE ROWS 1000 COST 1000
AS 'pg_logical_slot_get_changes';

If non-null, upto_lsn allows limiting based on the LSN; similarly with
upto_nchanges.
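
For illustration, a hedged usage sketch (the slot name is hypothetical,
and creation is shown via the SQL-level creation function):

SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');

-- at most 500 changes, no LSN limit:
SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, 500);

-- everything up to the given LSN, no row-count limit:
SELECT * FROM pg_logical_slot_get_changes('regression_slot', '0/16B2D50', NULL);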

Makes sense?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-27 Thread Robert Haas
On Thu, Feb 27, 2014 at 11:06 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-24 12:50:03 -0500, Robert Haas wrote:
 On Mon, Feb 24, 2014 at 9:48 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2014-02-15 17:29:04 -0500, Robert Haas wrote:
  On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund and...@2ndquadrant.com 
  wrote:
 
  +   /*
  +* XXX: It's impolite to ignore our argument and keep decoding 
  until the
  +* current position.
  +*/
 
  Eh, what?
 
  So, the background here is that I was thinking of allowing a limit on
  the number of returned rows to be specified. For the SQL interface that
  sounds like a good idea. I am just not so sure anymore that allowing an
  LSN to be specified as a limit is sufficient. Maybe simply allow limiting
  the number of changes, and check every time a transaction has been replayed?

 The last idea there seems pretty sound, but ...

  It's all trivial codewise, I am just wondering about the interface most
  users would want.

 ...I can't swear it meets this criterion.

 So, it's now:
 CREATE OR REPLACE FUNCTION pg_logical_slot_get_changes(
     IN slotname name, IN upto_lsn pg_lsn, IN upto_nchanges int,
     VARIADIC options text[] DEFAULT '{}',
     OUT location pg_lsn, OUT xid xid, OUT data text)
 RETURNS SETOF RECORD
 LANGUAGE INTERNAL
 VOLATILE ROWS 1000 COST 1000
 AS 'pg_logical_slot_get_changes';

 If non-null, upto_lsn allows limiting based on the LSN; similarly with
 upto_nchanges.

 Makes sense?

Time will tell, but it seems plausible to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-24 Thread Andres Freund
On 2014-02-15 17:29:04 -0500, Robert Haas wrote:
 On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund and...@2ndquadrant.com wrote:

 +   /*
 +* XXX: It's impolite to ignore our argument and keep decoding until 
 the
 +* current position.
 +*/
 
 Eh, what?

So, the background here is that I was thinking of allowing a limit on
the number of returned rows to be specified. For the SQL interface that
sounds like a good idea. I am just not so sure anymore that allowing an
LSN to be specified as a limit is sufficient. Maybe simply allow limiting
the number of changes, and check every time a transaction has been replayed?

It's all trivial codewise, I am just wondering about the interface most
users would want.

 +* We misuse the original meaning of SnapshotData's xip and
 subxip fields
 +* to make them more fitting for our needs.
 [...]
 +* XXX: Do we want extra fields instead of misusing existing
 ones instead?
 
 If we're going to do this, then it surely needs to be documented in
 snapshot.h.  On the second question, you're not the first hacker to
 want to abuse the meanings of the existing fields; SnapshotDirty
 already does it.  It's tempting to think we need a more principled
 approach to this, like what we've done with Node i.e. typedef enum ...
 SnapshotType; and then a separate struct definition for each kind, all
 beginning with SnapshotType type.

Hm, essentially that's what the ->satisfies pointer already is, right?

There's already documentation of the extra fields in snapbuild, but I
understand you'd rather have them moved?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-24 Thread Robert Haas
On Mon, Feb 24, 2014 at 9:48 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-15 17:29:04 -0500, Robert Haas wrote:
 On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund and...@2ndquadrant.com 
 wrote:

 +   /*
 +* XXX: It's impolite to ignore our argument and keep decoding until 
 the
 +* current position.
 +*/

 Eh, what?

 So, the background here is that I was thinking of allowing a limit on
 the number of returned rows to be specified. For the SQL interface that
 sounds like a good idea. I am just not so sure anymore that allowing an
 LSN to be specified as a limit is sufficient. Maybe simply allow limiting
 the number of changes, and check every time a transaction has been replayed?

The last idea there seems pretty sound, but ...

 It's all trivial codewise, I am just wondering about the interface most
 users would want.

...I can't swear it meets this criterion.

 +* We misuse the original meaning of SnapshotData's xip and
 subxip fields
 +* to make them more fitting for our needs.
 [...]
 +* XXX: Do we want extra fields instead of misusing existing
 ones instead?

 If we're going to do this, then it surely needs to be documented in
 snapshot.h.  On the second question, you're not the first hacker to
 want to abuse the meanings of the existing fields; SnapshotDirty
 already does it.  It's tempting to think we need a more principled
 approach to this, like what we've done with Node i.e. typedef enum ...
 SnapshotType; and then a separate struct definition for each kind, all
 beginning with SnapshotType type.

 Hm, essentially that's what the ->satisfies pointer already is, right?

Sorta, yeah.  But with nodes, you can change the whole struct
definition for each type.

 There's already documentation of the extra fields in snapbuild, but I
 understand you'd rather have them moved?

Yeah, I think it needs to be documented where SnapshotData is defined.
There may be reason to mention it again, or in more detail, elsewhere.
But there should be some mention of it there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-21 Thread Andres Freund
Hi,

On 2014-02-19 13:01:02 -0500, Robert Haas wrote:
  I think it should be fairly easy to relax the restriction to creating a
  slot, but not getting data from it. Do you think that would be
  sufficient?
 
 That would be a big improvement, for sure, and might be entirely sufficient.

Turned out to be a 5 line change + tests or something... Pushed.

  I don't think this is a very good idea.  The problem with doing things
  during error recovery that can themselves fail is that you'll lose the
  original error, which is not cool, and maybe even blow out the error
  stack.  Many people have confused PG_TRY()/PG_CATCH() with an
  exception-handling system, but it's not.  One way to fix this is to
  put some of the initialization logic in ReplicationSlotCreate() just
  prior to calling CreateSlotOnDisk().  If the work that needs to be
  done is too complex or protracted to be done there, then I think that
  it should be pulled out of the act of creating the replication slot
  and made to happen as part of first use, or as a separate operation
  like PrepareForLogicalDecoding.
 
  I think what should be done here is adding a drop_on_release flag. As
  soon as everything important is done, it gets unset.
 
 That might be more elegant, but I don't think it really fixes
 anything, because backing stuff out from on disk can fail.

If the slot is marked as drop_on_release during creation, and we fail
during removal, it will just be dropped on the next startup. That seems
ok to me?

I still think it's not really important to put much effort into the 'disk
stuff fails' case; it's entirely hypothetical. If that fails you have
*so* much bigger problems that a leftover slot is the least of your problems.

 AIUI, your
 whole concern here is that you don't want the slot creation to fail
 halfway through and leave behind the slot, but what you've got here
 doesn't prevent that; it just makes it less likely. The more I think
 about it, the more I think you're trying to pack stuff into slot
 creation that really ought to be happening on first use.

Well, a leftover slot whose creation never succeeded is going
to confuse lots of people, especially as it will not roll back or
anything. That's why I think it's important to make it unlikely.
The typical reasons for failing are stuff like an output plugin that
doesn't exist or being interrupted while initializing.

I can sympathize with the 'too much during init' argument, but I don't
see how moving stuff to the first call would get rid of the problems. If
we fail later it's going to be just as confusing.

  ReorderBufferGetTXN() should get a comment about the performance
  impact of this.  There's a tiny bit there in ReorderBufferReturnTXN()
  but it should be better called out.  Should these call the valgrind
  macros to make the memory inaccessible while it's being held in cache?
 
  Hm, I think it does call the valgrind stuff?
  VALGRIND_MAKE_MEM_UNDEFINED(txn, sizeof(ReorderBufferTXN));
  VALGRIND_MAKE_MEM_DEFINED(&txn->node, sizeof(txn->node));
 
 That's there in ReorderBufferReturnTXN, but don't you need something
 in ReorderBufferGetTXN?  Maybe not.

Don't think so; it marks the memory as undefined, which allows writes
but will warn on reads. We could additionally mark the memory as
inaccessible, disallowing writes, but I don't really see that catching much.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-21 Thread Andres Freund
On 2014-02-19 13:31:06 -0500, Robert Haas wrote:
 TBH, as compared to what you've got now, I think this mostly boils
 down to a question of quoting and escaping.  I'm not really concerned
 with whether we ship something that's perfectly efficient, or that has
 filtering capabilities, or that has a lot of fancy bells and whistles.
  What I *am* concerned about is that if the user updates a text field
 that contains characters like " or ' or : or [ or ] or , that somebody
 might be using as delimiters in the output format, that a program can
 still parse that output format and reliably determine what the actual
 change was.  I don't care all that much whether we use JSON or CSV or
 something custom, but the data that gets spit out should not have
 SQL-injection-like vulnerabilities.

If it's just that, I am *perfectly* happy to change it. What I do not
want is arguments like "I don't want the type information, that's
pointless", because it's actually really important for regression
testing.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-21 Thread Andres Freund
On 2014-02-19 13:07:11 -0500, Robert Haas wrote:
 On Tue, Feb 18, 2014 at 4:07 AM, Andres Freund and...@2ndquadrant.com wrote:
  2. I think the snapshot-export code is fundamentally misdesigned.  As
  I said before, the idea that we're going to export one single snapshot
  at one particular point in time strikes me as extremely short-sighted.
 
  I don't think so. It's precisely what you need to implement a simple
 replication solution. Yes, there are use cases that could benefit from
  more possibilities, but that's always the case.
 
   For example, consider one-to-many replication where clients may join
  or depart the replication group at any time.  Whenever somebody joins,
 we just want a <snapshot, LSN> pair such that they can apply all
  changes after the LSN except for XIDs that would have been visible to
  the snapshot.
 
  And? They need to create individual replication slots, which each
  will get a snapshot.
 
 So we have to wait for startup N times, and transmit the change stream
 N times, instead of once?  Blech.

I can't get too excited about this. If we later want to add a command to
clone an existing slot, sure, that's perfectly fine with me. That will
then stream at exactly the same position. Easy, less than 20 LOC + docs
probably.

We have much more waiting e.g. in the CONCURRENTLY commands and it's not
causing that many problems.

Note that it'd be a *significant* overhead to continuously be able to
export snapshots that are useful for looking at normal relations. Both
for computing the snapshots and for not being able to remove those rows.

  And in fact, we don't even need any special machinery
  for that; the client can just make a connection and *take a snapshot*
  once decoding is initialized enough.
 
  No, they can't. Two reasons: For one the commit order between snapshots
  and WAL isn't necessarily the same.

 So what?

So you can't just use a plain snapshot and dump using it, without
getting into inconsistencies.

  For another, clients now need logic
 to detect whether a transaction's contents have already been applied or
 have not been applied yet; that's nontrivial.

 My point is, I think we should be trying to *make* that trivial,
 rather than doing this.

I think *this* *is* making it trivial.

Maybe I've missed it, but I haven't seen any alternative that comes even
*close* to being as easy to implement in a replication
solution. Currently you can use it like:

CREATE_REPLICATION_SLOT name LOGICAL
copy data using the exported snapshot
START_REPLICATION SLOT name LOGICAL
stream changes.

Where you can do the START_REPLICATION as soon as some other session has
imported the snapshot. Really not much to worry about additionally.
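
Importing the snapshot there is the stock snapshot-import machinery; a
sketch (the snapshot identifier is illustrative, in reality it is the
one returned by CREATE_REPLICATION_SLOT):

-- in the session doing the initial copy:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '000003A1-1';  -- name returned at slot creation
COPY my_table TO STDOUT;                -- sees exactly the slot's start point
COMMIT;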

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-21 Thread Robert Haas
On Fri, Feb 21, 2014 at 6:07 AM, Andres Freund and...@2ndquadrant.com wrote:
 I can sympathize with the 'too much during init' argument, but I don't
 see how moving stuff to the first call would get rid of the problems. If
 we fail later it's going to be just as confusing.

No, it isn't.  If you fail during init the user will expect the slot to
be gone.  That's the reason for all of this complexity.  If you fail
on first use, the user will expect the slot to still be there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-21 Thread Andres Freund
On 2014-02-21 08:16:59 -0500, Robert Haas wrote:
 On Fri, Feb 21, 2014 at 6:07 AM, Andres Freund and...@2ndquadrant.com wrote:
  I can sympathize with the 'too much during init' argument, but I don't
  see how moving stuff to the first call would get rid of the problems. If
  we fail later it's going to be just as confusing.
 
 No, it isn't.  If you fail during init the user will expect the slot to
 be gone.  That's the reason for all of this complexity.  If you fail
 on first use, the user will expect the slot to still be there.

The primary case for failing is a plugin that either doesn't exist or
fails to initialize, or a user aborting the init. It seems odd that a
created slot fails because of a bad plugin or needs to wait till it
finds a suitable snapshot record. We could add an intermediate call like
pg_startup_logical_slot() but that doesn't seem to have much going for
it?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-21 Thread Robert Haas
On Fri, Feb 21, 2014 at 8:27 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-21 08:16:59 -0500, Robert Haas wrote:
 On Fri, Feb 21, 2014 at 6:07 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  I can sympathize with the 'too much during init' argument, but I don't
  see how moving stuff to the first call would get rid of the problems. If
  we fail later it's going to be just as confusing.

  No, it isn't.  If you fail during init the user will expect the slot to
 be gone.  That's the reason for all of this complexity.  If you fail
 on first use, the user will expect the slot to still be there.

 The primary case for failing is a plugin that either doesn't exist or
 fails to initialize, or a user aborting the init. It seems odd that a
 created slot fails because of a bad plugin or needs to wait till it
 finds a suitable snapshot record. We could add an intermediate call like
 pg_startup_logical_slot() but that doesn't seem to have much going for
 it?

Well, we can surely detect a plugin that fails to initialize before
creating the slot on disk, right?

I'm not sure what "fails to initialize" entails.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-21 Thread Andres Freund
On 2014-02-21 08:51:03 -0500, Robert Haas wrote:
 On Fri, Feb 21, 2014 at 8:27 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2014-02-21 08:16:59 -0500, Robert Haas wrote:
  On Fri, Feb 21, 2014 at 6:07 AM, Andres Freund and...@2ndquadrant.com 
  wrote:
   I can sympathize with the 'too much during init' argument, but I don't
   see how moving stuff to the first call would get rid of the problems. If
   we fail later it's going to be just as confusing.
 
  No, it isn't.  If you fail during init the user will expect the slot to
  be gone.  That's the reason for all of this complexity.  If you fail
  on first use, the user will expect the slot to still be there.
 
  The primary case for failing is a plugin that either doesn't exist or
  fails to initialize, or a user aborting the init. It seems odd that a
  created slot fails because of a bad plugin or needs to wait till it
  finds a suitable snapshot record. We could add an intermediate call like
  pg_startup_logical_slot() but that doesn't seem to have much going for
  it?
 
 Well, we can surely detect a plugin that fails to initialize before
 creating the slot on disk, right?

We could detect whether the plugin .so can be loaded and provides the
required callbacks, but we can't initialize it.

 I'm not sure what "fails to initialize" entails.

elog(ERROR, "hey, the tables I require are missing");

or similar.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-21 Thread Jim Nasby

On 2/17/14, 7:31 PM, Robert Haas wrote:

But do you really want to keep that snapshot around long enough to
copy the entire database?  I bet you don't: if the database is big,
holding back xmin for long enough to copy the whole thing isn't likely
to be fun.


I can confirm that this would be epic fail, at least for londiste. It takes 
about 3 weeks for a new copy of a ~2TB database. There's no way that'd work 
with one snapshot. (Granted, copy performance in londiste is rather lackluster, 
but still...)
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-19 Thread Robert Haas
On Sat, Feb 15, 2014 at 6:59 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-15 17:29:04 -0500, Robert Haas wrote:
 On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  [ new patches ]

 0001 already needs minor

 Hm?

 If there are conflicts, I'll push/send a rebased version tomorrow or Monday.

As you guessed, the missing word was "rebasing".  It's a trivial
conflict though, so please don't feel the need to repost just for
that.

 +* the transaction's invalidations. This currently won't be needed if
 +* we're just skipping over the transaction, since that currently 
 only
 +* happens when starting decoding, after we invalidated all caches, 
 but
 +* transactions in other databases might have touched shared 
 relations.

 I'm having trouble understanding what this means, especially the part
 after the but.

 Let me rephrase, maybe that will help:

 This currently won't be needed if we're just skipping over the
 transaction, because currently we only do so during startup, to get to
 the first transaction the client needs. As we have reset the catalog
 caches before starting to read WAL and haven't yet touched any
 catalogs, there can't be anything to invalidate.

 But if we're forgetting this commit because it happened in
 another database, the invalidations might be important, because they
 could be for shared catalogs and we might have loaded data into the
 relevant syscaches.

 Better?

Much!  Please include that text, or something like it.

 +   if (IsTransactionState() &&
 +   GetTopTransactionIdIfAny() != InvalidTransactionId)
 +   ereport(ERROR,
 +   (errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
 +errmsg("cannot perform changeset extraction in transaction that has performed writes")));

 This is sort of an awkward restriction, as it makes it hard to compose
 this feature with others.  What underlies the restriction, can we get
 rid of it, and if not can we include a comment here explaining why it
 exists?

 Well, you can write the result into a table if you're halfway
 careful... :)

 I think it should be fairly easy to relax the restriction to creating a
  slot, but not getting data from it. Do you think that would be
 sufficient?

That would be a big improvement, for sure, and might be entirely sufficient.

 +   /* register output plugin name with slot */
 +   strncpy(NameStr(slot->data.plugin), plugin,
 +   NAMEDATALEN);
 +   NameStr(slot->data.plugin)[NAMEDATALEN - 1] = '\0';

 If it's safe to do this without a lock, I don't know why.

 Well, the worst that could happen is that somebody else doing a SELECT *
 FROM pg_replication_slot gets an incomplete plugin name... But we
 certainly can hold the spinlock during it if you think that's better.

Isn't the worst thing that can happen that they copy garbage out of
the buffer, because the new name is longer than the old and only
partially written?

 More broadly, I wonder why this is_init stuff is in here at all.
 Maybe the stuff that happens in the is_init case should be done in the
 caller, or another helper function.

 The reason I moved it in there is that after the walsender patch there
 are two callers, and the stuff is sufficiently complex that having it
 twice led to bugs.
 The reason it's currently the same function is that sufficiently much of
 the code would have to be shared that I found it ugly to split. I'll
 have a look at whether I can figure something out.

I was thinking that the is_init portion could perhaps be done first,
in a *previous* function call, and adjusted in such a way that the
non-is-init path can be used for both cases here.

 I don't think this is a very good idea.  The problem with doing things
 during error recovery that can themselves fail is that you'll lose the
 original error, which is not cool, and maybe even blow out the error
 stack.  Many people have confused PG_TRY()/PG_CATCH() with an
 exception-handling system, but it's not.  One way to fix this is to
 put some of the initialization logic in ReplicationSlotCreate() just
 prior to calling CreateSlotOnDisk().  If the work that needs to be
 done is too complex or protracted to be done there, then I think that
 it should be pulled out of the act of creating the replication slot
 and made to happen as part of first use, or as a separate operation
 like PrepareForLogicalDecoding.

 I think what should be done here is adding a drop_on_release flag. As
 soon as everything important is done, it gets unset.

That might be more elegant, but I don't think it really fixes
anything, because backing stuff out from on disk can fail.  AIUI, your
whole concern here is that you don't want the slot creation to fail
halfway through and leave behind the slot, but what you've got here
doesn't prevent that; it just makes it less likely.  The more I think
about it, the more I think you're trying to pack stuff into slot
creation that really ought to be happening on first use.

Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-19 Thread Robert Haas
On Sun, Feb 16, 2014 at 12:12 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-15 17:29:04 -0500, Robert Haas wrote:
 On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  [ new patches ]

 0001 already needs minor

 + * copied stuff from tuptoaster.c. Perhaps there should be toast_internal.h?

 Yes, please.  If you can submit a separate patch creating this file
 and relocating this stuff there, I will commit it.

 I started to work on that, but I am not sure we actually need it
 anymore. tuptoaster.h isn't included in that many places, so perhaps we
 should just add it there?

That seems fine to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-19 Thread Robert Haas
On Tue, Feb 18, 2014 at 4:07 AM, Andres Freund and...@2ndquadrant.com wrote:
 2. I think the snapshot-export code is fundamentally misdesigned.  As
 I said before, the idea that we're going to export one single snapshot
 at one particular point in time strikes me as extremely short-sighted.

 I don't think so. It's precisely what you need to implement a simple
 replication solution. Yes, there are use cases that could benefit from
 more possibilities, but that's always the case.

  For example, consider one-to-many replication where clients may join
 or depart the replication group at any time.  Whenever somebody joins,
 we just want a <snapshot, LSN> pair such that they can apply all
 changes after the LSN except for XIDs that would have been visible to
 the snapshot.

 And? They need to create individual replication slots, which each will
 get a snapshot.

So we have to wait for startup N times, and transmit the change stream
N times, instead of once?  Blech.

 And in fact, we don't even need any special machinery
 for that; the client can just make a connection and *take a snapshot*
 once decoding is initialized enough.

 No, they can't. Two reasons: For one the commit order between snapshots
 and WAL isn't necessarily the same.

So what?

 For another, clients now need logic
 to detect whether a transaction's contents have already been applied or
 have not been applied yet; that's nontrivial.

My point is, I think we should be trying to *make* that trivial,
rather than doing this.

 3. As this feature is proposed, the only plugin we'll ship with 9.4 is
 a test_decoding plugin which, as its own documentation says, doesn't
 do anything especially useful.  What exactly do we gain by forcing
 users who want to make use of these new capabilities to write C code?

 It gains us an output plugin in which we can easily demonstrate
 features so they can be tested in the regression tests, which I find to
 be rather important.
 Just like e.g. the test_shm_mq stuff doesn't do anything really useful.

It definitely doesn't, but this patch is a lot closer to being done
than parallel query is, so I'm not sure it's a fair comparison.

 The test_decoding plugin doesn't seem tremendously
 much simpler than something that someone could actually use, so why
 not make that the goal?

 For one, it being a designated toy plugin allows us to easily change it
 to showcase/test new features. For another, I still don't agree that
 it's easy to agree on an output format. I think we should include ones
 that have matured in 9.5.

I regret that more effort has not been made in that area.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-19 Thread Robert Haas
On Tue, Feb 18, 2014 at 4:33 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-17 21:35:23 -0500, Robert Haas wrote:
 What
 I don't understand is why we're not taking the test_decoding module,
 polishing it up a little to produce some nice, easily
 machine-parseable output, calling it basic_decoding, and shipping
 that.  Then people who want something else can build it, but people
 who are happy with something basic will already have it.

 Because every project is going to need their own plugin
 *anyway*. Londiste and Slony are surely going to ignore changes to
 relations they don't need, querying their own metadata. They will want
 compatibility with the earlier formats as far as possible. Sometime not
 too far away they will want to optionally support binary output because
 it's so much faster.
 There's just not much chance that either of these will be able to agree
 on a format short term.

Ah, so part of what you're expecting the output plugin to do is
filtering.  I can certainly see where there might be considerable
variation between solutions in that area - but I think that's separate
from the question of formatting per se.  Although I think we should
have an in-core output plugin with filtering capabilities eventually,
I'm happy to define that as out of scope for 9.4.  But isn't there a
way that we can ship something that will do for people who want to
just see the database's entire change stream float by?

TBH, as compared to what you've got now, I think this mostly boils
down to a question of quoting and escaping.  I'm not really concerned
with whether we ship something that's perfectly efficient, or that has
filtering capabilities, or that has a lot of fancy bells and whistles.
 What I *am* concerned about is that if the user updates a text field
that contains characters like " or ' or : or [ or ] or , that somebody
might be using as delimiters in the output format, that a program can
still parse that output format and reliably determine what the actual
change was.  I don't care all that much whether we use JSON or CSV or
something custom, but the data that gets spit out should not have
SQL-injection-like vulnerabilities.
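
To make the failure mode concrete, a sketch (the change record shown is
a paraphrase of a test_decoding-style format, not its exact output):

UPDATE t SET v = 'a[text]:''x''' WHERE id = 1;
-- a naive textual change record might then read:
--   table public.t: UPDATE: id[integer]:1 v[text]:'a[text]:'x''
-- and a parser splitting on [ ] : ' misattributes the column boundaries
-- unless the plugin quotes/escapes the payload unambiguously.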

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-19 Thread Euler Taveira
On 18-02-2014 06:33, Andres Freund wrote:
 I really hope there will be nicer ones by the time 9.4 is
 released. Euler did send in a json plugin
 http://archives.postgresql.org/message-id/52A5BFAE.1040209%2540timbira.com.br
 , but there hasn't been too much feedback yet. It's hard to start discussing
 something that needs a couple of patches to pg before you can develop
 your own patch...
 
BTW, I've updated that code to reflect the recent changes in the API and
published it at [1]. This version is based on Andres' branch
xlog-decoding-rebasing-remapping. I'll continue to polish this code.


Regards,


[1] https://github.com/eulerto/wal2json


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training




Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-18 Thread Andres Freund
Hi Robert,

On 2014-02-17 20:31:34 -0500, Robert Haas wrote:
 1. How safe is it to try to do decoding inside of a regular backend?
 What we're doing here is entering a special mode where we forbid the
 use of regular snapshots in favor of requiring the use of decoding
 snapshots, and forbid access to non-catalog relations.  We then run
 through the decoding process; and then exit back into regular mode.
 On entering and on exiting this special mode, we
 InvalidateSystemCaches().  I don't see a big problem with having
 special backends (e.g. walsender) use this special mode, but I'm less
 convinced that it's wise to try to set things up so that we can switch
 back and forth between decoding mode and regular mode in a single
 backend.

The main reason the SQL interface exists is that it's awfully hard to
use isolationtester, pg_regress et al when the output isn't also visible
via SQL. We tried hacking things in other ways, but that's what it came
down to. If you recall, previously the SQL changes interface was only in
a test_logical_decoding extension, because I wasn't sure it's all that
interesting for real use cases.
It's sure nice for testing things though.

 I worry that won't end up working out very cleanly, and I
 think the prohibition against using this special mode in an
 XID-bearing transaction is merely a small downpayment on future pain
 in this area.

That restriction is in principle only needed when creating the slot, not
when getting changes. The only problem is that some piece of code
doesn't know about it.

The reasons it exists are twofold: One is that when looking for an
initial snapshot, we wait for concurrent transactions to end. If we'd
wait for the transaction itself we'd be in trouble; it could never
happen. The second reason is that the code does a XactLockTableWait() to
show that it's waiting, so isolationtester knows it should background the
command. It's not good to wait on itself.
But neither is actually needed when not creating the slot, the code just
needs to be told about that.

 That having been said, I can't pretend at this point
 either to understand the genesis of this particular restriction or
 what other problems are likely to crop up in trying to allow this
 mode-switching.  So it's possible that I'm overblowing it, but it's
 makin' me nervous.

I am not terribly concerned, but I can understand where you are coming
from. I think for replication solutions this isn't going to be needed
but it's much handier for testing and such.

 2. I think the snapshot-export code is fundamentally misdesigned.  As
 I said before, the idea that we're going to export one single snapshot
 at one particular point in time strikes me as extremely short-sighted.

I don't think so. It's precisely what you need to implement a simple
replication solution. Yes, there are use cases that could benefit from
more possibilities, but that's always the case.

  For example, consider one-to-many replication where clients may join
 or depart the replication group at any time.  Whenever somebody joins,
 we just want a <snapshot, LSN> pair such that they can apply all
 changes after the LSN except for XIDs that would have been visible to
 the snapshot.

And? They need to create individual replication slots, which each will
get a snapshot.

 And in fact, we don't even need any special machinery
 for that; the client can just make a connection and *take a snapshot*
 once decoding is initialized enough.

No, they can't. Two reasons: For one the commit order between snapshots
and WAL isn't necessarily the same. For another, clients now need logic
to detect whether a transaction's contents have already been applied or
have not been applied yet; that's nontrivial.

 This code is going to great
 pains to be able to export a snapshot at the precise point when all
 transactions that were running in the first xl_running_xacts record
 seen after the start of decoding have ended, but there's nothing
 magical about that point, except that it's the first point at which a
 freshly-taken snapshot is guaranteed to be good enough to establish an
 initial state for any table in the database.

I still maintain that there's something magic about that moment. It's
when all *future* (from the POV of the snapshot) changes will be
streamed, and all *past* changes are included in the exported snapshot.

 But do you really want to keep that snapshot around long enough to
 copy the entire database?  I bet you don't: if the database is big,
 holding back xmin for long enough to copy the whole thing isn't likely
 to be fun.

Well, that's how pg_dump works, it's not this patch's problem to fix
that.

 You might well want to copy one table at a time, with
 progressively newer snapshots, and apply to each table only those
 transactions that weren't part of the initial snapshot for that table.
  Many other patterns are possible.  What you've got baked in here
 right now is suitable only for the simplest imaginable case, and yet
 we're paying a substantial price in implementation complexity for it.

Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-18 Thread Andres Freund
On 2014-02-17 21:10:26 -0500, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  1. How safe is it to try to do decoding inside of a regular backend?
  What we're doing here is entering a special mode where we forbid the
  use of regular snapshots in favor of requiring the use of decoding
  snapshots, and forbid access to non-catalog relations.  We then run
  through the decoding process; and then exit back into regular mode.
  On entering and on exiting this special mode, we
  InvalidateSystemCaches().
 
 How often is such a mode switch expected to happen?  I would expect
 frequent use of InvalidateSystemCaches() to be pretty much disastrous
 for performance, even absent any of the possible bugs you're worried
 about.  It would likely be better to design things so that a decoder
 backend does only that.

Very infrequently: when it's starting to decode, and when it's
ending. When used via walsender, that should only happen at connection
start/end, which surely shouldn't be frequent.
It's more frequent when using the SQL interface, but since that's not a
streaming interface, on a busy server there would still be a couple of
megabytes of transactions to decode for each reset.

  3. As this feature is proposed, the only plugin we'll ship with 9.4 is
  a test_decoding plugin which, as its own documentation says, doesn't
  do anything especially useful.  What exactly do we gain by forcing
  users who want to make use of these new capabilities to write C code?
 
 TBH, if that's all we're going to ship, I'm going to vote against
 committing this patch to 9.4 at all.  Let it wait till 9.5 when we
 might be able to build something useful on it.

There *are* useful things around already. We didn't include postgres_fdw
in the same release as the fdw code either; I don't see why this should
be held to a different standard.

 To point out just
 one obvious problem, how much confidence can we have in the APIs
 being right if there are no usable clients?

Because there *are* clients. They just either don't sound likely to be
suitable for core code (too specialized) or have already been submitted
(the json plugin).

There's a whole replication suite built on top of this, to a good degree
just to test it. So I am fairly confident that the most important parts
are covered. There sure are additional features I want, but that's not
surprising.

 The most recent precedent I can think of is the FDW APIs, which I'd
 be the first to admit are still in flux.  But we didn't ship anything
 there without non-toy contrib modules to exercise it.  If we had,
 we'd certainly have regretted it, because in the creation of those
 contrib modules we found flaws in the initial design.

Which non-toy fdw was there? file_fdw was in 9.1, but that's a toy. And
*8.4* had CREATE FOREIGN DATA WRAPPER, without it doing anything...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-18 Thread Andres Freund
On 2014-02-17 18:49:59 -0800, Peter Geoghegan wrote:
 On Mon, Feb 17, 2014 at 6:35 PM, Robert Haas robertmh...@gmail.com wrote:
  What I actually suspect is going to happen if we ship this as-is is
  that people are going to start building logical replication solutions
  on top of the test_decoding module even though it explicitly says that
  it's just test code.  This is *really* cool technology and people are
  *hungry* for it.  But writing C is hard, so if there's not a polished
  plugin available, I bet people are going to try to use the
  not-polished one.  I think we should try to get out ahead of that.
 
 Tom made a comparison with FDWs, so I'll make another. The Multicorn
 module made FDW authorship much more accessible by wrapping it in a
 Python interface, I believe with some success. I don't want to stand
 in the way of building a fully-featured test_decoding module, but I
 think that those that would misuse test_decoding as it currently
 stands can be redirected to a third-party wrapper. As you say, it's
 pretty cool stuff, so it seems likely that someone will build one for
 us.

Absolutely. I *sure* hope somebody is going to build such an
abstraction. I am not entirely sure what it'd look like, but ...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-18 Thread Andres Freund
On 2014-02-17 21:35:23 -0500, Robert Haas wrote:
 What
 I don't understand is why we're not taking the test_decoding module,
 polishing it up a little to produce some nice, easily
 machine-parseable output, calling it basic_decoding, and shipping
 that.  Then people who want something else can build it, but people
 who are happy with something basic will already have it.

Because every project is going to need their own plugin
*anyway*. Londiste and Slony are surely going to ignore changes to
relations they don't need, querying their own metadata. They will want
compatibility with the earlier formats as far as possible. Sometime not
too far away they will want to optionally support binary output because
it's so much faster.
There's just not much chance that either of these will be able to agree
on a format short term.

So, possibly we could agree to something that consumers *outside* of
replication could use.

 What I actually suspect is going to happen if we ship this as-is is
 that people are going to start building logical replication solutions
 on top of the test_decoding module even though it explicitly says that
 it's just test code.  This is *really* cool technology and people are
 *hungry* for it.  But writing C is hard, so if there's not a polished
 plugin available, I bet people are going to try to use the
 not-polished one.  I think we should try to get out ahead of that.

I really hope there will be nicer ones by the time 9.4 is
released. Euler did send in a json plugin
http://archives.postgresql.org/message-id/52A5BFAE.1040209%2540timbira.com.br
, but there hasn't been too much feedback yet. It's hard to start discussing
something that needs a couple of patches to pg before you can develop
your own patch...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-17 Thread Robert Haas
On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund and...@2ndquadrant.com wrote:
 [ patches ]

Having now had a little bit of opportunity to reflect on the State Of
This Patch, I'd like to step back from the minutiae upon which I've
been commenting in my previous emails and articulate three high-level
concerns about this patch.  In so doing, I would like to specifically
request that other folks on this mailing list comment on the extent to
which they do or do not believe these concerns to be valid.  I believe
I've mentioned all of these concerns at least to some degree
previously, but they've been mixed in with other things, so I want to
take this opportunity to call them out more clearly.

1. How safe is it to try to do decoding inside of a regular backend?
What we're doing here is entering a special mode where we forbid the
use of regular snapshots in favor of requiring the use of decoding
snapshots, and forbid access to non-catalog relations.  We then run
through the decoding process; and then exit back into regular mode.
On entering and on exiting this special mode, we
InvalidateSystemCaches().  I don't see a big problem with having
special backends (e.g. walsender) use this special mode, but I'm less
convinced that it's wise to try to set things up so that we can switch
back and forth between decoding mode and regular mode in a single
backend.  I worry that won't end up working out very cleanly, and I
think the prohibition against using this special mode in an
XID-bearing transaction is merely a small downpayment on future pain
in this area.  That having been said, I can't pretend at this point
either to understand the genesis of this particular restriction or
what other problems are likely to crop up in trying to allow this
mode-switching.  So it's possible that I'm overblowing it, but it's
makin' me nervous.

2. I think the snapshot-export code is fundamentally misdesigned.  As
I said before, the idea that we're going to export one single snapshot
at one particular point in time strikes me as extremely short-sighted.
 For example, consider one-to-many replication where clients may join
or depart the replication group at any time.  Whenever somebody joins,
we just want a <snapshot, LSN> pair such that they can apply all
changes after the LSN except for XIDs that would have been visible to
the snapshot.  And in fact, we don't even need any special machinery
for that; the client can just make a connection and *take a snapshot*
once decoding is initialized enough.  This code is going to great
pains to be able to export a snapshot at the precise point when all
transactions that were running in the first xl_running_xacts record
seen after the start of decoding have ended, but there's nothing
magical about that point, except that it's the first point at which a
freshly-taken snapshot is guaranteed to be good enough to establish an
initial state for any table in the database.

But do you really want to keep that snapshot around long enough to
copy the entire database?  I bet you don't: if the database is big,
holding back xmin for long enough to copy the whole thing isn't likely
to be fun.  You might well want to copy one table at a time, with
progressively newer snapshots, and apply to each table only those
transactions that weren't part of the initial snapshot for that table.
 Many other patterns are possible.  What you've got baked in here
right now is suitable only for the simplest imaginable case, and yet
we're paying a substantial price in implementation complexity for it.
Frankly, this code is *ugly*: SnapBuildExportSnapshot() needs to start
a transaction just so that it can push out a snapshot.  I think that's
a pretty awful abuse of the transaction machinery, and
the whole point of it, AFAICS, is to eliminate flexibility that we'd
have with simpler approaches.
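
To make the per-table pattern concrete, here is a rough client-side
sketch; the table name is illustrative, and how you learn the decoding
position depends on the interface we end up with:

    -- one table at a time, each under a fresh snapshot
    BEGIN;
    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    -- note the slot's current decoding position (an LSN) here
    COPY my_table TO STDOUT;   -- initial copy runs under this snapshot
    COMMIT;                    -- snapshot, and hence xmin, released at once
    -- then apply only decoded changes past the noted LSN, skipping
    -- transactions whose XIDs were already visible to that snapshot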

3. As this feature is proposed, the only plugin we'll ship with 9.4 is
a test_decoding plugin which, as its own documentation says, doesn't
do anything especially useful.  What exactly do we gain by forcing
users who want to make use of these new capabilities to write C code?
You previously stated that it wasn't possible (or there wasn't time)
to write something generic, but how hard is it, really?  Sure, people
who are hard-core should have the option to write C code, and I'm
happy that they do.  But that shouldn't, IMHO anyway, be a requirement
to use that feature, and I'm having trouble understanding why we're
making it one.  The test_decoding plugin doesn't seem tremendously
simpler than something that someone could actually use, so why
not make that the goal?
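
For reference, driving the test_decoding plugin from SQL looks roughly
like the session below; the function names are still in flux in the
patch series, so treat them as illustrative, and the server needs
wal_level = logical and a nonzero max_replication_slots:

    SELECT * FROM pg_create_logical_replication_slot('test_slot', 'test_decoding');

    CREATE TABLE data(id int, payload text);
    INSERT INTO data VALUES (1, 'foo');

    SELECT * FROM pg_logical_slot_get_changes('test_slot', NULL, NULL);
    --  location  | xid | data
    -- -----------+-----+--------------------------------------------------------------
    --  0/16B21F8 | 692 | BEGIN 692
    --  0/16B21F8 | 692 | table public.data: INSERT: id[integer]:1 payload[text]:'foo'
    --  0/16B2300 | 692 | COMMIT 692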

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-17 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Having now had a little bit of opportunity to reflect on the State Of
 This Patch, I'd like to step back from the minutia upon which I've
 been commenting in my previous emails and articulate three high-level
 concerns about this patch.  In so doing, I would like to specifically
 request that other folks on this mailing list comment on the extent to
 which they do or do not believe these concerns to be valid.
 ...

 1. How safe is it to try to do decoding inside of a regular backend?
 What we're doing here is entering a special mode where we forbid the
 use of regular snapshots in favor of requiring the use of decoding
 snapshots, and forbid access to non-catalog relations.  We then run
 through the decoding process; and then exit back into regular mode.
 On entering and on exiting this special mode, we
 InvalidateSystemCaches().

How often is such a mode switch expected to happen?  I would expect
frequent use of InvalidateSystemCaches() to be pretty much disastrous
for performance, even absent any of the possible bugs you're worried
about.  It would likely be better to design things so that a decoder
backend does only that.

 2. I think the snapshot-export code is fundamentally misdesigned.

Your concerns here sound reasonable, but I can't say I've got any
special insight into it.

 3. As this feature is proposed, the only plugin we'll ship with 9.4 is
 a test_decoding plugin which, as its own documentation says, doesn't
 do anything especially useful.  What exactly do we gain by forcing
 users who want to make use of these new capabilities to write C code?

TBH, if that's all we're going to ship, I'm going to vote against
committing this patch to 9.4 at all.  Let it wait till 9.5 when we
might be able to build something useful on it.  To point out just
one obvious problem, how much confidence can we have in the APIs
being right if there are no usable clients?  Even if they're right,
what benefit do we get from freezing them one release before anything
useful is going to happen?

The most recent precedent I can think of is the FDW APIs, which I'd
be the first to admit are still in flux.  But we didn't ship anything
there without non-toy contrib modules to exercise it.  If we had,
we'd certainly have regretted it, because in the creation of those
contrib modules we found flaws in the initial design.

regards, tom lane


Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-17 Thread Robert Haas
On Mon, Feb 17, 2014 at 9:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 3. As this feature is proposed, the only plugin we'll ship with 9.4 is
 a test_decoding plugin which, as its own documentation says, doesn't
 do anything especially useful.  What exactly do we gain by forcing
 users who want to make use of these new capabilities to write C code?

 TBH, if that's all we're going to ship, I'm going to vote against
 committing this patch to 9.4 at all.  Let it wait till 9.5 when we
 might be able to build something useful on it.  To point out just
 one obvious problem, how much confidence can we have in the APIs
 being right if there are no usable clients?  Even if they're right,
 what benefit do we get from freezing them one release before anything
 useful is going to happen?

I actually have a lot of confidence that the APIs are almost entirely
right, except maybe for the snapshot-related stuff and possibly one or
two other minor details.  And I have every confidence that 2ndQuadrant
is going to put out decoding modules that do useful stuff.  I also
assume Slony is going to ship one at some point.  EnterpriseDB's xDB
replication server will need one, so someone at EDB will have to go
write that.  And if Bucardo or Londiste want to use this
infrastructure, they'll need their own, too.  What I don't understand
is why it's cool to make each of those replication solutions bring its
own to the table.  I mean if they want to, so that they can generate
exactly the format they want with no extra overhead, sure, cool.  What
I don't understand is why we're not taking the test_decoding module,
polishing it up a little to produce some nice, easily
machine-parseable output, calling it basic_decoding, and shipping
that.  Then people who want something else can build it, but people
who are happy with something basic will already have it.
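
It wouldn't have to be fancy. Even one JSON object per change would
qualify as nice, easily machine-parseable output; this is a made-up
format, not taken from any posted patch:

    {"xid": 692, "change": "insert", "table": "public.data",
     "new": {"id": 1, "payload": "foo"}}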

What I actually suspect is going to happen if we ship this as-is is
that people are going to start building logical replication solutions
on top of the test_decoding module even though it explicitly says that
it's just test code.  This is *really* cool technology and people are
*hungry* for it.  But writing C is hard, so if there's not a polished
plugin available, I bet people are going to try to use the
not-polished one.  I think we should try to get out ahead of that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-17 Thread Peter Geoghegan
On Mon, Feb 17, 2014 at 6:35 PM, Robert Haas robertmh...@gmail.com wrote:
 What I actually suspect is going to happen if we ship this as-is is
 that people are going to start building logical replication solutions
 on top of the test_decoding module even though it explicitly says that
 it's just test code.  This is *really* cool technology and people are
 *hungry* for it.  But writing C is hard, so if there's not a polished
 plugin available, I bet people are going to try to use the
 not-polished one.  I think we should try to get out ahead of that.

Tom made a comparison with FDWs, so I'll make another. The Multicorn
module made FDW authorship much more accessible by wrapping it in a
Python interface, I believe with some success. I don't want to stand
in the way of building a fully-featured test_decoding module, but I
think that those that would misuse test_decoding as it currently
stands can be redirected to a third-party wrapper. As you say, it's
pretty cool stuff, so it seems likely that someone will build one for
us.


-- 
Peter Geoghegan


Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-16 Thread Andres Freund
On 2014-02-15 17:29:04 -0500, Robert Haas wrote:
 On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund and...@2ndquadrant.com wrote:
  [ new patches ]
 
 0001 already needs minor
 
 + * copied stuff from tuptoaster.c. Perhaps there should be toast_internal.h?
 
 Yes, please.  If you can submit a separate patch creating this file
 and relocating this stuff there, I will commit it.

I started to work on that, but I am not sure we actually need it
anymore. tuptoaster.h isn't included in that many places, so perhaps we
should just add it there?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-15 Thread Robert Haas
On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund and...@2ndquadrant.com wrote:
 [ new patches ]

0001 already needs minor

+ * copied stuff from tuptoaster.c. Perhaps there should be toast_internal.h?

Yes, please.  If you can submit a separate patch creating this file
and relocating this stuff there, I will commit it.

+   /*
+* XXX: It's impolite to ignore our argument and keep decoding until the
+* current position.
+*/

Eh, what?

+* We misuse the original meaning of SnapshotData's xip and
subxip fields
+* to make the more fitting for our needs.
[...]
+* XXX: Do we want extra fields instead of misusing existing
ones instead?

If we're going to do this, then it surely needs to be documented in
snapshot.h.  On the second question, you're not the first hacker to
want to abuse the meanings of the existing fields; SnapshotDirty
already does it.  It's tempting to think we need a more principled
approach to this, like what we've done with Node i.e. typedef enum ...
SnapshotType; and then a separate struct definition for each kind, all
beginning with SnapshotType type.
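
A sketch of what I mean; these are hypothetical types, not anything in
the tree:

    /* hypothetical, mirroring the Node pattern */
    typedef enum SnapshotType
    {
        SNAPSHOT_MVCC,
        SNAPSHOT_DIRTY,
        SNAPSHOT_DECODING
    } SnapshotType;

    typedef struct MVCCSnapshotData
    {
        SnapshotType type;      /* always first, like a Node's tag */
        TransactionId xmin;
        TransactionId xmax;
        TransactionId *xip;     /* in-progress XIDs, the normal meaning */
        uint32      xcnt;
    } MVCCSnapshotData;

    typedef struct DecodingSnapshotData
    {
        SnapshotType type;      /* same first field, so the tag is always readable */
        TransactionId xmin;
        TransactionId xmax;
        TransactionId *committed;   /* what decoding currently smuggles into xip */
        uint32      ncommitted;
    } DecodingSnapshotData;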

+   /*
+* XXX: Timeline handling/naming. Do we need to include the timeline in
+* snapshot's name? Outside of very obscure, user triggered, cases every
+* LSN should correspond to exactly one timeline?
+*/

I can't really comment intelligently on this, so you need to figure it
out.  My off-the-cuff thought is that erring on the side of including
it couldn't hurt anything.

+ * XXX: use hex format for the LSN as well?

Yes, please.

+   /* prepared abort contain a normal
commit abort... */

contains.

+   /*
+* Abort all transactions that we keep
track of that are older
+* than the record's oldestRunningXid.
This is the most
+* convenient spot for doing so since,
in contrast to shutdown
+* or end-of-recovery checkpoints, we
have sufficient
+* knowledge to deal with prepared
transactions here.
+*/

I have no real quibble with this, but I think the comment could be
clarified slightly to state *what* knowledge we have here that we
wouldn't have there.

+   /* only crash recovery/replication needs to care */

I believe I know what you're getting at here, but the next guy might
not.  I suggest: "Although these records only exist to serve the needs
of logical decoding, all the work happens as part of crash or archive
recovery, so we don't need to do anything here."

+* Treat HOT update as normal updates, there
is no useful

s/, t/. T/

+* There are cases in which inplace updates
are used without xids
+* assigned (like VACUUM), there are others
(like CREATE INDEX
+* CONCURRENTLY) where an xid is present. If
an xid was assigned

In-place updates can be used either by XID-bearing transactions (e.g.
in CREATE INDEX CONCURRENTLY) or by XID-less transactions (e.g.
VACUUM).  In the former case, ...

+* redundant because the commit will do that
as well, but one
+* we'll support decoding in-progress
relations, this will be

s/one/once/
s/we'll/we/

+   /* we don't care about row level locks for now */
+   case XLOG_HEAP_LOCK:
+   break;

The position of the comment isn't consistent with the comments for the
other WAL record type in this section; that is, it's above rather than
below the case.

+* transaction's contents as the various caches need to always be

I think you should use "since" or "because" rather than "as" here, and
maybe put a comma before it.

+* the transaction's invalidations. This currently won't be needed if
+* we're just skipping over the transaction, since that currently only
+* happens when starting decoding, after we invalidated all caches, but
+* transactions in other databases might have touched shared relations.

I'm having trouble understanding what this means, especially the part
after the "but".

+ * Read a HeapTuple as WAL logged by heap_insert, heap_update and
+ * heap_delete, but not by heap_multi_insert into a tuplebuf.

"but not by heap_multi_insert" needs punctuation both before and
after.  You can just add a comma after, or change it into a
parenthetical phrase.

As the above comments probably make clear, I'm pretty much happy with decode.c.

+   /* TODO: We got to change that someday soon.. */

Two periods.  Maybe "We need to change this some day soon." - and then
follow that with a paragraph explaining roughly what would need
to be done.

+   /* shorter lines... */
+   slot = MyReplicationSlot;
+
+   /* first some sanity checks that are 

Re: [HACKERS] Changeset Extraction v7.6.1

2014-02-15 Thread Andres Freund
On 2014-02-15 17:29:04 -0500, Robert Haas wrote:
 On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund and...@2ndquadrant.com wrote:
  [ new patches ]
 
 0001 already needs minor

Hm?

If there are conflicts, I'll push/send a rebased version tomorrow or Monday.

 +* the transaction's invalidations. This currently won't be needed if
 +* we're just skipping over the transaction, since that currently only
 +* happens when starting decoding, after we invalidated all caches, 
 but
 +* transactions in other databases might have touched shared 
 relations.
 
 I'm having trouble understanding what this means, especially the part
 after the "but".

Let me rephrase, maybe that will help:

This currently won't be needed if we're just skipping over the
transaction, because currently we only do so during startup, to get to
the first transaction the client needs. As we have reset the catalog
caches before starting to read WAL and we haven't yet touched any
catalogs, there can't be anything to invalidate.

But if we're forgetting this commit because it happened in
another database, the invalidations might be important, because they
could be for shared catalogs and we might have loaded data into the
relevant syscaches.

Better?

 +   if (IsTransactionState() 
 +   GetTopTransactionIdIfAny() != InvalidTransactionId)
 +   ereport(ERROR,
 +   (errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
 +errmsg("cannot perform changeset extraction in transaction that has performed writes")));
 
 This is sort of an awkward restriction, as it makes it hard to compose
 this feature with others.  What underlies the restriction, can we get
 rid of it, and if not can we include a comment here explaining why it
 exists?

Well, you can write the result into a table if you're halfway
careful... :)

I think it should be fairly easy to relax the restriction to creating a
slot, but not getting data from it. Do you think that would be
sufficient?

 +   /* register output plugin name with slot */
 +   strncpy(NameStr(slot-data.plugin), plugin,
 +   NAMEDATALEN);
 +   NameStr(slot-data.plugin)[NAMEDATALEN - 1] = '\0';
 
 If it's safe to do this without a lock, I don't know why.

Well, the worst that could happen is that somebody else doing a SELECT *
FROM pg_replication_slot gets an incomplete plugin name... But we
certainly can hold the spinlock during it if you think that's better.
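
I.e. roughly the following, assuming the slot's spinlock is the right
lock for its in-memory state:

    /* sketch: hold the slot's spinlock while copying the plugin name */
    SpinLockAcquire(&slot->mutex);
    strncpy(NameStr(slot->data.plugin), plugin, NAMEDATALEN);
    NameStr(slot->data.plugin)[NAMEDATALEN - 1] = '\0';
    SpinLockRelease(&slot->mutex);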

 More broadly, I wonder why this is_init stuff is in here at all.
 Maybe the stuff that happens in the is_init case should be done in the
 caller, or another helper function.

The reason I moved it in there is that after the walsender patch there
are two callers, and the stuff is sufficiently complex that having it
twice led to bugs.
The reason it's currently the same function is that sufficiently much of
the code would have to be shared that I found it ugly to split. I'll
have a look at whether I can figure something out.


 +   /* prevent WAL removal as fast as possible */
 +   ReplicationSlotsComputeRequiredLSN();
 
 If there's a race here, can't we rejigger the order of operations to
 eliminate it?  Or is that just too hard and not worth it?

Yes, there's a small race which at the very least should be properly
documented.

Hm. Yes, I think we can plug the hole. If the race condition occurs we'd
take slightly longer to start up, which isn't bad. Will fix.

 +begin_txn_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn)
 +   state.callback_name = "pg_decode_begin_txn";
 +   ctx-callbacks.begin_cb(ctx, txn);
 
 I feel that there's a certain lack of naming consistency between these
 things.  Can we improve that?  (and similarly for parallel cases)
 
 +pg_create_decoding_replication_slot(PG_FUNCTION_ARGS)
 
 I thought we were going to have physical and logical slots, not
 physical and decoding slots.

Ok.

 +   /* make sure we don't end up with an unreleased slot */
 +   PG_TRY();
 +   {
 ...
 +   PG_CATCH();
 +   {
 +   ReplicationSlotRelease();
 +   ReplicationSlotDrop(NameStr(*name));
 +   PG_RE_THROW();
 +   }
 +   PG_END_TRY();
 
 I don't think this is a very good idea.  The problem with doing things
 during error recovery that can themselves fail is that you'll lose the
 original error, which is not cool, and maybe even blow out the error
 stack.  Many people confuse PG_TRY()/PG_CATCH() with an
 exception-handling system, but it's not one.  One way to fix this is to
 put some of the initialization logic in ReplicationSlotCreate() just
 prior to calling CreateSlotOnDisk().  If the work that needs to be
 done is too complex or protracted to be done there, then I think that
 it should be pulled out of the act of creating the replication slot
 and made to happen as part of first use, or as a separate operation