Re: [HACKERS] Changeset Extraction v7.6.1
On 01-06-2014 02:57, Andres Freund wrote: On 2014-06-01 00:50:58 -0500, Jim Nasby wrote: On 5/31/14, 9:11 AM, Andres Freund wrote: On 2014-02-21 15:14:15 -0600, Jim Nasby wrote: On 2/17/14, 7:31 PM, Robert Haas wrote: But do you really want to keep that snapshot around long enough to copy the entire database? I bet you don't: if the database is big, holding back xmin for long enough to copy the whole thing isn't likely to be fun. I can confirm that this would be epic fail, at least for londiste. It takes about 3 weeks for a new copy of a ~2TB database. There's no way that'd work with one snapshot. (Granted, copy performance in londiste is rather lackluster, but still...) I'd marked this email as todo: If you have such a huge database you can, with logical decoding at least, use a basebackup using pg_basebackup or pg_start/stop_backup() and roll forwards from that... That'll hopefully make such huge copies much faster. Just keep in mind that one of the use cases for logical replication is upgrades. Should still be fine. Make a physical copy; pg_upgrade; catchup via logical rep. Bear in mind that it is not an option if you want to copy *part* of the database(s) (unless you have space available and want to do the cleanup after the upgrade). In the near future, a (new) tool could do (a) copy schema, (b) accumulate modifications while copying data, (c) copy the whole table and (d) apply modifications for selected table(s)/schema(s). Such a tool could even be an alternative to pg_upgrade. -- Euler Taveira Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changeset Extraction v7.6.1
On 6/1/14, 10:49 AM, Euler Taveira wrote: On 01-06-2014 02:57, Andres Freund wrote: On 2014-06-01 00:50:58 -0500, Jim Nasby wrote: On 5/31/14, 9:11 AM, Andres Freund wrote: On 2014-02-21 15:14:15 -0600, Jim Nasby wrote: On 2/17/14, 7:31 PM, Robert Haas wrote: But do you really want to keep that snapshot around long enough to copy the entire database? I bet you don't: if the database is big, holding back xmin for long enough to copy the whole thing isn't likely to be fun. I can confirm that this would be epic fail, at least for londiste. It takes about 3 weeks for a new copy of a ~2TB database. There's no way that'd work with one snapshot. (Granted, copy performance in londiste is rather lackluster, but still...) I'd marked this email as todo: If you have such a huge database you can, with logical decoding at least, use a basebackup using pg_basebackup or pg_start/stop_backup() and roll forwards from that... That'll hopefully make such huge copies much faster. Just keep in mind that one of the use cases for logical replication is upgrades. Should still be fine. Make a physical copy; pg_upgrade; catchup via logical rep. Bear in mind that it is not an option if you want to copy *part* of the database(s) (unless you have space available and want to do the cleanup after the upgrade). In the near future, a (new) tool could do (a) copy schema, (b) accumulate modifications while copying data, (c) copy the whole table and (d) apply modifications for selected table(s)/schema(s). Such a tool could even be an alternative to pg_upgrade. There are also things that pg_upgrade doesn't handle, so it's not always an option. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] Changeset Extraction v7.6.1
On 2014-02-21 15:14:15 -0600, Jim Nasby wrote: On 2/17/14, 7:31 PM, Robert Haas wrote: But do you really want to keep that snapshot around long enough to copy the entire database? I bet you don't: if the database is big, holding back xmin for long enough to copy the whole thing isn't likely to be fun. I can confirm that this would be epic fail, at least for londiste. It takes about 3 weeks for a new copy of a ~2TB database. There's no way that'd work with one snapshot. (Granted, copy performance in londiste is rather lackluster, but still...) I'd marked this email as todo: If you have such a huge database you can, with logical decoding at least, use a basebackup using pg_basebackup or pg_start/stop_backup() and roll forwards from that... That'll hopefully make such huge copies much faster. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Changeset Extraction v7.6.1
On 5/31/14, 9:11 AM, Andres Freund wrote: On 2014-02-21 15:14:15 -0600, Jim Nasby wrote: On 2/17/14, 7:31 PM, Robert Haas wrote: But do you really want to keep that snapshot around long enough to copy the entire database? I bet you don't: if the database is big, holding back xmin for long enough to copy the whole thing isn't likely to be fun. I can confirm that this would be epic fail, at least for londiste. It takes about 3 weeks for a new copy of a ~2TB database. There's no way that'd work with one snapshot. (Granted, copy performance in londiste is rather lackluster, but still...) I'd marked this email as todo: If you have such a huge database you can, with logical decoding at least, use a basebackup using pg_basebackup or pg_start/stop_backup() and roll forwards from that... That'll hopefully make such huge copies much faster. Just keep in mind that one of the use cases for logical replication is upgrades. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] Changeset Extraction v7.6.1
On 2014-06-01 00:50:58 -0500, Jim Nasby wrote: On 5/31/14, 9:11 AM, Andres Freund wrote: On 2014-02-21 15:14:15 -0600, Jim Nasby wrote: On 2/17/14, 7:31 PM, Robert Haas wrote: But do you really want to keep that snapshot around long enough to copy the entire database? I bet you don't: if the database is big, holding back xmin for long enough to copy the whole thing isn't likely to be fun. I can confirm that this would be epic fail, at least for londiste. It takes about 3 weeks for a new copy of a ~2TB database. There's no way that'd work with one snapshot. (Granted, copy performance in londiste is rather lackluster, but still...) I'd marked this email as todo: If you have such a huge database you can, with logical decoding at least, use a basebackup using pg_basebackup or pg_start/stop_backup() and roll forwards from that... That'll hopefully make such huge copies much faster. Just keep in mind that one of the use cases for logical replication is upgrades. Should still be fine. Make a physical copy; pg_upgrade; catchup via logical rep. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Changeset Extraction v7.6.1
On 2014-02-24 12:50:03 -0500, Robert Haas wrote: On Mon, Feb 24, 2014 at 9:48 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-15 17:29:04 -0500, Robert Haas wrote: On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund and...@2ndquadrant.com wrote:

+ /*
+  * XXX: It's impolite to ignore our argument and keep decoding until the
+  * current position.
+  */

Eh, what? So, the background here is that I was thinking of allowing to specify a limit for the number of returned rows. For the sql interface that sounds like a good idea. I am just not so sure anymore that allowing to specify a LSN as a limit is sufficient. Maybe simply allow limiting the number of changes and check every time a transaction has been replayed? The last idea there seems pretty sound, but ... It's all trivial codewise, I am just wondering about the interface most users would want. ...I can't swear it meets this criterion. So, it's now:

CREATE OR REPLACE FUNCTION pg_logical_slot_get_changes(
    IN slotname name, IN upto_lsn pg_lsn, IN upto_nchanges int,
    VARIADIC options text[] DEFAULT '{}',
    OUT location pg_lsn, OUT xid xid, OUT data text)
RETURNS SETOF RECORD
LANGUAGE INTERNAL
VOLATILE ROWS 1000 COST 1000
AS 'pg_logical_slot_get_changes';

A non-NULL upto_lsn allows limiting based on the LSN; similarly with upto_nchanges. Makes sense? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
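[Editorial note: the two limits compose as described. Hedged usage examples (the slot name and the test_decoding option are illustrative, not from this thread):

```sql
-- fetch everything available, no limits
SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);

-- stop once decoding passes a given LSN
SELECT * FROM pg_logical_slot_get_changes('regression_slot', '0/16B2748', NULL);

-- stop after roughly two changes; decoding finishes the transaction it is
-- in, so somewhat more rows than upto_nchanges may be returned
SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, 2);

-- VARIADIC options are passed through to the output plugin
SELECT * FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL,
                                          'include-timestamp', 'on');
```

These require a live server with a logical slot, so they are a sketch of the interface rather than a standalone script.]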
Re: [HACKERS] Changeset Extraction v7.6.1
On Thu, Feb 27, 2014 at 11:06 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-24 12:50:03 -0500, Robert Haas wrote: On Mon, Feb 24, 2014 at 9:48 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-15 17:29:04 -0500, Robert Haas wrote: On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund and...@2ndquadrant.com wrote:

+ /*
+  * XXX: It's impolite to ignore our argument and keep decoding until the
+  * current position.
+  */

Eh, what? So, the background here is that I was thinking of allowing to specify a limit for the number of returned rows. For the sql interface that sounds like a good idea. I am just not so sure anymore that allowing to specify a LSN as a limit is sufficient. Maybe simply allow limiting the number of changes and check every time a transaction has been replayed? The last idea there seems pretty sound, but ... It's all trivial codewise, I am just wondering about the interface most users would want. ...I can't swear it meets this criterion. So, it's now:

CREATE OR REPLACE FUNCTION pg_logical_slot_get_changes(
    IN slotname name, IN upto_lsn pg_lsn, IN upto_nchanges int,
    VARIADIC options text[] DEFAULT '{}',
    OUT location pg_lsn, OUT xid xid, OUT data text)
RETURNS SETOF RECORD
LANGUAGE INTERNAL
VOLATILE ROWS 1000 COST 1000
AS 'pg_logical_slot_get_changes';

A non-NULL upto_lsn allows limiting based on the LSN; similarly with upto_nchanges. Makes sense? Time will tell, but it seems plausible to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Changeset Extraction v7.6.1
On 2014-02-15 17:29:04 -0500, Robert Haas wrote: On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund and...@2ndquadrant.com wrote:

+ /*
+  * XXX: It's impolite to ignore our argument and keep decoding until the
+  * current position.
+  */

Eh, what? So, the background here is that I was thinking of allowing to specify a limit for the number of returned rows. For the sql interface that sounds like a good idea. I am just not so sure anymore that allowing to specify a LSN as a limit is sufficient. Maybe simply allow limiting the number of changes and check every time a transaction has been replayed? It's all trivial codewise, I am just wondering about the interface most users would want.

+ * We misuse the original meaning of SnapshotData's xip and subxip fields
+ * to make them more fitting for our needs.
[...]
+ * XXX: Do we want extra fields instead of misusing existing ones instead?

If we're going to do this, then it surely needs to be documented in snapshot.h. On the second question, you're not the first hacker to want to abuse the meanings of the existing fields; SnapshotDirty already does it. It's tempting to think we need a more principled approach to this, like what we've done with Node, i.e. typedef enum ... SnapshotType; and then a separate struct definition for each kind, all beginning with SnapshotType type. Hm, essentially that's what the ->satisfies pointer already is, right? There's already documentation of the extra fields in snapbuild, but I understand you'd rather have them moved? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Changeset Extraction v7.6.1
On Mon, Feb 24, 2014 at 9:48 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-15 17:29:04 -0500, Robert Haas wrote: On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund and...@2ndquadrant.com wrote:

+ /*
+  * XXX: It's impolite to ignore our argument and keep decoding until the
+  * current position.
+  */

Eh, what? So, the background here is that I was thinking of allowing to specify a limit for the number of returned rows. For the sql interface that sounds like a good idea. I am just not so sure anymore that allowing to specify a LSN as a limit is sufficient. Maybe simply allow limiting the number of changes and check every time a transaction has been replayed? The last idea there seems pretty sound, but ... It's all trivial codewise, I am just wondering about the interface most users would want. ...I can't swear it meets this criterion.

+ * We misuse the original meaning of SnapshotData's xip and subxip fields
+ * to make them more fitting for our needs.
[...]
+ * XXX: Do we want extra fields instead of misusing existing ones instead?

If we're going to do this, then it surely needs to be documented in snapshot.h. On the second question, you're not the first hacker to want to abuse the meanings of the existing fields; SnapshotDirty already does it. It's tempting to think we need a more principled approach to this, like what we've done with Node, i.e. typedef enum ... SnapshotType; and then a separate struct definition for each kind, all beginning with SnapshotType type. Hm, essentially that's what the ->satisfies pointer already is, right? Sorta, yeah. But with nodes, you can change the whole struct definition for each type. There's already documentation of the extra fields in snapbuild, but I understand you'd rather have them moved? Yeah, I think it needs to be documented where SnapshotData is defined. There may be reason to mention it again, or in more detail, elsewhere. But there should be some mention of it there.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Changeset Extraction v7.6.1
Hi, On 2014-02-19 13:01:02 -0500, Robert Haas wrote: I think it should be fairly easy to relax the restriction to creating a slot, but not getting data from it. Do you think that would that be sufficient? That would be a big improvement, for sure, and might be entirely sufficient. Turned out to be a 5 line change + tests or something... Pushed. I don't think this is a very good idea. The problem with doing things during error recovery that can themselves fail is that you'll lose the original error, which is not cool, and maybe even blow out the error stack. Many people have confused PG_TRY()/PG_CATCH() with an exception-handling system, but it's not. One way to fix this is to put some of the initialization logic in ReplicationSlotCreate() just prior to calling CreateSlotOnDisk(). If the work that needs to be done is too complex or protracted to be done there, then I think that it should be pulled out of the act of creating the replication slot and made to happen as part of first use, or as a separate operation like PrepareForLogicalDecoding. I think what should be done here is adding a drop_on_release flag. As soon as everything important is done, it gets unset. That might be more elegant, but I don't think it really fixes anything, because backing stuff out from on disk can fail. If the slot is marked as drop_on_release during creation, and we fail during removal, it will just be dropped on the next startup. That seems ok to me? I still think it's not really important to put much effort in the disk stuff fails case, it's entirely hypothetical. If that fails you have *so* much bigger problems, a leftover slot is the least of your problems. AIUI, your whole concern here is that you don't want the slot creation to fail halfway through and leave behind the slot, but what you've got here doesn't prevent that; it just makes it less likely.
The more I think about it, the more I think you're trying to pack stuff into slot creation that really ought to be happening on first use. Well, having a leftover slot that never succeeded in being created is going to confuse lots of people, especially as it will not roll back or anything. That's why I think it's important to make it unlikely. The typical reasons for failing are stuff like an output plugin that doesn't exist or being interrupted while initializing. I can sympathize with the too much during init argument, but I don't see how moving stuff to the first call would get rid of the problems. If we fail later it's going to be just as confusing. ReorderBufferGetTXN() should get a comment about the performance impact of this. There's a tiny bit there in ReorderBufferReturnTXN() but it should be better called out. Should these call the valgrind macros to make the memory inaccessible while it's being held in cache? Hm, I think it does call the valgrind stuff?

VALGRIND_MAKE_MEM_UNDEFINED(txn, sizeof(ReorderBufferTXN));
VALGRIND_MAKE_MEM_DEFINED(&txn->node, sizeof(txn->node));

That's there in ReorderBufferReturnTXN, but don't you need something in ReorderBufferGetTXN? Maybe not. Don't think so, it marks the memory as undefined, which allows writes, but will warn on reads. We could additionally mark the memory as inaccessible, disallowing writes, but I don't really see that catching much. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Changeset Extraction v7.6.1
On 2014-02-19 13:31:06 -0500, Robert Haas wrote: TBH, as compared to what you've got now, I think this mostly boils down to a question of quoting and escaping. I'm not really concerned with whether we ship something that's perfectly efficient, or that has filtering capabilities, or that has a lot of fancy bells and whistles. What I *am* concerned about is that if the user updates a text field that contains characters like " or ' or : or [ or ] or , that somebody might be using as delimiters in the output format, that a program can still parse that output format and reliably determine what the actual change was. I don't care all that much whether we use JSON or CSV or something custom, but the data that gets spit out should not have SQL-injection-like vulnerabilities. If it's just that, I am *perfectly* happy to change it. What I do not want is arguments like "I don't want the type information, that's pointless" because it's actually really important for regression testing. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Changeset Extraction v7.6.1
On 2014-02-19 13:07:11 -0500, Robert Haas wrote: On Tue, Feb 18, 2014 at 4:07 AM, Andres Freund and...@2ndquadrant.com wrote: 2. I think the snapshot-export code is fundamentally misdesigned. As I said before, the idea that we're going to export one single snapshot at one particular point in time strikes me as extremely short-sighted. I don't think so. It's precisely what you need to implement a simple replication solution. Yes, there are use cases that could benefit from more possibilities, but that's always the case. For example, consider one-to-many replication where clients may join or depart the replication group at any time. Whenever somebody joins, we just want a snapshot, LSN pair such that they can apply all changes after the LSN except for XIDs that would have been visible to the snapshot. And? They need to create individual replication slots, which each will get a snapshot. So we have to wait for startup N times, and transmit the change stream N times, instead of once? Blech. I can't get too excited about this. If we later want to add a command to clone an existing slot, sure, that's perfectly fine with me. That will then stream at exactly the same position. Easy, less than 20 LOC + docs probably. We have much more waiting e.g. in the CONCURRENTLY commands and it's not causing that many problems. Note that it'd be a *significant* overhead to continuously be able to export snapshots that are useful for looking at normal relations. Both for computing snapshots and for not being able to remove those rows. And in fact, we don't even need any special machinery for that; the client can just make a connection and *take a snapshot* once decoding is initialized enough. No, they can't. Two reasons: For one, the commit order between snapshots and WAL isn't necessarily the same. So what? So you can't just use a plain snapshot and dump using it, without getting into inconsistencies.
For another, clients now need logic to detect whether a transaction's contents have already been applied or have not been applied yet; that's nontrivial. My point is, I think we should be trying to *make* that trivial, rather than doing this. I think *this* *is* making it trivial. Maybe I've missed it, but I haven't seen any alternative that comes even *close* to being as easy to implement in a replication solution. Currently you can use it like:

CREATE_REPLICATION_SLOT name LOGICAL
(copy data using the exported snapshot)
START_REPLICATION SLOT name LOGICAL
(stream changes)

Where you can do the START_REPLICATION as soon as some other session has imported the snapshot. Really not much to worry about additionally. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Changeset Extraction v7.6.1
On Fri, Feb 21, 2014 at 6:07 AM, Andres Freund and...@2ndquadrant.com wrote: I can sympathize with the too much during init argument, but I don't see how moving stuff to the first call would get rid of the problems. If we fail later it's going to be just as confusing. No, it isn't. If you fail during init the user will expect the slot to be gone. That's the reason for all of this complexity. If you fail on first use, the user will expect the slot to still be there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Changeset Extraction v7.6.1
On 2014-02-21 08:16:59 -0500, Robert Haas wrote: On Fri, Feb 21, 2014 at 6:07 AM, Andres Freund and...@2ndquadrant.com wrote: I can sympathize with the too much during init argument, but I don't see how moving stuff to the first call would get rid of the problems. If we fail later it's going to be just as confusing. No, it isn't. If you fail during init the user will expect the slot to be gone. That's the reason for all of this complexity. If you fail on first use, the user will expect the slot to still be there. The primary case for failing is a plugin that either doesn't exist or fails to initialize, or a user aborting the init. It seems odd that a created slot fails because of a bad plugin or needs to wait till it finds a suitable snapshot record. We could add an intermediary call like pg_startup_logical_slot() but that doesn't seem to have much going for it? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Changeset Extraction v7.6.1
On Fri, Feb 21, 2014 at 8:27 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-21 08:16:59 -0500, Robert Haas wrote: On Fri, Feb 21, 2014 at 6:07 AM, Andres Freund and...@2ndquadrant.com wrote: I can sympathize with the too much during init argument, but I don't see how moving stuff to the first call would get rid of the problems. If we fail later it's going to be just as confusing. No, it isn't. If you fail during init the user will expect the slot to be gone. That's the reason for all of this complexity. If you fail on first use, the user will expect the slot to still be there. The primary case for failing is a plugin that either doesn't exist or fails to initialize, or a user aborting the init. It seems odd that a created slot fails because of a bad plugin or needs to wait till it finds a suitable snapshot record. We could add an intermediary call like pg_startup_logical_slot() but that doesn't seem to have much going for it? Well, we can surely detect a plugin that fails to initialize before creating the slot on disk, right? I'm not sure what fails to initialize entails. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Changeset Extraction v7.6.1
On 2014-02-21 08:51:03 -0500, Robert Haas wrote: On Fri, Feb 21, 2014 at 8:27 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-21 08:16:59 -0500, Robert Haas wrote: On Fri, Feb 21, 2014 at 6:07 AM, Andres Freund and...@2ndquadrant.com wrote: I can sympathize with the too much during init argument, but I don't see how moving stuff to the first call would get rid of the problems. If we fail later it's going to be just as confusing. No, it isn't. If you fail during init the user will expect the slot to be gone. That's the reason for all of this complexity. If you fail on first use, the user will expect the slot to still be there. The primary case for failing is a plugin that either doesn't exist or fails to initialize, or a user aborting the init. It seems odd that a created slot fails because of a bad plugin or needs to wait till it finds a suitable snapshot record. We could add an intermediary call like pg_startup_logical_slot() but that doesn't seem to have much going for it? Well, we can surely detect a plugin that fails to initialize before creating the slot on disk, right? We could detect whether the plugin .so can be loaded and provides the required callbacks, but we can't initialize it. I'm not sure what fails to initialize entails. elog(ERROR, "hey, the tables I require are missing"); or similar. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Changeset Extraction v7.6.1
On 2/17/14, 7:31 PM, Robert Haas wrote: But do you really want to keep that snapshot around long enough to copy the entire database? I bet you don't: if the database is big, holding back xmin for long enough to copy the whole thing isn't likely to be fun. I can confirm that this would be epic fail, at least for londiste. It takes about 3 weeks for a new copy of a ~2TB database. There's no way that'd work with one snapshot. (Granted, copy performance in londiste is rather lackluster, but still...) -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] Changeset Extraction v7.6.1
On Sat, Feb 15, 2014 at 6:59 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-15 17:29:04 -0500, Robert Haas wrote: On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund and...@2ndquadrant.com wrote: [ new patches ] 0001 already needs minor Hm? If there are conflicts, I'll push/send a rebased tomorrow or Monday. As you guessed, the missing word was rebasing. It's a trivial conflict though, so please don't feel the need to repost just for that.

+ * the transaction's invalidations. This currently won't be needed if
+ * we're just skipping over the transaction, since that currently only
+ * happens when starting decoding, after we invalidated all caches, but
+ * transactions in other databases might have touched shared relations.

I'm having trouble understanding what this means, especially the part after the but. Let me rephrase, maybe that will help: This currently won't be needed if we're just skipping over the transaction because currently we only do so during startup, to get to the first transaction the client needs. As we have reset the catalog caches before starting to read WAL and we haven't yet touched any catalogs there can't be anything to invalidate. But if we're forgetting this commit because it happened in another database, the invalidations might be important, because they could be for shared catalogs and we might have loaded data into the relevant syscaches. Better? Much! Please include that text, or something like it.

+ if (IsTransactionState() &&
+     GetTopTransactionIdIfAny() != InvalidTransactionId)
+     ereport(ERROR,
+             (errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
+              errmsg("cannot perform changeset extraction in transaction that has performed writes")));

This is sort of an awkward restriction, as it makes it hard to compose this feature with others. What underlies the restriction, can we get rid of it, and if not can we include a comment here explaining why it exists? Well, you can write the result into a table if you're halfway careful...
:) I think it should be fairly easy to relax the restriction to creating a slot, but not getting data from it. Do you think that would that be sufficient? That would be a big improvement, for sure, and might be entirely sufficient.

+ /* register output plugin name with slot */
+ strncpy(NameStr(slot->data.plugin), plugin,
+         NAMEDATALEN);
+ NameStr(slot->data.plugin)[NAMEDATALEN - 1] = '\0';

If it's safe to do this without a lock, I don't know why. Well, the worst that could happen is that somebody else doing a SELECT * FROM pg_replication_slot gets an incomplete plugin name... But we certainly can hold the spinlock during it if you think that's better. Isn't the worst thing that can happen that they copy garbage out of the buffer, because the new name is longer than the old and only partially written? More broadly, I wonder why this is_init stuff is in here at all. Maybe the stuff that happens in the is_init case should be done in the caller, or another helper function. The reason I moved it in there is that after the walsender patch there are two callers and the stuff is sufficiently complex that having it twice led to bugs. The reason it's currently the same function is that sufficiently much of the code would have to be shared that I found it ugly to split. I'll have a look whether I can figure something out. I was thinking that the is_init portion could perhaps be done first, in a *previous* function call, and adjusted in such a way that the non-is-init path can be used for both cases here. I don't think this is a very good idea. The problem with doing things during error recovery that can themselves fail is that you'll lose the original error, which is not cool, and maybe even blow out the error stack. Many people have confused PG_TRY()/PG_CATCH() with an exception-handling system, but it's not. One way to fix this is to put some of the initialization logic in ReplicationSlotCreate() just prior to calling CreateSlotOnDisk().
If the work that needs to be done is too complex or protracted to be done there, then I think that it should be pulled out of the act of creating the replication slot and made to happen as part of first use, or as a separate operation like PrepareForLogicalDecoding. I think what should be done here is adding a drop_on_release flag. As soon as everything important is done, it gets unset. That might be more elegant, but I don't think it really fixes anything, because backing stuff out from on disk can fail. AIUI, your whole concern here is that you don't want the slot creation to fail halfway through and leave behind the slot, but what you've got here doesn't prevent that; it just makes it less likely. The more I think about it,
Re: [HACKERS] Changeset Extraction v7.6.1
On Sun, Feb 16, 2014 at 12:12 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-15 17:29:04 -0500, Robert Haas wrote: On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund and...@2ndquadrant.com wrote: [ new patches ] 0001 already needs minor + * copied stuff from tuptoaster.c. Perhaps there should be toast_internal.h? Yes, please. If you can submit a separate patch creating this file and relocating this stuff there, I will commit it. I started to work on that, but I am not sure we actually need it anymore. tuptoaster.h isn't included in that many places, so perhaps we should just add it there? That seems fine to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changeset Extraction v7.6.1
On Tue, Feb 18, 2014 at 4:07 AM, Andres Freund and...@2ndquadrant.com wrote: 2. I think the snapshot-export code is fundamentally misdesigned. As I said before, the idea that we're going to export one single snapshot at one particular point in time strikes me as extremely short-sighted. I don't think so. It's precisely what you need to implement a simple replication solution. Yes, there are use cases that could benefit from more possibilities, but that's always the case. For example, consider one-to-many replication where clients may join or depart the replication group at any time. Whenever somebody joins, we just want a snapshot, LSN pair such that they can apply all changes after the LSN except for XIDs that would have been visible to the snapshot. And? They need to create individual replication slots, which each will get a snapshot. So we have to wait for startup N times, and transmit the change stream N times, instead of once? Blech. And in fact, we don't even need any special machinery for that; the client can just make a connection and *take a snapshot* once decoding is initialized enough. No, they can't. Two reasons: For one, the commit order between snapshots and WAL isn't necessarily the same. So what? For another, clients now need logic to detect whether a transaction's contents have already been applied or not yet; that's nontrivial. My point is, I think we should be trying to *make* that trivial, rather than doing this. 3. As this feature is proposed, the only plugin we'll ship with 9.4 is a test_decoding plugin which, as its own documentation says, doesn't do anything especially useful. What exactly do we gain by forcing users who want to make use of these new capabilities to write C code? It gains us an output plugin in which we can easily demonstrate features so they can be tested in the regression tests. Which I find to be rather important. Just like e.g. the test_shm_mq stuff doesn't do anything really useful.
It definitely doesn't, but this patch is a lot closer to being done than parallel query is, so I'm not sure it's a fair comparison. The test_decoding plugin doesn't seem tremendously much simpler than something that someone could actually use, so why not make that the goal? For one, it being a designated toy plugin allows us to easily change it, to showcase/test new features. For another, I still don't agree that it's easy to agree on an output format. I hope we can include some that have matured by 9.5. I regret that more effort has not been made in that area. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Changeset Extraction v7.6.1
On Tue, Feb 18, 2014 at 4:33 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-17 21:35:23 -0500, Robert Haas wrote: What I don't understand is why we're not taking the test_decoding module, polishing it up a little to produce some nice, easily machine-parseable output, calling it basic_decoding, and shipping that. Then people who want something else can build it, but people who are happy with something basic will already have it. Because every project is going to need their own plugin *anyway*. Londiste, slony sure are going to ignore changes to relations they don't need. Querying their own metadata. They will want compatibility with the earlier formats as far as possible. Sometime not too far away they will want to optionally support binary output because it's so much faster. There's just not much chance that either of these will be able to agree on a format short term. Ah, so part of what you're expecting the output plugin to do is filtering. I can certainly see where there might be considerable variation between solutions in that area - but I think that's separate from the question of formatting per se. Although I think we should have an in-core output plugin with filtering capabilities eventually, I'm happy to define that as out of scope for 9.4. But isn't there a way that we can ship something that will do for people who want to just see the database's entire change stream float by? TBH, as compared to what you've got now, I think this mostly boils down to a question of quoting and escaping. I'm not really concerned with whether we ship something that's perfectly efficient, or that has filtering capabilities, or that has a lot of fancy bells and whistles. What I *am* concerned about is that if the user updates a text field that contains characters like " or ' or : or [ or ] or , that somebody might be using as delimiters in the output format, a program can still parse that output format and reliably determine what the actual change was.
I don't care all that much whether we use JSON or CSV or something custom, but the data that gets spit out should not have SQL-injection-like vulnerabilities. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Changeset Extraction v7.6.1
On 18-02-2014 06:33, Andres Freund wrote: I really hope there will be nicer ones by the time 9.4 is released. Euler did send in a json plugin http://archives.postgresql.org/message-id/52A5BFAE.1040209%2540timbira.com.br , but there hasn't been much feedback yet. It's hard to start discussing something that needs a couple of patches to pg before you can develop your own patch... BTW, I've updated that code to reflect the recent changes in the API and published it at [1]. This version is based on Andres' branch xlog-decoding-rebasing-remapping. I'll continue to polish this code. Regards, [1] https://github.com/eulerto/wal2json -- Euler Taveira Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Re: [HACKERS] Changeset Extraction v7.6.1
Hi Robert, On 2014-02-17 20:31:34 -0500, Robert Haas wrote: 1. How safe is it to try to do decoding inside of a regular backend? What we're doing here is entering a special mode where we forbid the use of regular snapshots in favor of requiring the use of decoding snapshots, and forbid access to non-catalog relations. We then run through the decoding process; and then exit back into regular mode. On entering and on exiting this special mode, we InvalidateSystemCaches(). I don't see a big problem with having special backends (e.g. walsender) use this special mode, but I'm less convinced that it's wise to try to set things up so that we can switch back and forth between decoding mode and regular mode in a single backend. The main reason the SQL interface exists is that it's awfully hard to use isolationtester, pg_regress et al when the output isn't also visible via SQL. We tried hacking things in other ways, but that's what it came down to. If you recall, previously the SQL changes interface was only in a test_logical_decoding extension, because I wasn't sure it's all that interesting for real use cases. It's sure nice for testing things though. I worry that won't end up working out very cleanly, and I think the prohibition against using this special mode in an XID-bearing transaction is merely a small downpayment on future pain in this area. That restriction is in principle only needed when creating the slot, not when getting changes. The only problem is that some piece of code doesn't know about it. The reasons it exists are twofold: One is that when looking for an initial snapshot, we wait for concurrent transactions to end. If we'd wait for the transaction itself, we'd be in trouble; it could never happen. The second reason is that the code does a XactLockTableWait() to make it visible that it's waiting, so isolationtester knows it should background the command. It's not good to wait on itself.
But neither is actually needed when not creating the slot, the code just needs to be told about that. That having been said, I can't pretend at this point either to understand the genesis of this particular restriction or what other problems are likely to crop up in trying to allow this mode-switching. So it's possible that I'm overblowing it, but it's makin' me nervous. I am not terribly concerned, but I can understand where you are coming from. I think for replication solutions this isn't going to be needed but it's way much more handy for testing and such. 2. I think the snapshot-export code is fundamentally misdesigned. As I said before, the idea that we're going to export one single snapshot at one particular point in time strikes me as extremely short-sighted. I don't think so. It's precisely what you need to implement a simple replication solution. Yes, there are usecases that could benefit from more possibilities, but that's always the case. For example, consider one-to-many replication where clients may join or depart the replication group at any time. Whenever somebody joins, we just want a snapshot, LSN pair such that they can apply all changes after the LSN except for XIDs that would have been visible to the snapshot. And? They need to create individual replication slots, which each will get a snapshot. And in fact, we don't even need any special machinery for that; the client can just make a connection and *take a snapshot* once decoding is initialized enough. No, they can't. Two reasons: For one the commit order between snapshots and WAL isn't necessarily the same. For another, clients now need logic to detect whether a transaction's contents has already been applied or has not been applied yet, that's nontrivial. 
This code is going to great pains to be able to export a snapshot at the precise point when all transactions that were running in the first xl_running_xacts record seen after the start of decoding have ended, but there's nothing magical about that point, except that it's the first point at which a freshly-taken snapshot is guaranteed to be good enough to establish an initial state for any table in the database. I still maintain that there's something magic about that moment. It's when all *future* (from the POV of the snapshot) changes will be streamed, and all *past* changes are included in the exported snapshot. But do you really want to keep that snapshot around long enough to copy the entire database? I bet you don't: if the database is big, holding back xmin for long enough to copy the whole thing isn't likely to be fun. Well, that's how pg_dump works; it's not this patch's problem to fix that. You might well want to copy one table at a time, with progressively newer snapshots, and apply to each table only those transactions that weren't part of the initial snapshot for that table. Many other patterns are possible. What you've got baked in here right now is suitable only for the simplest imaginable case, and yet we're paying a substantial price in implementation complexity for it.
Re: [HACKERS] Changeset Extraction v7.6.1
On 2014-02-17 21:10:26 -0500, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: 1. How safe is it to try to do decoding inside of a regular backend? What we're doing here is entering a special mode where we forbid the use of regular snapshots in favor of requiring the use of decoding snapshots, and forbid access to non-catalog relations. We then run through the decoding process; and then exit back into regular mode. On entering and on exiting this special mode, we InvalidateSystemCaches(). How often is such a mode switch expected to happen? I would expect frequent use of InvalidateSystemCaches() to be pretty much disastrous for performance, even absent any of the possible bugs you're worried about. It would likely be better to design things so that a decoder backend does only that. Very infrequently. When it's starting to decode, and when it's ending. When used via walsender, that should only happen at connection start/end, which surely shouldn't be frequent. It's more frequent when using the SQL interface, but since that's not a streaming interface, on a busy server there still would be a couple of megabytes of transactions to decode for one reset. 3. As this feature is proposed, the only plugin we'll ship with 9.4 is a test_decoding plugin which, as its own documentation says, doesn't do anything especially useful. What exactly do we gain by forcing users who want to make use of these new capabilities to write C code? TBH, if that's all we're going to ship, I'm going to vote against committing this patch to 9.4 at all. Let it wait till 9.5 when we might be able to build something useful on it. There *are* useful things around already. We didn't include postgres_fdw in the same release as the FDW code either; I don't see why this should be held to a different standard. To point out just one obvious problem, how much confidence can we have in the APIs being right if there are no usable clients? Because there *are* clients.
They just don't sound likely to either be suitable for core code (too specialized) or have already been submitted (the json plugin). There's a whole replication suite built on top of this, to a good degree just to test it. So I am fairly confident that the most important parts are covered. There sure are additional features I want, but that's not surprising. The most recent precedent I can think of is the FDW APIs, which I'd be the first to admit are still in flux. But we didn't ship anything there without non-toy contrib modules to exercise it. If we had, we'd certainly have regretted it, because in the creation of those contrib modules we found flaws in the initial design. Which non-toy FDW was there? file_fdw was in 9.1, but that's a toy. And *8.4* had CREATE FOREIGN DATA WRAPPER, without it doing anything... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Changeset Extraction v7.6.1
On 2014-02-17 18:49:59 -0800, Peter Geoghegan wrote: On Mon, Feb 17, 2014 at 6:35 PM, Robert Haas robertmh...@gmail.com wrote: What I actually suspect is going to happen if we ship this as-is is that people are going to start building logical replication solutions on top of the test_decoding module even though it explicitly says that it's just test code. This is *really* cool technology and people are *hungry* for it. But writing C is hard, so if there's not a polished plugin available, I bet people are going to try to use the not-polished one. I think we should try to get out ahead of that. Tom made a comparison with FDWs, so I'll make another. The Multicorn module made FDW authorship much more accessible by wrapping it in a Python interface, I believe with some success. I don't want to stand in the way of building a fully-featured test_decoding module, but I think that those that would misuse test_decoding as it currently stands can be redirected to a third-party wrapper. As you say, it's pretty cool stuff, so it seems likely that someone will build one for us. Absolutely. I *sure* hope somebody is going to build such an abstraction. I am not entirely sure what it'd look like, but ... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Changeset Extraction v7.6.1
On 2014-02-17 21:35:23 -0500, Robert Haas wrote: What I don't understand is why we're not taking the test_decoding module, polishing it up a little to produce some nice, easily machine-parseable output, calling it basic_decoding, and shipping that. Then people who want something else can build it, but people who are happy with something basic will already have it. Because every project is going to need their own plugin *anyway*. Londiste, slony sure are going to ignore changes to relations they don't need. Querying their own metadata. They will want compatibility with the earlier formats as far as possible. Sometime not too far away they will want to optionally support binary output because it's so much faster. There's just not much chance that either of these will be able to agree on a format short term. So, possibly we could agree on something that consumers *outside* of replication could use. What I actually suspect is going to happen if we ship this as-is is that people are going to start building logical replication solutions on top of the test_decoding module even though it explicitly says that it's just test code. This is *really* cool technology and people are *hungry* for it. But writing C is hard, so if there's not a polished plugin available, I bet people are going to try to use the not-polished one. I think we should try to get out ahead of that. I really hope there will be nicer ones by the time 9.4 is released. Euler did send in a json plugin http://archives.postgresql.org/message-id/52A5BFAE.1040209%2540timbira.com.br , but there hasn't been much feedback yet. It's hard to start discussing something that needs a couple of patches to pg before you can develop your own patch... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Changeset Extraction v7.6.1
On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund and...@2ndquadrant.com wrote: [ patches ] Having now had a little bit of opportunity to reflect on the State Of This Patch, I'd like to step back from the minutia upon which I've been commenting in my previous emails and articulate three high-level concerns about this patch. In so doing, I would like to specifically request that other folks on this mailing list comment on the extent to which they do or do not believe these concerns to be valid. I believe I've mentioned all of these concerns at least to some degree previously, but they've been mixed in with other things, so I want to take this opportunity to call them out more clearly. 1. How safe is it to try to do decoding inside of a regular backend? What we're doing here is entering a special mode where we forbid the use of regular snapshots in favor of requiring the use of decoding snapshots, and forbid access to non-catalog relations. We then run through the decoding process; and then exit back into regular mode. On entering and on exiting this special mode, we InvalidateSystemCaches(). I don't see a big problem with having special backends (e.g. walsender) use this special mode, but I'm less convinced that it's wise to try to set things up so that we can switch back and forth between decoding mode and regular mode in a single backend. I worry that won't end up working out very cleanly, and I think the prohibition against using this special mode in an XID-bearing transaction is merely a small downpayment on future pain in this area. That having been said, I can't pretend at this point either to understand the genesis of this particular restriction or what other problems are likely to crop up in trying to allow this mode-switching. So it's possible that I'm overblowing it, but it's makin' me nervous. 2. I think the snapshot-export code is fundamentally misdesigned. 
As I said before, the idea that we're going to export one single snapshot at one particular point in time strikes me as extremely short-sighted. For example, consider one-to-many replication where clients may join or depart the replication group at any time. Whenever somebody joins, we just want a snapshot, LSN pair such that they can apply all changes after the LSN except for XIDs that would have been visible to the snapshot. And in fact, we don't even need any special machinery for that; the client can just make a connection and *take a snapshot* once decoding is initialized enough. This code is going to great pains to be able to export a snapshot at the precise point when all transactions that were running in the first xl_running_xacts record seen after the start of decoding have ended, but there's nothing magical about that point, except that it's the first point at which a freshly-taken snapshot is guaranteed to be good enough to establish an initial state for any table in the database. But do you really want to keep that snapshot around long enough to copy the entire database? I bet you don't: if the database is big, holding back xmin for long enough to copy the whole thing isn't likely to be fun. You might well want to copy one table at a time, with progressively newer snapshots, and apply to each table only those transactions that weren't part of the initial snapshot for that table. Many other patterns are possible. What you've got baked in here right now is suitable only for the simplest imaginable case, and yet we're paying a substantial price in implementation complexity for it. Frankly, this code is *ugly*; witness the fact that SnapBuildExportSnapshot() needs to start a transaction just so that it can push out a snapshot. I think that's a pretty awful abuse of the transaction machinery, and the whole point of it, AFAICS, is to eliminate flexibility that we'd have with simpler approaches. 3.
As this feature is proposed, the only plugin we'll ship with 9.4 is a test_decoding plugin which, as its own documentation says, doesn't do anything especially useful. What exactly do we gain by forcing users who want to make use of these new capabilities to write C code? You previously stated that it wasn't possible (or there wasn't time) to write something generic, but how hard is it, really? Sure, people who are hard-core should have the option to write C code, and I'm happy that they do. But that shouldn't, IMHO anyway, be a requirement to use that feature, and I'm having trouble understanding why we're making it one. The test_decoding plugin doesn't seem tremendously much simpler than something that someone could actually use, so why not make that the goal? Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Changeset Extraction v7.6.1
Robert Haas robertmh...@gmail.com writes: Having now had a little bit of opportunity to reflect on the State Of This Patch, I'd like to step back from the minutia upon which I've been commenting in my previous emails and articulate three high-level concerns about this patch. In so doing, I would like to specifically request that other folks on this mailing list comment on the extent to which they do or do not believe these concerns to be valid. ... 1. How safe is it to try to do decoding inside of a regular backend? What we're doing here is entering a special mode where we forbid the use of regular snapshots in favor of requiring the use of decoding snapshots, and forbid access to non-catalog relations. We then run through the decoding process; and then exit back into regular mode. On entering and on exiting this special mode, we InvalidateSystemCaches(). How often is such a mode switch expected to happen? I would expect frequent use of InvalidateSystemCaches() to be pretty much disastrous for performance, even absent any of the possible bugs you're worried about. It would likely be better to design things so that a decoder backend does only that. 2. I think the snapshot-export code is fundamentally misdesigned. Your concerns here sound reasonable, but I can't say I've got any special insight into it. 3. As this feature is proposed, the only plugin we'll ship with 9.4 is a test_decoding plugin which, as its own documentation says, doesn't do anything especially useful. What exactly do we gain by forcing users who want to make use of these new capabilities to write C code? TBH, if that's all we're going to ship, I'm going to vote against committing this patch to 9.4 at all. Let it wait till 9.5 when we might be able to build something useful on it. To point out just one obvious problem, how much confidence can we have in the APIs being right if there are no usable clients? 
Even if they're right, what benefit do we get from freezing them one release before anything useful is going to happen? The most recent precedent I can think of is the FDW APIs, which I'd be the first to admit are still in flux. But we didn't ship anything there without non-toy contrib modules to exercise it. If we had, we'd certainly have regretted it, because in the creation of those contrib modules we found flaws in the initial design. regards, tom lane
Re: [HACKERS] Changeset Extraction v7.6.1
On Mon, Feb 17, 2014 at 9:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: 3. As this feature is proposed, the only plugin we'll ship with 9.4 is a test_decoding plugin which, as its own documentation says, doesn't do anything especially useful. What exactly do we gain by forcing users who want to make use of these new capabilities to write C code? TBH, if that's all we're going to ship, I'm going to vote against committing this patch to 9.4 at all. Let it wait till 9.5 when we might be able to build something useful on it. To point out just one obvious problem, how much confidence can we have in the APIs being right if there are no usable clients? Even if they're right, what benefit do we get from freezing them one release before anything useful is going to happen? I actually have a lot of confidence that the APIs are almost entirely right, except maybe for the snapshot-related stuff and possibly one or two other minor details. And I have every confidence that 2ndQuadrant is going to put out decoding modules that do useful stuff. I also assume Slony is going to ship one at some point. EnterpriseDB's xDB replication server will need one, so someone at EDB will have to go write that. And if Bucardo or Londiste want to use this infrastructure, they'll need their own, too. What I don't understand is why it's cool to make each of those replication solutions bring its own to the table. I mean if they want to, so that they can generate exactly the format they want with no extra overhead, sure, cool. What I don't understand is why we're not taking the test_decoding module, polishing it up a little to produce some nice, easily machine-parseable output, calling it basic_decoding, and shipping that. Then people who want something else can build it, but people who are happy with something basic will already have it. 
What I actually suspect is going to happen if we ship this as-is is that people are going to start building logical replication solutions on top of the test_decoding module even though it explicitly says that it's just test code. This is *really* cool technology and people are *hungry* for it. But writing C is hard, so if there's not a polished plugin available, I bet people are going to try to use the not-polished one. I think we should try to get out ahead of that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Changeset Extraction v7.6.1
On Mon, Feb 17, 2014 at 6:35 PM, Robert Haas robertmh...@gmail.com wrote: What I actually suspect is going to happen if we ship this as-is is that people are going to start building logical replication solutions on top of the test_decoding module even though it explicitly says that it's just test code. This is *really* cool technology and people are *hungry* for it. But writing C is hard, so if there's not a polished plugin available, I bet people are going to try to use the not-polished one. I think we should try to get out ahead of that. Tom made a comparison with FDWs, so I'll make another. The Multicorn module made FDW authorship much more accessible by wrapping it in a Python interface, I believe with some success. I don't want to stand in the way of building a fully-featured test_decoding module, but I think that those that would misuse test_decoding as it currently stands can be redirected to a third-party wrapper. As you say, it's pretty cool stuff, so it seems likely that someone will build one for us. -- Peter Geoghegan
Re: [HACKERS] Changeset Extraction v7.6.1
On 2014-02-15 17:29:04 -0500, Robert Haas wrote: On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund and...@2ndquadrant.com wrote: [ new patches ] 0001 already needs minor + * copied stuff from tuptoaster.c. Perhaps there should be toast_internal.h? Yes, please. If you can submit a separate patch creating this file and relocating this stuff there, I will commit it. I started to work on that, but I am not sure we actually need it anymore. tuptoaster.h isn't included in that many places, so perhaps we should just add it there? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Changeset Extraction v7.6.1
On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund <and...@2ndquadrant.com> wrote:
[ new patches ]

0001 already needs minor

+ * copied stuff from tuptoaster.c. Perhaps there should be toast_internal.h?

Yes, please. If you can submit a separate patch creating this file and relocating this stuff there, I will commit it.

+	/*
+	 * XXX: It's impolite to ignore our argument and keep decoding until the
+	 * current position.
+	 */

Eh, what?

+	 * We misuse the original meaning of SnapshotData's xip and subxip fields
+	 * to make the more fitting for our needs.
[...]
+	 * XXX: Do we want extra fields instead of misusing existing ones instead?

If we're going to do this, then it surely needs to be documented in snapshot.h. On the second question, you're not the first hacker to want to abuse the meanings of the existing fields; SnapshotDirty already does it. It's tempting to think we need a more principled approach to this, like what we've done with Node, i.e. typedef enum ... SnapshotType; and then a separate struct definition for each kind, all beginning with SnapshotType type.

+	/*
+	 * XXX: Timeline handling/naming. Do we need to include the timeline in
+	 * snapshot's name? Outside of very obscure, user triggered, cases every
+	 * LSN should correspond to exactly one timeline?
+	 */

I can't really comment intelligently on this, so you need to figure it out. My off-the-cuff thought is that erring on the side of including it couldn't hurt anything.

+ * XXX: use hex format for the LSN as well?

Yes, please.

+	/* prepared abort contain a normal commit abort... */

contains.

+	/*
+	 * Abort all transactions that we keep track of that are older
+	 * than the record's oldestRunningXid. This is the most
+	 * convenient spot for doing so since, in contrast to shutdown
+	 * or end-of-recovery checkpoints, we have sufficient
+	 * knowledge to deal with prepared transactions here.
+	 */

I have no real quibble with this, but I think the comment could be clarified slightly to state *what* knowledge we have here that we wouldn't have there.

+	/* only crash recovery/replication needs to care */

I believe I know what you're getting at here, but the next guy might not. I suggest: "Although these records only exist to serve the needs of logical decoding, all the work happens as part of crash or archive recovery, so we don't need to do anything here."

+	 * Treat HOT update as normal updates, there is no useful

s/, t/. T/

+	 * There are cases in which inplace updates are used without xids
+	 * assigned (like VACUUM), there are others (like CREATE INDEX
+	 * CONCURRENTLY) where an xid is present. If an xid was assigned

"In-place updates can be used either by XID-bearing transactions (e.g. in CREATE INDEX CONCURRENTLY) or by XID-less transactions (e.g. VACUUM). In the former case, ..."

+	 * redundant because the commit will do that as well, but one
+	 * we'll support decoding in-progress relations, this will be

s/one/once/
s/we'll/we/

+		/* we don't care about row level locks for now */
+		case XLOG_HEAP_LOCK:
+			break;

The position of the comment isn't consistent with the comments for the other WAL record types in this section; that is, it's above rather than below the case.

+	 * transaction's contents as the various caches need to always be

I think you should use "since" or "because" rather than "as" here, and maybe put a comma before it.

+	 * the transaction's invalidations. This currently won't be needed if
+	 * we're just skipping over the transaction, since that currently only
+	 * happens when starting decoding, after we invalidated all caches, but
+	 * transactions in other databases might have touched shared relations.

I'm having trouble understanding what this means, especially the part after the "but".

+ * Read a HeapTuple as WAL logged by heap_insert, heap_update and
+ * heap_delete, but not by heap_multi_insert into a tuplebuf.

"but not by heap_multi_insert" needs punctuation both before and after. You can just add a comma after, or change it into a parenthetical phrase.

As the above comments probably make clear, I'm pretty much happy with decode.c.

+	/* TODO: We got to change that someday soon.. */

Two periods. Maybe "We need to change this some day soon." - and then follow that with a paragraph explaining roughly what would need to be done.

+	/* shorter lines... */
+	slot = MyReplicationSlot;
+
+	/* first some sanity checks that are
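The more principled SnapshotType approach floated above could be sketched roughly as follows. This is a standalone illustration of the Node-style tagged-struct pattern, not PostgreSQL's actual definitions; all type and field names here are invented for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical discriminator, one value per snapshot kind. */
typedef enum SnapshotType
{
	SNAPSHOT_MVCC,
	SNAPSHOT_DIRTY,
	SNAPSHOT_HISTORIC_MVCC		/* the kind logical decoding would use */
} SnapshotType;

/* Ordinary MVCC snapshot: xip/subxip keep their documented meaning. */
typedef struct SnapshotMVCCData
{
	SnapshotType type;			/* always SNAPSHOT_MVCC */
	unsigned int xmin;
	unsigned int xmax;
} SnapshotMVCCData;

/* Decoding snapshot: gets real field names instead of reusing xip/subxip. */
typedef struct SnapshotHistoricData
{
	SnapshotType type;			/* always SNAPSHOT_HISTORIC_MVCC */
	unsigned int xmin;
	unsigned int xmax;
	unsigned int *committed_xids;	/* what xip was being misused for */
	int			ncommitted;
} SnapshotHistoricData;

/*
 * Generic handle, analogous to Node: every snapshot struct begins with
 * SnapshotType, so any of them can be inspected through this header.
 */
typedef struct SnapshotData
{
	SnapshotType type;
} SnapshotData;

static int
snapshot_is_historic(const SnapshotData *snap)
{
	return snap->type == SNAPSHOT_HISTORIC_MVCC;
}
```

Each kind then documents its own fields next to its own struct, instead of snapshot.h having to enumerate the overloaded meanings of xip and subxip per consumer.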
Re: [HACKERS] Changeset Extraction v7.6.1
On 2014-02-15 17:29:04 -0500, Robert Haas wrote:
> On Fri, Feb 14, 2014 at 4:55 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> > [ new patches ]
>
> 0001 already needs minor

Hm? If there are conflicts, I'll push/send a rebased version tomorrow or monday.

> > +	 * the transaction's invalidations. This currently won't be needed if
> > +	 * we're just skipping over the transaction, since that currently only
> > +	 * happens when starting decoding, after we invalidated all caches, but
> > +	 * transactions in other databases might have touched shared relations.
>
> I'm having trouble understanding what this means, especially the part
> after the "but".

Let me rephrase, maybe that will help: This currently won't be needed if we're just skipping over the transaction, because currently we only do so during startup, to get to the first transaction the client needs. As we have reset the catalog caches before starting to read WAL and we haven't yet touched any catalogs, there can't be anything to invalidate. But if we're forgetting this commit because it happened in another database, the invalidations might be important, because they could be for shared catalogs and we might have loaded data into the relevant syscaches.

Better?

> +	if (IsTransactionState() &&
> +		GetTopTransactionIdIfAny() != InvalidTransactionId)
> +		ereport(ERROR,
> +				(errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
> +				 errmsg("cannot perform changeset extraction in transaction that has performed writes")));
>
> This is sort of an awkward restriction, as it makes it hard to compose
> this feature with others. What underlies the restriction, can we get
> rid of it, and if not can we include a comment here explaining why it
> exists?

Well, you can write the result into a table if you're halfway careful... :)

I think it should be fairly easy to relax the restriction to creating a slot, but not getting data from it. Do you think that would be sufficient?
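The relaxation proposed here can be modeled in isolation like so. This is a toy sketch only: TxnState and SlotOp are invented stand-ins for IsTransactionState()/GetTopTransactionIdIfAny() and the actual call sites, not real PostgreSQL code:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the backend's transaction state. 0 == InvalidTransactionId. */
typedef struct TxnState
{
	bool		in_transaction;
	unsigned int top_xid;
} TxnState;

typedef enum SlotOp
{
	SLOT_CREATE,				/* create a logical slot */
	SLOT_GET_CHANGES			/* stream decoded changes from it */
} SlotOp;

/*
 * Relaxed check: a transaction that has already written (has an XID) may
 * still create a slot, since that exports no data; only starting to
 * stream changes, which needs a clean snapshot, is rejected.
 */
static bool
decoding_op_forbidden(const TxnState *txn, SlotOp op)
{
	bool		has_written = txn->in_transaction && txn->top_xid != 0;

	return has_written && op == SLOT_GET_CHANGES;
}
```

Under this shape the ereport() in the patch would fire only on the streaming path, which is what "relax the restriction to creating a slot, but not getting data from it" amounts to.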
> +	/* register output plugin name with slot */
> +	strncpy(NameStr(slot->data.plugin), plugin,
> +			NAMEDATALEN);
> +	NameStr(slot->data.plugin)[NAMEDATALEN - 1] = '\0';
>
> If it's safe to do this without a lock, I don't know why.

Well, the worst that could happen is that somebody else doing a SELECT * FROM pg_replication_slots gets an incomplete plugin name... But we certainly can hold the spinlock during it if you think that's better.

> More broadly, I wonder why this is_init stuff is in here at all.
> Maybe the stuff that happens in the is_init case should be done in the
> caller, or another helper function.

The reason I moved it in there is that after the walsender patch there are two callers, and the stuff is sufficiently complex that having it twice led to bugs. The reason it's currently the same function is that sufficiently much of the code would have to be shared that I found it ugly to split. I'll have a look whether I can figure something out.

> +	/* prevent WAL removal as fast as possible */
> +	ReplicationSlotsComputeRequiredLSN();
>
> If there's a race here, can't we rejigger the order of operations to
> eliminate it? Or is that just too hard and not worth it?

Yes, there's a small race which at the very least should be properly documented. Hm. Yes, I think we can plug the hole. If the race condition occurs we'd take slightly longer to start up, which isn't bad. Will fix.

> +begin_txn_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn)
> +	state.callback_name = "pg_decode_begin_txn";
> +	ctx->callbacks.begin_cb(ctx, txn);
>
> I feel that there's a certain lack of naming consistency between these
> things. Can we improve that? (and similarly for parallel cases)

> +pg_create_decoding_replication_slot(PG_FUNCTION_ARGS)
>
> I thought we were going to have physical and logical slots, not
> physical and decoding slots.

Ok.

> +	/* make sure we don't end up with an unreleased slot */
> +	PG_TRY();
> +	{
> ...
> +	PG_CATCH();
> +	{
> +		ReplicationSlotRelease();
> +		ReplicationSlotDrop(NameStr(*name));
> +		PG_RE_THROW();
> +	}
> +	PG_END_TRY();
>
> I don't think this is a very good idea. The problem with doing things
> during error recovery that can themselves fail is that you'll lose the
> original error, which is not cool, and maybe even blow out the error
> stack. Many people have confused PG_TRY()/PG_CATCH() with an
> exception-handling system, but it's not. One way to fix this is to put
> some of the initialization logic in ReplicationSlotCreate() just prior
> to calling CreateSlotOnDisk(). If the work that needs to be done is too
> complex or protracted to be done there, then I think that it should be
> pulled out of the act of creating the replication slot and made to
> happen as part of first use, or as a separate operation
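The restructuring Robert suggests, doing the fallible initialization before the durable slot exists, can be modeled in isolation like this. A toy sketch only: none of these names are the actual PostgreSQL functions, and the on-disk slot is reduced to a counter:

```c
#include <assert.h>
#include <stdbool.h>

static int	nslots_on_disk = 0;	/* stand-in for slots persisted on disk */

/* Stand-in for the failure-prone setup currently wrapped in PG_TRY(). */
static bool
initialize_slot_contents(bool should_fail)
{
	return !should_fail;
}

/*
 * Creation ordered per the review: run everything that can fail *before*
 * the CreateSlotOnDisk() equivalent. An error then needs no cleanup, so
 * there is no fallible work in the error path that could lose the
 * original error or blow out the error stack.
 */
static bool
replication_slot_create(bool init_fails)
{
	if (!initialize_slot_contents(init_fails))
		return false;			/* nothing durable exists yet; nothing to undo */

	nslots_on_disk++;			/* CreateSlotOnDisk() stand-in, last step */
	return true;
}
```

The key property is that a failed create leaves no slot behind without needing a ReplicationSlotDrop() in a catch block; the alternative Robert mentions (deferring the work to first use or a separate operation) achieves the same thing by making the create itself trivial.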