Re: [HACKERS] logical changeset generation v6.8
On 2013-12-16 00:53:10 -0500, Robert Haas wrote: Yes, I think we could mostly reuse it, we'd probably want to add a field or two more (application_name, sync_prio?). I have been wondering whether some of the code in replication/logical/logical.c shouldn't be in replication/slot.c or similar. So far I've opted for leaving it in its current place since it would have to change a bit for a more general role. I strongly favor moving the slot-related code to someplace with slot in the name, and replication/slot.c seems about right. Even if we don't extend them to cover non-logical replication in this release, we'll probably do it eventually, and it'd be better if that didn't require moving large amounts of code between files. Any opinion on the storage location of the slot files? It's currently pg_llog/$slotname/state[.tmp]. It's a directory so we have a location during logical decoding to spill data to... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.8
On Tue, Dec 17, 2013 at 7:48 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-12-16 00:53:10 -0500, Robert Haas wrote: Yes, I think we could mostly reuse it, we'd probably want to add a field or two more (application_name, sync_prio?). I have been wondering whether some of the code in replication/logical/logical.c shouldn't be in replication/slot.c or similar. So far I've opted for leaving it in its current place since it would have to change a bit for a more general role. I strongly favor moving the slot-related code to someplace with slot in the name, and replication/slot.c seems about right. Even if we don't extend them to cover non-logical replication in this release, we'll probably do it eventually, and it'd be better if that didn't require moving large amounts of code between files. Any opinion on the storage location of the slot files? It's currently pg_llog/$slotname/state[.tmp]. It's a directory so we have a location during logical decoding to spill data to... pg_replslot? pg_replication_slot? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.8
Hi Robert, On 2013-12-16 00:53:10 -0500, Robert Haas wrote: On Wed, Dec 11, 2013 at 7:08 PM, Andres Freund and...@2ndquadrant.com wrote: I don't think there's much point in including remapping in all of the error messages. It adds burden for translators and users won't know what a remapping file is anyway. It helps in locating wich part of the code caused a problem. I utterly hate to get reports with error messages that I can't correlate to a sourcefile. Yes, I know that verbose error output exists, but usually you don't get it that way... That said, I'll try to make the messages simpler. Well, we could adopt a policy of not making messages originating from different locations in the code the same. However, it looks to me like that's NOT the current policy, because some care has been taken to reuse messages rather than distinguish them. To me that mostly looks like cases where we either don't want to tell more for various security-ish purposes or where they have been copy and pasted... There's no hard and fast rule here, because some cases are distinguished, but my gut feeling is that all of the errors your patch introduces are sufficiently obscure cases that separate messages with separate translations are not warranted. Perhaps we should just introduce a marker that some such strings are not to be translated if they are of the unexpected kind. That would probably make debugging easier too ;) I need to examine that logic in more detail, but I had trouble following it and was hoping the next version would be better-commented. Yea, I've started expanding the comments about the higher level concerns - I've been so knee deep in this that I didn't realize they weren't there. b) make hot_standby_feedback work across disconnections of the walsender connection (i.e peg xmin, not just for catalogs though) Check; might need to be optional. Yea, I am pretty sure it will. It'd probably pretty nasty to set min_recovery_apply_delay=7d and force xmin kept to that... c) Make sure we can transport those across cascading replication. Not sure I follow. Consider a replication scenario like primary - standby-1 - standby-2. The primary may not only not remove data that standby-1 requires, but also not data that standby-2 needs. Not really necessary for WAL since that will also reside on standby-1 but definitely for the xmin horizon. So standby-1 will need to signal not only his own needs, but also of the nodes below. The hard questions that I see are like: * How do we manage standby registration? Does the admin have to do that manually? Or does a standby register itself automatically if some config paramter is set? * If automatically, how do we deal with the situation that registrant dies before noting his own identifier somewhere persistent? My best idea is a two phase registration process where registration in phase 1 are thrown away after a restart, but yuck. If you don't know the answers to these questions for the kind of replication that we have now, then how do you know the answers for logical replication? Conversely, what makes the answers that you've selected for logical replication unsuitable for our existing replication? There's a pretty fundamental difference imo - with the logical decoding stuff we only supply support for change producing nodes, with physical rep we supply both. There's no need to decide about the way node ids are stored in in-core logical rep. consumers since there are no in-core ones. Yet. Also, physical rep by now is a pretty established thing, we need to be much more careful about compatibility there. I have to admit that before I saw your design for the logical replication slots, the problem of making this work for our existing replication stuff seemed almost intractable to me; I had no idea how that was going to work. Believe me, it caused me some headaches to deceive it for decoding too. Oh, and I think I watched just about all episodes of some stupid TV show during it ;) Keeping the data in shared memory, persisting them across shutdowns, and managing them via either function calls or the replication command language seems perfect. Thanks. I think the concept has quite some merits. The implementation is a bit simplistic atm, we e.g. might want to work harder at coalescing fsync()s and such, but that's a further step when we see whether it's worthwile in the real world. Now, in terms of how registration works, whether for physical replication or logical, it seems to me that the DBA will have to be responsible for telling each client the name of the slot to which that client should connect (omitting it at said DBA's option if, as for physical replication, the slot is not mandatory). It seems reasonable to me to reuse the application_name for the slot's name, similar to the way it's used for synchronous rep. It seems odd to use two different ways to identify nodes. t should
Re: [HACKERS] logical changeset generation v6.8
On Mon, Dec 16, 2013 at 6:01 AM, Andres Freund and...@2ndquadrant.com wrote: There's no hard and fast rule here, because some cases are distinguished, but my gut feeling is that all of the errors your patch introduces are sufficiently obscure cases that separate messages with separate translations are not warranted. Perhaps we should just introduce a marker that some such strings are not to be translated if they are of the unexpected kind. That would probably make debugging easier too ;) Well, we have that: it's called elog. But that doesn't seem like the right thing here. b) make hot_standby_feedback work across disconnections of the walsender connection (i.e peg xmin, not just for catalogs though) Check; might need to be optional. Yea, I am pretty sure it will. It'd probably pretty nasty to set min_recovery_apply_delay=7d and force xmin kept to that... Yes, that would be... unfortunate. c) Make sure we can transport those across cascading replication. Not sure I follow. Consider a replication scenario like primary - standby-1 - standby-2. The primary may not only not remove data that standby-1 requires, but also not data that standby-2 needs. Not really necessary for WAL since that will also reside on standby-1 but definitely for the xmin horizon. So standby-1 will need to signal not only his own needs, but also of the nodes below. True. The hard questions that I see are like: * How do we manage standby registration? Does the admin have to do that manually? Or does a standby register itself automatically if some config paramter is set? * If automatically, how do we deal with the situation that registrant dies before noting his own identifier somewhere persistent? My best idea is a two phase registration process where registration in phase 1 are thrown away after a restart, but yuck. If you don't know the answers to these questions for the kind of replication that we have now, then how do you know the answers for logical replication? Conversely, what makes the answers that you've selected for logical replication unsuitable for our existing replication? There's a pretty fundamental difference imo - with the logical decoding stuff we only supply support for change producing nodes, with physical rep we supply both. I'm not sure I follow this. Both what and what? There's no need to decide about the way node ids are stored in in-core logical rep. consumers since there are no in-core ones. Yet. I don't know that we have or need to make any judgements about how to store node IDs. You have decided that slots have names, and I see no problem there. Also, physical rep by now is a pretty established thing, we need to be much more careful about compatibility there. I don't think we should change anything in backward-incompatible fashion. If we add any new behavior, it'd surely be optional. I think we need to improve the monitoring facilities a bit, and that should be it. Like * expose xmin in pg_stat_activity, pg_prepared_xacts, pg_replication_slots (or whatever it's going to be called) * expose the last restartpoint's redo pointer in pg_stat_replication, pg_replication_slots +1. That said, the consequences can be a bit harsher than a full disk - the anti-wraparound security measures might kick in requiring a restart into single user mode. That's way more confusing than cleaning up a bit of space on the disk. Yes, true. I'm not sure what approach to that problem is best. It's long seemed to me that well before we get to the point of shutting down the whole cluster we ought to just start killing sessions with old xmins. But that doesn't generalize well to prepared transactions, which can't just be rolled back or committed without guidance; and killing slots seems a bit dicey too. Consider what happens though, if you promote a node for physical rep. As soon as it's promoted, it will accept writes and then start a checkpoint. Unless other standbys have started to follow that node before either that checkpoint happens (removing WAL) or autovacuuming/hot-pruning is performed (creating recovery conflicts), we'll possibly loose the data required to let the standbys follow the promotion. Note that wal_keep_segments and vacuum_defer_cleanup_age both sorta work for that... True. Could somebody please deliver me a time dilation device? Upon reflection, I am less concerned with actually having physical slots in this release than I am with making sure we're not boxing ourselves into a corner that will make them hard to add later. If we've got a clear design that can be generalized to that case, but the SMOP required exceeds what can be done in the time available, I am OK to punt it. But I am not sure we're at that point yet. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make
Re: [HACKERS] logical changeset generation v6.8
Robert Haas robertmh...@gmail.com writes: On Mon, Dec 16, 2013 at 6:01 AM, Andres Freund and...@2ndquadrant.com wrote: Perhaps we should just introduce a marker that some such strings are not to be translated if they are of the unexpected kind. That would probably make debugging easier too ;) Well, we have that: it's called elog. But that doesn't seem like the right thing here. errmsg_internal? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.8
On Mon, Dec 16, 2013 at 12:21 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Dec 16, 2013 at 6:01 AM, Andres Freund and...@2ndquadrant.com wrote: Perhaps we should just introduce a marker that some such strings are not to be translated if they are of the unexpected kind. That would probably make debugging easier too ;) Well, we have that: it's called elog. But that doesn't seem like the right thing here. errmsg_internal? There's that, too. But again, these messages are not can't-happen scenarios. The argument is just whether to reuse existing error message text (like could not write file) or invent a new variation (like could not write remapping file). Andres' argument (which is valid) is that distinguished messages make it easier to troubleshoot without needing to turn on verbose error messages. My argument (which I think is also valid) is that a user isn't likely to know what a remapping file is, and having more messages increases the translation burden. Is there a project policy on this topic? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.8
Robert Haas escribió: There's that, too. But again, these messages are not can't-happen scenarios. The argument is just whether to reuse existing error message text (like could not write file) or invent a new variation (like could not write remapping file). Andres' argument (which is valid) is that distinguished messages make it easier to troubleshoot without needing to turn on verbose error messages. My argument (which I think is also valid) is that a user isn't likely to know what a remapping file is, and having more messages increases the translation burden. Is there a project policy on this topic? I would vote for a generic could not write file %s where the %s lets the troubleshooter know the path of the file, and thus in what context it is being read. We already have a similar case where slru.c reports error as pertaining to transaction 12345 but the path is pg_subtrans/xyz or multixact etc; while it doesn't explicitely say what module is raising the error, it's pretty clear from the path. Would that not work here? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.8
Robert Haas robertmh...@gmail.com writes: There's that, too. But again, these messages are not can't-happen scenarios. The argument is just whether to reuse existing error message text (like could not write file) or invent a new variation (like could not write remapping file). As long as the message includes the file name, which it surely oughta, I don't see that we need any explanation of what Postgres thinks the file is for. If someone cares about that they can reverse-engineer it from the file name; while as you said upthread, most of the time the directory path is going to be the key piece of useful information. So +1 for could not write file. Andres' argument (which is valid) is that distinguished messages make it easier to troubleshoot without needing to turn on verbose error messages. My argument (which I think is also valid) is that a user isn't likely to know what a remapping file is, and having more messages increases the translation burden. Is there a project policy on this topic? I think Andres' argument is a thinly veiled version of let's put the routine name into the message text, which there definitely is project policy against (see 49.3.13 in the message style guide). If you want to know the code location where the error was thrown, the answer is to get a verbose log, not to put identifying information into the user-facing message text. And this is only partially-identifying information, which seems like the worst of both worlds: you've got confused users and overworked translators, and you still don't know exactly where it was thrown from. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.8
On Wed, Dec 11, 2013 at 7:08 PM, Andres Freund and...@2ndquadrant.com wrote: I don't think there's much point in including remapping in all of the error messages. It adds burden for translators and users won't know what a remapping file is anyway. It helps in locating wich part of the code caused a problem. I utterly hate to get reports with error messages that I can't correlate to a sourcefile. Yes, I know that verbose error output exists, but usually you don't get it that way... That said, I'll try to make the messages simpler. Well, we could adopt a policy of not making messages originating from different locations in the code the same. However, it looks to me like that's NOT the current policy, because some care has been taken to reuse messages rather than distinguish them. There's no hard and fast rule here, because some cases are distinguished, but my gut feeling is that all of the errors your patch introduces are sufficiently obscure cases that separate messages with separate translations are not warranted. The time when this is likely to fail is when someone borks the permissions on the data directory, and the user probably won't need to care exactly which file we can't write. + if (!TransactionIdIsNormal(xmax)) + { + /* +* no xmax is set, can't have any permanent ones, so this check is +* sufficient +*/ + } + else if (HEAP_XMAX_IS_LOCKED_ONLY(new_tuple-t_data-t_infomask)) + { + /* only locked, we don't care */ + } + else if (!TransactionIdPrecedes(xmax, cutoff)) + { + /* tuple has been deleted recently, log */ + do_log_xmax = true; + } Obfuscated. Rewrite without empty blocks. I don't understand why having an empty block is less clear than including a condition in several branches? Especially if the individual conditions might need to be documented? It's not a coding style we typically use, to my knowledge. Of course, what is actually clearest is a matter of opinion, but my own experience is that such blocks typically end up seeming clearer to me when the individual comments are joined together into a paragraph that explains the logic in full sentences what's going on. + /* +* Write out buffer everytime we've too many in-memory entries. +*/ + if (state-rs_num_rewrite_mappings = 1000 /* arbitrary number */) + logical_heap_rewrite_flush_mappings(state); What happens if the number of items per hash table entry is small but the number of entries is large? rs_num_rewrite_mappings is the overall number of in-memory mappings, not the number of per-entry mappings. That means we flush, even if all entries have only a couple of mappings, as soon as 1000 in memory entries have been collected. A bit simplistic, but seems okay enough? Possibly, not sure yet. I need to examine that logic in more detail, but I had trouble following it and was hoping the next version would be better-commented. I'm still unhappy that we're introducing logical decoding slots but no analogue for physical replication. If we had the latter, would it share meaningful amounts of structure with this? Yes, I think we could mostly reuse it, we'd probably want to add a field or two more (application_name, sync_prio?). I have been wondering whether some of the code in replication/logical/logical.c shouldn't be in replication/slot.c or similar. So far I've opted for leaving it in its current place since it would have to change a bit for a more general role. I strongly favor moving the slot-related code to someplace with slot in the name, and replication/slot.c seems about right. Even if we don't extend them to cover non-logical replication in this release, we'll probably do it eventually, and it'd be better if that didn't require moving large amounts of code between files. I still think that the technical parts of properly supporting persistent slots for physical rep aren't that hard, but that the behavioural decisions are. I think there are primarily two things for SR that we want to prevent using slots: a) removal of still used WAL (i.e. maintain knowledge about a standby's last required REDO location) Check. b) make hot_standby_feedback work across disconnections of the walsender connection (i.e peg xmin, not just for catalogs though) Check; might need to be optional. c) Make sure we can transport those across cascading replication. Not sure I follow. once those are there it's also useful to keep a bit more information about the state of replicas: * write, flush, apply lsn Check. The hard questions that I see are like: * How do we manage standby registration? Does the admin have to do that manually? Or does a standby register itself automatically if some config paramter is set? * If automatically, how do we deal with the situation
Re: [HACKERS] logical changeset generation v6.8
On Wed, Dec 11, 2013 at 1:11 PM, Robert Haas robertmh...@gmail.com wrote: Committed #1 (again). Regarding this: This introduced a new compiler warning on the visual studios build: d:\postgres\b\src\backend\utils\cache\relcache.c(3958): warning C4715: 'RelationGetIndexAttrBitmap' : not all control paths return a value [D:\Postgres\b\postgres.vcxproj] The attached patch fixes it. Regards David Rowley + /* XXX: we could also do this unconditionally, the space is used anyway + if (copy_oid) + HeapTupleSetOid(key_tuple, HeapTupleGetOid(tp)); I would like to put in a big +1 for doing that unconditionally. I didn't make that change before committing, but I think it'd be a very good idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers relcache_fix_warning.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.8
On 2013-12-13 20:58:24 +1300, David Rowley wrote: On Wed, Dec 11, 2013 at 1:11 PM, Robert Haas robertmh...@gmail.com wrote: This introduced a new compiler warning on the visual studios build: d:\postgres\b\src\backend\utils\cache\relcache.c(3958): warning C4715: 'RelationGetIndexAttrBitmap' : not all control paths return a value [D:\Postgres\b\postgres.vcxproj] The attached patch fixes it. I thought we'd managed to get elog(ERROR) properly annotated as noreturn on msvc as well? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.8
On Sat, Dec 14, 2013 at 12:12 AM, Andres Freund and...@2ndquadrant.comwrote: On 2013-12-13 20:58:24 +1300, David Rowley wrote: On Wed, Dec 11, 2013 at 1:11 PM, Robert Haas robertmh...@gmail.com wrote: This introduced a new compiler warning on the visual studios build: d:\postgres\b\src\backend\utils\cache\relcache.c(3958): warning C4715: 'RelationGetIndexAttrBitmap' : not all control paths return a value [D:\Postgres\b\postgres.vcxproj] The attached patch fixes it. I thought we'd managed to get elog(ERROR) properly annotated as noreturn on msvc as well? It looks like this is down to the elog macro, where the elevel is being assigned to a variable elevel_ then we're only doing pg_unreachable(); if elevel_ = ERROR. The compiler must not be confident enough to optimise out the if condition even though the elevel is not changed after it is set from the constant. Regards David Rowley Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] logical changeset generation v6.8
On 2013-12-10 19:11:03 -0500, Robert Haas wrote: Committed #1 (again). Thanks! Regarding this: + /* XXX: we could also do this unconditionally, the space is used anyway + if (copy_oid) + HeapTupleSetOid(key_tuple, HeapTupleGetOid(tp)); I would like to put in a big +1 for doing that unconditionally. I didn't make that change before committing, but I think it'd be a very good idea. Ok. I wasn't sure if it wouldn't be wierd to include the oid in the tuple logged for a replica identity that doesn't cover the oid. But the downside is pretty small... Will send a patch. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.8
On 2013-12-10 19:11:03 -0500, Robert Haas wrote: Committed #1 (again). Regarding this: + /* XXX: we could also do this unconditionally, the space is used anyway + if (copy_oid) + HeapTupleSetOid(key_tuple, HeapTupleGetOid(tp)); I would like to put in a big +1 for doing that unconditionally. I didn't make that change before committing, but I think it'd be a very good idea. Patch attached. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From 47c93d8e7afbcaef268de66246571cdc2f134c97 Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Wed, 11 Dec 2013 17:20:27 +0100 Subject: [PATCH] Dep't of second thoughts: Always include oids in WAL logged replica identities. Since replica identities are logged using the normal format for heap tuples, the space for oids in WITH OIDS tables was already used, so there's little point in only including the oid if it is included in the configured IDENTITY. Per comment from Robert Haas. --- src/backend/access/heap/heapam.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 249fffe..e647453 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -6638,7 +6638,6 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool * TupleDesc idx_desc; char replident = relation-rd_rel-relreplident; HeapTuple key_tuple = NULL; - bool copy_oid = false; bool nulls[MaxHeapAttributeNumber]; Datum values[MaxHeapAttributeNumber]; int natt; @@ -6698,7 +6697,8 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool * int attno = idx_rel-rd_index-indkey.values[natt]; if (attno == ObjectIdAttributeNumber) - copy_oid = true; + /* copied below */ + ; else if (attno 0) elog(ERROR, system column in index); else @@ -6709,8 +6709,12 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool * *copy = true; RelationClose(idx_rel); - /* XXX: we could also do this unconditionally, the space is used anyway */ - if (copy_oid) + /* + * Always copy oids if the table has them, even if not included in the + * index. The space in the logged tuple is used anyway, so there's little + * point in not including the information. + */ + if (relation-rd_rel-relhasoids) HeapTupleSetOid(key_tuple, HeapTupleGetOid(tp)); /* -- 1.8.5.rc2.dirty -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.8
On Wed, Dec 11, 2013 at 11:25 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-12-10 19:11:03 -0500, Robert Haas wrote: Committed #1 (again). Regarding this: + /* XXX: we could also do this unconditionally, the space is used anyway + if (copy_oid) + HeapTupleSetOid(key_tuple, HeapTupleGetOid(tp)); I would like to put in a big +1 for doing that unconditionally. I didn't make that change before committing, but I think it'd be a very good idea. Patch attached. Committed with kibitzing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.8
Hi, Lots of sensible comments removed, I plan to make changes to address them. On 2013-12-10 22:17:44 -0500, Robert Haas wrote: - I think this needs SGML documentation, same kind of thing we have for background workers, except probably significantly more. A design document with ASCII art in a different patch does not constitute usable documentation. I think it would fit best under section VII, internals, perhaps entitled Writing a Logical Decoding Plugin. Looking at it, I rather wonder if the Background Worker Processes ought to be there as well, rather than under section V, server programming. Completely agreed it needs that. I'd like the UI decisions in http://www.postgresql.org/message-id/20131205001520.ga8...@awork2.anarazel.de resolved before though. + logical_xmin = + ((volatile LogicalDecodingCtlData*) LogicalDecodingCtl)-xmin; Ugly. We really should have a macro for this... Message style. I suggest treating a short write as ENOSPC; there is precedent elsewhere. I don't think there's much point in including remapping in all of the error messages. It adds burden for translators and users won't know what a remapping file is anyway. It helps in locating wich part of the code caused a problem. I utterly hate to get reports with error messages that I can't correlate to a sourcefile. Yes, I know that verbose error output exists, but usually you don't get it that way... That said, I'll try to make the messages simpler. + /* use *GetUpdateXid to correctly deal with multixacts */ + xmax = HeapTupleHeaderGetUpdateXid(new_tuple-t_data); I don't feel enlightened by that comment. Well, it will return the update xid of a tuple locked with NO KEY SHARE and updated (with NO KEY UPDATE). I.e. 9.3+ foreign key locking stuff. + if (!TransactionIdIsNormal(xmax)) + { + /* +* no xmax is set, can't have any permanent ones, so this check is +* sufficient +*/ + } + else if (HEAP_XMAX_IS_LOCKED_ONLY(new_tuple-t_data-t_infomask)) + { + /* only locked, we don't care */ + } + else if (!TransactionIdPrecedes(xmax, cutoff)) + { + /* tuple has been deleted recently, log */ + do_log_xmax = true; + } Obfuscated. Rewrite without empty blocks. I don't understand why having an empty block is less clear than including a condition in several branches? Especially if the individual conditions might need to be documented? + /* +* Write out buffer everytime we've too many in-memory entries. +*/ + if (state-rs_num_rewrite_mappings = 1000 /* arbitrary number */) + logical_heap_rewrite_flush_mappings(state); What happens if the number of items per hash table entry is small but the number of entries is large? rs_num_rewrite_mappings is the overall number of in-memory mappings, not the number of per-entry mappings. That means we flush, even if all entries have only a couple of mappings, as soon as 1000 in memory entries have been collected. A bit simplistic, but seems okay enough? + /* XXX: should we warn about such files? */ Nah. Ok, will remove that comment from a couple locations then... + /* XXX: we could replay the transaction and prepare it as well. */ Should we do that? It would allow a neat feature, namely using 2PC to make sure that a transaction commit on all the nodes connected using changeset extraction. But I think that's a feature for later. Its use would have to be optional anyway. All right, I'm giving up for now. These patches need a LOT of work on comments and documentation to be committable, even if the underlying architecture is basically sound. +* clog. If we're doing logical replication we can't do that though, so +* hold the lock for a moment longer. ...because why? That's something pretty icky imo. But more in the existing HS code than in this. Without the changes in the locking we can have the situation that transactions are marked as running in the xl_running_xact record, but are actually already committed. There's some code for HS that tries to cope with that situation but I don't trust it very much, and it'd be more complicated to make it work for logical decoding. I could reproduce problems for the latter without those changes. I'll add a comment explaining this. I'm still unhappy that we're introducing logical decoding slots but no analogue for physical replication. If we had the latter, would it share meaningful amounts of structure with this? Yes, I think we could mostly reuse it, we'd probably want to add a field or two more (application_name, sync_prio?). I have been wondering whether some of the code in replication/logical/logical.c shouldn't be in replication/slot.c or similar. So far I've opted for leaving it in its current
Re: [HACKERS] logical changeset generation v6.8
On Wed, Dec 4, 2013 at 10:55 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-12-03 15:19:26 -0500, Robert Haas wrote: Yeah, you're right. I think the current logic will terminate when all flags are set to false or all attribute numbers have been checked, but it doesn't know that if HOT's been disproven then we needn't consider further HOT columns. I think the way to fix that is to tweak this part: + if (next_hot_attnum FirstLowInvalidHeapAttributeNumber) check_now = next_hot_attnum; + else if (next_key_attnum FirstLowInvalidHeapAttributeNumber) + check_now = next_key_attnum; + else if (next_id_attnum FirstLowInvalidHeapAttributeNumber) + check_now = next_id_attnum; else + break; What I think we ought to do there is change each of those criteria to say if (hot_result next_hot_attnum FirstLowInvalidHeapAttributeNumber) and similarly for the other two. That way we consider each set a valid source of attribute numbers only until the result flag for that set flips false. That seems to work well, yes. Updated rebased series attached. * Rebased since the former patch 01 has been applied * Lots of smaller changes in the wal_level=logical patch * Use Robert's version of wal_level=logical, with the above fixes * Use only macros for RelationIsAccessibleInLogicalDecoding/LogicallyLogged * Moved a mit more logic into ExtractReplicaIdentity * some comment copy-editing * Bug noted by Euler fixed, testcase added * Some copy editing in later patches, nothing significant. I've primarily sent this, because I don't know of further required changes in 0001-0003. I am trying reviewing the other patches in detail atm. Committed #1 (again). Regarding this: + /* XXX: we could also do this unconditionally, the space is used anyway + if (copy_oid) + HeapTupleSetOid(key_tuple, HeapTupleGetOid(tp)); I would like to put in a big +1 for doing that unconditionally. I didn't make that change before committing, but I think it'd be a very good idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.8
On Wed, Dec 4, 2013 at 10:55 AM, Andres Freund and...@2ndquadrant.com wrote: I've primarily sent this, because I don't know of further required changes in 0001-0003. I am trying reviewing the other patches in detail atm. Committed #3 also. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.8
On Wed, Dec 4, 2013 at 10:55 AM, Andres Freund and...@2ndquadrant.com wrote: [ updated logical decoding patches ] Regarding patch #4, introduce wal decoding via catalog timetravel, which seems to be the bulk of what's not committed at this point... - I think this needs SGML documentation, same kind of thing we have for background workers, except probably significantly more. A design document with ASCII art in a different patch does not constitute usable documentation. I think it would fit best under section VII, internals, perhaps entitled Writing a Logical Decoding Plugin. Looking at it, I rather wonder if the Background Worker Processes ought to be there as well, rather than under section V, server programming. + /* setup the redirected t_self for the benefit of logical decoding */ ... + /* reset to original, non-redirected, tid */ Clear as mud. + * rrow to disk but only do so in batches when we've collected several of them Typo. + * position before those records back. Independently from WAL logging, before those records back? + * position before those records back. Independently from WAL logging, + * everytime a rewrite is finished all generated mapping files are directly I would delete Independently from WAL logging from this sentence. And everytime is two words. + * file. That leaves the tail end that has not yet been fsync()ed to disk open ... + * fsynced. Pick a spelling and stick with it. My suggestion is flushed to disk, not actually using fsync per se at all. + * TransactionDidCommit() check. But we want to support physical replication + * for availability and to support logical decoding on the standbys. What is physical replication for availability? + * necessary. If we detect that we don't need to log anything we'll prevent + * any further action by logical_*rewrite* Sentences should end with a period, and the reason for the asterisks is not clear. + logical_xmin = + ((volatile LogicalDecodingCtlData*) LogicalDecodingCtl)-xmin; Ugly. +errmsg(failed to write to logical remapping file: %m))); Message style. + ereport(ERROR, + (errcode_for_file_access(), +errmsg(incomplete write to logical remapping file, wrote %d of %u, + written, len))); Message style. I suggest treating a short write as ENOSPC; there is precedent elsewhere. I don't think there's much point in including remapping in all of the error messages. It adds burden for translators and users won't know what a remapping file is anyway. + /* +* We intentionally violate the usual WAL coding practices here and +* write to the file *first*. This way an eventual checkpoint will +* sync any part of the file that's not guaranteed to be recovered by +* the XLogInsert(). We deal with the potential corruption in the tail +* of the file by truncating it to the last safe point during WAL +* replay and by checking whether the xid performing the mapping has +* committed. +*/ Don't have two different comments explaining this. Either replace this one with a reference to the other one, or get rid of the other one and just keep this one. I vote for the latter. I don't see a clear explanation anywhere of what the rs_logical_mappings hash is actually doing. This is badly needed. This code basically presupposes that you know what it's try to accomplish, and even though I sort of do, it leaves a lot to be desired in terms of clarity. + /* nothing to do if we're not working on a catalog table */ + if (!state-rs_logical_rewrite) + return; Comment doesn't accurately describe code. + /* use *GetUpdateXid to correctly deal with multixacts */ + xmax = HeapTupleHeaderGetUpdateXid(new_tuple-t_data); I don't feel enlightened by that comment. + if (!TransactionIdIsNormal(xmax)) + { + /* +* no xmax is set, can't have any permanent ones, so this check is +* sufficient +*/ + } + else if (HEAP_XMAX_IS_LOCKED_ONLY(new_tuple-t_data-t_infomask)) + { + /* only locked, we don't care */ + } + else if (!TransactionIdPrecedes(xmax, cutoff)) + { + /* tuple has been deleted recently, log */ + do_log_xmax = true; + } Obfuscated. Rewrite without empty blocks. + /* +* Write out buffer everytime we've too many in-memory entries. +*/ + if (state-rs_num_rewrite_mappings = 1000 /* arbitrary number */) + logical_heap_rewrite_flush_mappings(state); What happens if the number of items per hash table entry is small but the