Re: [HACKERS] Exposing the Xact commit order to the user
On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote: In some systems (data warehousing, replication), the order of commits is important, since that is the order in which changes have become visible. This information could theoretically be extracted from the WAL, but scanning the entire WAL just to extract this tidbit of information would be excruciatingly painful. This idea had support from at least 6 hackers. I'm happy to add my own. Can I suggest it is added as a hook, rather than argue about the details too much? The main use case is in combination with external systems, so that way we can maintain the relevant code with the system that cares about it. CommitTransaction() inside of xact.c will call a function, that inserts a new record into this array. The operation will for most of the time be nothing than taking a spinlock and adding the record to shared memory. All the data for the record is readily available, does not require further locking and can be collected locally before taking the spinlock. The begin_timestamp is the transactions idea of CURRENT_TIMESTAMP, the commit_timestamp is what CommitTransaction() just decided to write into the WAL commit record and the total_rowcount is the sum of inserted, updated and deleted heap tuples during the transaction, which should be easily available from the statistics collector, unless row stats are disabled, in which case the datum would be zero. Does this need to be called while in a critical section? Or can we wait until after the actual marking of the commit before calling this? Checkpoint handling will call a function to flush the shared buffers. Together with this, the information from WAL records will be sufficient to recover this data (except for row counts) during crash recovery. So it would need to work identically in recovery also? These two values are not currently stored in the commit WAL record. timestamptz xci_begin_timestamp int64 xci_total_rowcount Both of those seem optional, so I don't really want them added to WAL. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Excerpts from Marko Kreen's message of jue jun 10 18:10:50 -0400 2010: Jan's proposal of storing small struct into segmented files sounds like it could work. Can't say anything more because I can't imagine it as well as Jan. Would need to play with working implementation to say more... We already have such a thing -- see pg_multixact -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 6/4/10, Robert Haas robertmh...@gmail.com wrote: On Fri, Jun 4, 2010 at 10:44 AM, Greg Stark gsst...@mit.edu wrote: A function which takes a starting xid and a number of transactions to return seems very tied to one particular application. I could easily see other systems such as a multi-master system instead only wanting to compare two transactions to find out which committed first. Or non-replication applications where you have an LSN and want to know whether a given transaction had committed by that time. So one possible interface would be to do something like xids_committed_between(lsn_start, lsn_end) -- and yes, possibly with an optional argument to limit the number or records returned. I'm imagining that the backend data storage for this would be a file containing, essentially, a struct for each commit repeated over and over again, packed tightly. It's easy to index into such a file using a sequence number (give me the 1000'th commit) but searching by LSN would require (a) storing the LSNs and (b) binary search. Maybe it's worth adding that complexity, but I'm not sure that it is. Keeping the size of this file small is important for ensuring that it has minimal performance impact (which is also why I'm not sold on trying to include the tuple counters that Jan proposed - I think we can solve the problem he's worried about there more cleanly in other ways). AIUI, you index the file by offset. I think we should be very careful about assuming that we understand replication and its needs better than someone who has spent many years developing one of the major PostgreSQL replication solutions. Well the flip side of that is that we want an interface that's useful for more than just one replication system. This is something basic enough that I think it will be useful for more than just replication if we design it generally enough. It should be useful for backup/restore processes and monitoring as well as various forms of replication including master-slave trigger based systems but also including PITR-based replication, log-parsing systems, multi-master trigger based systems, 2PC-based systems, etc. Making it general enough to serve multiple needs is good, but we've got to make sure that the extra complexity is buying us something. Jan seems pretty confident that this could be used by Londiste also, though it would be nice to have some confirmation from the Londiste developer(s) on that. I think it may also have applications for distributed transactions and multi-master replication, but I am not too sure it helps much for PITR-based replication or log-parsing systems. We want to design something that is good, but trying to solve too many problems may end up solving none of them well. The potential for single shared queue implementation, with the additional potential for merging async replication implementations sounds attractive. (Merging ~ having single one that satisfies broad range of needs.) Unless the functionality accepted into core will be limited to replication only and/or performs worse than current snapshot-based grouping. Then it is uninteresting, of course. Jan's proposal of storing small struct into segmented files sounds like it could work. Can't say anything more because I can't imagine it as well as Jan. Would need to play with working implementation to say more... -- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Fri, Jun 4, 2010 at 2:32 AM, Robert Haas robertmh...@gmail.com wrote: I find the skeptical attitude on this thread altogether unwarranted. Jan made his case and, at least IMHO, presented it pretty clearly. Just to be clear I think the idea of exposing commit order is a no-brainer. The specific interface is what I was questioning. A function which takes a starting xid and a number of transactions to return seems very tied to one particular application. I could easily see other systems such as a multi-master system instead only wanting to compare two transactions to find out which committed first. Or non-replication applications where you have an LSN and want to know whether a given transaction had committed by that time. So one possible interface would be to do something like xids_committed_between(lsn_start, lsn_end) -- and yes, possibly with an optional argument to limit the number or records returned. So you could do: old := select pg_current_xlog_location(); while (1) { sleep 60s; new := select pg_current_xlog_location() process(select xids_committed_between(old,new)) old := new } This might be more useful for PITR recovery for example where you want to find out what transactions committed between now and some known point of corruption. I could also see it being useful to have a function pg_xlog_location_of_commit(xid). That would let you run recovery until a particular transaction committed or test whether your replica is caught up to a particular commit. It could be useful for monitoring Hot Standby slaves. He then answered, multiple times, numerous questions which were already addressed in the original email, as well as various others. I think I did miss some of the original description. That might have caused some of the difficulty as I was asking questions about something he assumed he had already answered. I think we should be very careful about assuming that we understand replication and its needs better than someone who has spent many years developing one of the major PostgreSQL replication solutions. Well the flip side of that is that we want an interface that's useful for more than just one replication system. This is something basic enough that I think it will be useful for more than just replication if we design it generally enough. It should be useful for backup/restore processes and monitoring as well as various forms of replication including master-slave trigger based systems but also including PITR-based replication, log-parsing systems, multi-master trigger based systems, 2PC-based systems, etc. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Fri, Jun 4, 2010 at 10:44 AM, Greg Stark gsst...@mit.edu wrote: On Fri, Jun 4, 2010 at 2:32 AM, Robert Haas robertmh...@gmail.com wrote: I find the skeptical attitude on this thread altogether unwarranted. Jan made his case and, at least IMHO, presented it pretty clearly. Just to be clear I think the idea of exposing commit order is a no-brainer. The specific interface is what I was questioning. OK, thanks for that clarification. A function which takes a starting xid and a number of transactions to return seems very tied to one particular application. I could easily see other systems such as a multi-master system instead only wanting to compare two transactions to find out which committed first. Or non-replication applications where you have an LSN and want to know whether a given transaction had committed by that time. So one possible interface would be to do something like xids_committed_between(lsn_start, lsn_end) -- and yes, possibly with an optional argument to limit the number or records returned. I'm imagining that the backend data storage for this would be a file containing, essentially, a struct for each commit repeated over and over again, packed tightly. It's easy to index into such a file using a sequence number (give me the 1000'th commit) but searching by LSN would require (a) storing the LSNs and (b) binary search. Maybe it's worth adding that complexity, but I'm not sure that it is. Keeping the size of this file small is important for ensuring that it has minimal performance impact (which is also why I'm not sold on trying to include the tuple counters that Jan proposed - I think we can solve the problem he's worried about there more cleanly in other ways). So you could do: old := select pg_current_xlog_location(); while (1) { sleep 60s; new := select pg_current_xlog_location() process(select xids_committed_between(old,new)) old := new } This might be more useful for PITR recovery for example where you want to find out what transactions committed between now and some known point of corruption. This could also be done by selecting the current commit sequence number, getting the XIDs committed between the two commit sequence numbers, etc. I could also see it being useful to have a function pg_xlog_location_of_commit(xid). That would let you run recovery until a particular transaction committed or test whether your replica is caught up to a particular commit. It could be useful for monitoring Hot Standby slaves. Well, you'd need to index the commit data to make that work, I think, so that adds a lot of complexity. The implementation as proposed lets you find the commits after a known point in order of occurrence, but it doesn't let you inquire about the location of a particular commit. If you want to run recovery until a particular transaction commits, we could teach the recovery code to look for the commit record for that XID and then pause at that point (or just before that point, if someone wanted that as an alternative behavior), which would be much simpler than using this mechanism. And if you want to check whether slaves are caught up, it would probably be better to use LSN rather than commits, because you could be caught up on commits but way behind on WAL replay. I think we should be very careful about assuming that we understand replication and its needs better than someone who has spent many years developing one of the major PostgreSQL replication solutions. Well the flip side of that is that we want an interface that's useful for more than just one replication system. This is something basic enough that I think it will be useful for more than just replication if we design it generally enough. It should be useful for backup/restore processes and monitoring as well as various forms of replication including master-slave trigger based systems but also including PITR-based replication, log-parsing systems, multi-master trigger based systems, 2PC-based systems, etc. Making it general enough to serve multiple needs is good, but we've got to make sure that the extra complexity is buying us something. Jan seems pretty confident that this could be used by Londiste also, though it would be nice to have some confirmation from the Londiste developer(s) on that. I think it may also have applications for distributed transactions and multi-master replication, but I am not too sure it helps much for PITR-based replication or log-parsing systems. We want to design something that is good, but trying to solve too many problems may end up solving none of them well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Excerpts from Jan Wieck's message of jue jun 03 19:52:19 -0400 2010: On 6/3/2010 7:11 PM, Alvaro Herrera wrote: Why not send separate numbers of tuple inserts/updates/deletes, which we already have from pgstats? We only have them for the entire database. The purpose of this is just a guesstimate about what data volume to expect if I were to select all log from a particular transaction. But we already have per table counters. Couldn't we aggregate them per transaction as well, if this feature is enabled? I'm guessing that this is going to have some uses besides Slony; vague measurements could turn out to be unusable for some of these. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 6/4/2010 10:44 AM, Greg Stark wrote: On Fri, Jun 4, 2010 at 2:32 AM, Robert Haas robertmh...@gmail.com wrote: I find the skeptical attitude on this thread altogether unwarranted. Jan made his case and, at least IMHO, presented it pretty clearly. Just to be clear I think the idea of exposing commit order is a no-brainer. The specific interface is what I was questioning. A function which takes a starting xid and a number of transactions to return seems very tied to one particular application. I could easily see other systems such as a multi-master system instead only wanting to compare two transactions to find out which committed first. Or non-replication applications where you have an LSN and want to know whether a given transaction had committed by that time. Read the proposal again. I mean the original mail that started this tread. The function does NOT take an xid as argument. Being able to compare two xid's against each other with respect to their commit order is eventually useful. The serial number of the data set, returned by the SRF as proposed, would perfectly satisfy that need. But not the way you envision for multimaster. Multimaster would ask did xid X from server A commit before or after xid Y from server B? That is a question completely outside the scope of this proposal. Please keep it real. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 6/4/2010 12:52 PM, Alvaro Herrera wrote: Excerpts from Jan Wieck's message of jue jun 03 19:52:19 -0400 2010: On 6/3/2010 7:11 PM, Alvaro Herrera wrote: Why not send separate numbers of tuple inserts/updates/deletes, which we already have from pgstats? We only have them for the entire database. The purpose of this is just a guesstimate about what data volume to expect if I were to select all log from a particular transaction. But we already have per table counters. Couldn't we aggregate them per transaction as well, if this feature is enabled? I'm guessing that this is going to have some uses besides Slony; vague measurements could turn out to be unusable for some of these. We have them per table and per index, summarized over all transactions. It is debatable if bloating this feature with detailed statistics is useful or not, but I'd rather not have that bloat at the beginning, because otherwise I know exactly what is going to happen. People will just come back and say zero impact my a... Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 6/2/2010 7:49 PM, Greg Stark wrote: On Wed, Jun 2, 2010 at 6:45 PM, Chris Browne cbbro...@acm.org wrote: It would make it easy to conclude: This next transaction did 8328194 updates. Maybe we should do some kind of checkpoint (e.g. - commit transaction or such) before working on it. versus This transaction we're thinking of working on had 7 updates. No big deal... I'm puzzled how you would define this value. How do you add 7 inserts, 7 deletes, and 7 updates? Is that 21 rows modified? Why are the 7 inserts and 7 deletes worth twice as much as the 7 updates when they're basically the same thing? What if the inserts fired triggers which inserted 7 more rows, is that 14? What if the 7 updates modified 2 TB of TOAST data but the 8238194 updates were all to the same record and they were all HOT updates so all it did was change 8kB? In any case you'll have all the actual data from your triggers or hooks or whatever so what value does having the system keep track of this add? The point is not that we don't have that information now. The point is having a hint BEFORE wading through possibly gigabytes of WAL or log data. If getting that information requires to read all the log data twice or the need to read gigabytes of otherwise useless WAL data (as per Bruce's suggestion), we better not get it at all and just keep doing what we are doing now. I actually have a hard time understanding why people are so opposed to a feature that has zero impact at all unless a DBA actually turns in ON. What is the problem with exposing the commit order of transactions? Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Jan Wieck wrote: The point is not that we don't have that information now. The point is having a hint BEFORE wading through possibly gigabytes of WAL or log data. If getting that information requires to read all the log data twice or the need to read gigabytes of otherwise useless WAL data (as per Bruce's suggestion), we better not get it at all and just keep doing what we are doing now. I actually have a hard time understanding why people are so opposed to a feature that has zero impact at all unless a DBA actually turns in ON. What is the problem with exposing the commit order of transactions? If you want to fork Postgres and add it, go ahead, but if the community has to maintain the code and document it, we care. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Jan Wieck janwi...@yahoo.com wrote: I actually have a hard time understanding why people are so opposed to a feature that has zero impact at all unless a DBA actually turns in ON. What is the problem with exposing the commit order of transactions? FWIW, once I came to understand the use case, it seems to me a perfectly reasonable and useful thing to have. It does strike me that there may be value to add one more xid to support certain types of integrity for some use cases, but that's certainly something which could be added later, if at all. Once I realized that, I just dropped out of the discussion; perhaps I should have bowed out with an endorsement. Unless my memory is failing me worse than usual, Dan Ports, who is working on the serializable implementation so he can use the predicate locking with a transaction-aware caching feature, needs the ability to track commit order of transactions by xid; so the use cases go beyond Slony and Londiste. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 6/3/2010 4:04 PM, Bruce Momjian wrote: If you want to fork Postgres and add it, go ahead, but if the community has to maintain the code and document it, we care. That comment was rather unprofessional. I think the rest of us still try to find the best solution for the problem, not kill the discussion. You may want to rejoin that effort. I care about an efficient, low overhead way to get a certain information, that is otherwise extremely difficult, expensive and version dependent to get. I care about cleaning up more of the mistakes, made in the original development of Slony. Namely using hacks and kluges to implement details, not supported by a current version of PostgreSQL. Londiste and Slony made a good leap on that with the txid data type. Slony made another step like that with 2.0, switching to the (for that very purpose developed and contributed) native trigger configuration instead of hacking system catalogs. This would be another step in that direction and we would be able to unify Londiste's and Slony's transport mechanism and eliminating the tick/sync kluge. Care to explain what exactly you care about? Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
br...@momjian.us (Bruce Momjian) writes: Jan Wieck wrote: The point is not that we don't have that information now. The point is having a hint BEFORE wading through possibly gigabytes of WAL or log data. If getting that information requires to read all the log data twice or the need to read gigabytes of otherwise useless WAL data (as per Bruce's suggestion), we better not get it at all and just keep doing what we are doing now. I actually have a hard time understanding why people are so opposed to a feature that has zero impact at all unless a DBA actually turns in ON. What is the problem with exposing the commit order of transactions? If you want to fork Postgres and add it, go ahead, but if the community has to maintain the code and document it, we care. Are you caring or opposing? It seems rather uncharitable to imply that Jan doesn't care. I know *I'm* not interested in a forked Postgres for this - I would prefer to find out what things could be done that don't involve gross amounts of WAL file grovelling for data that mayn't necessarily even be available. -- select 'cbbrowne' || '@' || 'cbbrowne.com'; http://cbbrowne.com/info/internet.html MS apparently now has a team dedicated to tracking problems with Linux and publicizing them. I guess eventually they'll figure out this back fires... ;) -- William Burrow aa...@delete.fan.nb.ca -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
gsst...@mit.edu (Greg Stark) writes: On Wed, Jun 2, 2010 at 6:45 PM, Chris Browne cbbro...@acm.org wrote: It would make it easy to conclude: This next transaction did 8328194 updates. Maybe we should do some kind of checkpoint (e.g. - commit transaction or such) before working on it. versus This transaction we're thinking of working on had 7 updates. No big deal... I'm puzzled how you would define this value. How do you add 7 inserts, 7 deletes, and 7 updates? Is that 21 rows modified? Why are the 7 inserts and 7 deletes worth twice as much as the 7 updates when they're basically the same thing? What if the inserts fired triggers which inserted 7 more rows, is that 14? What if the 7 updates modified 2 TB of TOAST data but the 8238194 updates were all to the same record and they were all HOT updates so all it did was change 8kB? The presence of those questions (and their ambiguity) is the reason why there's a little squirming as to whether this is super-useful and super-necessary. What this offers is *SOME* idea of how much updating work a particular transaction did. It's a bit worse than you suggest: - If replication triggers have captured tuples, those would get counted. - TOAST updates might lead to extra updates being counted. But back to where you started, I'd anticipate 7 inserts, 7 deletes, and 7 updates being counted as something around 21 updates. And if that included 5 TOAST changes, it might bump up to 26. If there were replication triggers in place, that might bump the count up to 45 (which I chose arbitrarily). In any case you'll have all the actual data from your triggers or hooks or whatever so what value does having the system keep track of this add? This means that when we'd pull the list of transactions to consider, we'd get something like: select * from next_transactions('4218:23', 50); [list of 50 transactions returned, each with... - txid - START timestamp - COMMIT timestamp - Approximate # of updates Then, for each of the 50, I'd pull replication log data for the corresponding transaction. If I have the approximate # of updates, that might lead me to stop short, and say: That next update looks like a doozy! I'm going to stop and commit what I've got before doing that one. It's not strictly necessary, but would surely be useful for flow control. -- select 'cbbrowne' || '@' || 'cbbrowne.com'; http://cbbrowne.com/info/internet.html MS apparently now has a team dedicated to tracking problems with Linux and publicizing them. I guess eventually they'll figure out this back fires... ;) -- William Burrow aa...@delete.fan.nb.ca -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Thu, Jun 3, 2010 at 8:50 PM, Jan Wieck janwi...@yahoo.com wrote: I'm puzzled how you would define this value. How do you add 7 inserts, 7 deletes, and 7 updates? Is that 21 rows modified? I actually have a hard time understanding why people are so opposed to a feature that has zero impact at all unless a DBA actually turns in ON. What is the problem with exposing the commit order of transactions? The post you were responding to was regarding the meaninglessness of the number of records attribute you wanted. Your response is a non sequitor. I think the commit order of transactions would be a good thing to expose though I've asked repeatedly what kind of interface you need and never gotten answers to all the questions. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Greg Stark gsst...@mit.edu wrote: what kind of interface you need For the potential uses I can see, it would be great to have a SRF which took two parameters: xid of last known commit and a limit how many commits past that to return. Perhaps a negative number could move earlier in time, if that seems reasonable to others. I think that's also consistent with Jan's posts. A GUC to enable it and some way to specify retention (or force cleanup) are the only other user-facing features which come to mind for me. (Not sure what form that last should take, but didn't Jan say something about both of these early in the thread?) Do you see a need for something else (besides, obviously, docs)? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Kevin Grittner kevin.gritt...@wicourts.gov wrote: Greg Stark gsst...@mit.edu wrote: what kind of interface you need For the potential uses I can see, it would be great to have a SRF which took two parameters: xid of last known commit and a limit how many commits past that to return. Jan's very first post had it right; my idea was flawed: | Exposing the data will be done via a set returning function. The | SRF takes two arguments. The maximum number of rows to return and | the last serial number processed by the reader. The advantage of | such SRF is that the result can be used in a query that right away | delivers audit or replication log information in transaction | commit order. The SRF can return an empty set if no further | transactions have committed since, or an error if data segments | needed to answer the request have already been purged. | | Purging of the data will be possible in several different ways. | Autovacuum will call a function that drops segments of the data | that are outside the postgresql.conf configuration with respect to | maximum age or data volume. There will also be a function reserved | for superusers to explicitly purge the data up to a certain serial | number. Apologies for not looking back to the start of the thread before that last post. It was all laid out right at the start. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Jan Wieck wrote: On 6/3/2010 4:04 PM, Bruce Momjian wrote: If you want to fork Postgres and add it, go ahead, but if the community has to maintain the code and document it, we care. That comment was rather unprofessional. I think the rest of us still try to find the best solution for the problem, not kill the discussion. You may want to rejoin that effort. I care about an efficient, low overhead way to get a certain information, that is otherwise extremely difficult, expensive and version dependent to get. I care about cleaning up more of the mistakes, made in the original development of Slony. Namely using hacks and kluges to implement details, not supported by a current version of PostgreSQL. Londiste and Slony made a good leap on that with the txid data type. Slony made another step like that with 2.0, switching to the (for that very purpose developed and contributed) native trigger configuration instead of hacking system catalogs. This would be another step in that direction and we would be able to unify Londiste's and Slony's transport mechanism and eliminating the tick/sync kluge. Care to explain what exactly you care about? Here is what I was replying to: I actually have a hard time understanding why people are so opposed t$ feature that has zero impact at all unless a DBA actually turns in ON. What is the problem with exposing the commit order of transactions? Jan's comment is why should others care what he wants because it has zero impact? I am saying the community cares because we have to maintain the code. I stand by my comment. I remember a dismissive comment by Jan when 'session_replication_role' was added, and a similar strong comment from me at that time as well. It seems we are doing this again. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Bruce Momjian wrote: Jan Wieck wrote: On 6/3/2010 4:04 PM, Bruce Momjian wrote: If you want to fork Postgres and add it, go ahead, but if the community has to maintain the code and document it, we care. That comment was rather unprofessional. I think the rest of us still try to find the best solution for the problem, not kill the discussion. You may want to rejoin that effort. I care about an efficient, low overhead way to get a certain information, that is otherwise extremely difficult, expensive and version dependent to get. I care about cleaning up more of the mistakes, made in the original development of Slony. Namely using hacks and kluges to implement details, not supported by a current version of PostgreSQL. Londiste and Slony made a good leap on that with the txid data type. Slony made another step like that with 2.0, switching to the (for that very purpose developed and contributed) native trigger configuration instead of hacking system catalogs. This would be another step in that direction and we would be able to unify Londiste's and Slony's transport mechanism and eliminating the tick/sync kluge. Care to explain what exactly you care about? Here is what I was replying to: I actually have a hard time understanding why people are so opposed t$ feature that has zero impact at all unless a DBA actually turns in ON. What is the problem with exposing the commit order of transactions? Jan's comment is why should others care what he wants because it has zero impact? I am saying the community cares because we have to maintain the code. I stand by my comment. I remember a dismissive comment by Jan when 'session_replication_role' was added, and a similar strong comment from me at that time as well. It seems we are doing this again. Of course, if I am misintepreting what Jan said, please let me know. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Excerpts from Chris Browne's message of jue jun 03 16:21:35 -0400 2010: What this offers is *SOME* idea of how much updating work a particular transaction did. It's a bit worse than you suggest: - If replication triggers have captured tuples, those would get counted. - TOAST updates might lead to extra updates being counted. But back to where you started, I'd anticipate 7 inserts, 7 deletes, and 7 updates being counted as something around 21 updates. And if that included 5 TOAST changes, it might bump up to 26. If there were replication triggers in place, that might bump the count up to 45 (which I chose arbitrarily). Why not send separate numbers of tuple inserts/updates/deletes, which we already have from pgstats? -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 6/3/2010 5:58 PM, Greg Stark wrote: On Thu, Jun 3, 2010 at 8:50 PM, Jan Wieck janwi...@yahoo.com wrote: I'm puzzled how you would define this value. How do you add 7 inserts, 7 deletes, and 7 updates? Is that 21 rows modified? I actually have a hard time understanding why people are so opposed to a feature that has zero impact at all unless a DBA actually turns in ON. What is the problem with exposing the commit order of transactions? The post you were responding to was regarding the meaninglessness of the number of records attribute you wanted. Your response is a non sequitor. I never proposed a number of records attribute. I proposed a sum of the row counts in the statistics collector. That row count would be a mix of insert, update, delete and toast operations. It's not an exact indicator of anything, but a good enough hint of how much data may come down the pipe if I were to select all replication data belonging to that transaction. I think the commit order of transactions would be a good thing to expose though I've asked repeatedly what kind of interface you need and never gotten answers to all the questions. In the original email that started this whole thread I wrote: Exposing the data will be done via a set returning function. The SRF takes two arguments. The maximum number of rows to return and the last serial number processed by the reader. The advantage of such SRF is that the result can be used in a query that right away delivers audit or replication log information in transaction commit order. The SRF can return an empty set if no further transactions have committed since, or an error if data segments needed to answer the request have already been purged. Did that not answer your question? Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 6/3/2010 6:24 PM, Kevin Grittner wrote: Apologies for not looking back to the start of the thread before that last post. It was all laid out right at the start. No need to apologize. Happens. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 6/3/2010 7:11 PM, Alvaro Herrera wrote: Excerpts from Chris Browne's message of jue jun 03 16:21:35 -0400 2010: What this offers is *SOME* idea of how much updating work a particular transaction did. It's a bit worse than you suggest: - If replication triggers have captured tuples, those would get counted. - TOAST updates might lead to extra updates being counted. But back to where you started, I'd anticipate 7 inserts, 7 deletes, and 7 updates being counted as something around 21 updates. And if that included 5 TOAST changes, it might bump up to 26. If there were replication triggers in place, that might bump the count up to 45 (which I chose arbitrarily). Why not send separate numbers of tuple inserts/updates/deletes, which we already have from pgstats? We only have them for the entire database. The purpose of this is just a guesstimate about what data volume to expect if I were to select all log from a particular transaction. This datum isn't critical, just handy for the overall feature to be useful. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Thu, Jun 3, 2010 at 6:29 PM, Bruce Momjian br...@momjian.us wrote: Jan Wieck wrote: On 6/3/2010 4:04 PM, Bruce Momjian wrote: If you want to fork Postgres and add it, go ahead, but if the community has to maintain the code and document it, we care. That comment was rather unprofessional. I think the rest of us still try to find the best solution for the problem, not kill the discussion. You may want to rejoin that effort. I care about an efficient, low overhead way to get a certain information, that is otherwise extremely difficult, expensive and version dependent to get. I care about cleaning up more of the mistakes, made in the original development of Slony. Namely using hacks and kluges to implement details, not supported by a current version of PostgreSQL. Londiste and Slony made a good leap on that with the txid data type. Slony made another step like that with 2.0, switching to the (for that very purpose developed and contributed) native trigger configuration instead of hacking system catalogs. This would be another step in that direction and we would be able to unify Londiste's and Slony's transport mechanism and eliminating the tick/sync kluge. Care to explain what exactly you care about? Here is what I was replying to: I actually have a hard time understanding why people are so opposed t$ feature that has zero impact at all unless a DBA actually turns in ON. What is the problem with exposing the commit order of transactions? Jan's comment is why should others care what he wants because it has zero impact? I am saying the community cares because we have to maintain the code. I stand by my comment. I remember a dismissive comment by Jan when 'session_replication_role' was added, and a similar strong comment from me at that time as well. It seems we are doing this again. I think it's entirely legitimate and proper for us to make a decision about whether this feature is worth including in core PostgreSQL. We are obviously not in the business of adding random features solely for the benefit of third-party applications. That having been said, there are several reasons why I believe that this particular feature is an excellent candidate for inclusion in core. 1. It solves a problem for which there is no easy workaround. Rereading all the WAL to extract the commit records is not an easy workaround, nor is what Slony and Londiste are doing now. 2. It is usable by multiple projects, not just one. It may well have applications beyond replication (e.g. distributed transactions), but at a very minimum it is usable by and useful to multiple replication solutions. 3. It has a clear specification which can be easily understood even by people who do not fully understand how replication solutions will make use of it, which makes code maintenance much less burdensome. Obviously, Jan's original email on this topic was just a sketch, but I find it to be pretty clear. 4. We have an existing precedent of being willing to add limited support into core to allow replication solutions to do their thing (session_replication_role, ALTER TABLE ... ENABLE REPLICA TRIGGER, etc). Even though we now have built-in replication via HS and SR, there is still a BIG use case for Slony, Londiste, and other add-on tools. Making those tools more successful and performant is good for PostgreSQL. 5. It does not involve highly invasive changes to the core code. 6. It can be turned off for users who don't want it. I find the skeptical attitude on this thread altogether unwarranted. Jan made his case and, at least IMHO, presented it pretty clearly. He then answered, multiple times, numerous questions which were already addressed in the original email, as well as various others. I think we should be very careful about assuming that we understand replication and its needs better than someone who has spent many years developing one of the major PostgreSQL replication solutions. Independent of Jan's qualifications, there are clearly several people on this thread who understand why this is useful and valuable, including me. I am obviously not in a position to insist that we accept this feature (assuming Jan produces a patch rather than getting discouraged and giving up) but I would like us to think very, very carefully before rejecting it, and not to do so unless we have a DARN good reason. Most patches add code, and therefore require code maintenance - that is not, by itself, a reason to reject them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Bruce Momjian wrote: I care about cleaning up more of the mistakes, made in the original development of Slony. Namely using hacks and kluges to implement details, not supported by a current version of PostgreSQL. Londiste and Slony made a good leap on that with the txid data type. Slony made another step like that with 2.0, switching to the (for that very purpose developed and contributed) native trigger configuration instead of hacking system catalogs. This would be another step in that direction and we would be able to unify Londiste's and Slony's transport mechanism and eliminating the tick/sync kluge. Care to explain what exactly you care about? Here is what I was replying to: I actually have a hard time understanding why people are so opposed t$ feature that has zero impact at all unless a DBA actually turns in ON. What is the problem with exposing the commit order of transactions? Jan's comment is why should others care what he wants because it has zero impact? I am saying the community cares because we have to maintain the code. I stand by my comment. I remember a dismissive comment by Jan when 'session_replication_role' was added, and a similar strong comment from me at that time as well. It seems we are doing this again. FYI, I talked to Jan on the phone and we have resolved this issue. :-) -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
d...@csail.mit.edu (Dan Ports) writes: I'm not clear on why the total rowcount is useful, but perhaps I'm missing something obvious. It would make it easy to conclude: This next transaction did 8328194 updates. Maybe we should do some kind of checkpoint (e.g. - commit transaction or such) before working on it. versus This transaction we're thinking of working on had 7 updates. No big deal... -- (reverse (concatenate 'string ofni.secnanifxunil @ enworbbc)) http://linuxfinances.info/info/finances.html Rules of the Evil Overlord #189. I will never tell the hero Yes I was the one who did it, but you'll never be able to prove it to that incompetent old fool. Chances are, that incompetent old fool is standing behind the curtain. http://www.eviloverlord.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
heikki.linnakan...@enterprisedb.com (Heikki Linnakangas) writes: On 24/05/10 19:51, Kevin Grittner wrote: The only thing I'm confused about is what benefit anyone expects to get from looking at data between commits in some way other than our current snapshot mechanism. Can someone explain a use case where what Jan is proposing is better than snapshot isolation? It doesn't provide any additional integrity guarantees that I can see. Right, it doesn't. What it provides is a way to reconstruct a snapshot at any point in time, after the fact. For example, after transactions A, C, D and B have committed in that order, it allows you to reconstruct a snapshot just like you would've gotten immediately after the commit of A, C, D and B respectively. That's useful replication tools like Slony that needs to commit the changes of those transactions in the slave in the same order as they were committed in the master. I don't know enough of Slony et al. to understand why that'd be better than the current heartbeat mechanism they use, taking a snapshot every few seconds, batching commits. I see two advantages: a) Identifying things on a transaction-by-transaction basis means that the snapshots (syncs) don't need to be captured, which is presently an area of fragility. If the slon daemon falls over on Friday evening, and nobody notices until Monday, the snapshot reverts to being all updates between Friday and whenever SYNCs start to be collected again. Exposing commit orders eliminates that fragility. SYNCs don't need to be captured anymore, so they can't be missed (which is today's problem). b) The sequence currently used to control log application ordering is a bottleneck, as it is a single sequence shared across all connections. It could be eliminated in favor of (perhaps) an in-memory variable defined on a per-connection basis. It's not a bottleneck that we hear a lot of complaints about, but the sequence certainly is a bottleneck. -- select 'cbbrowne' || '@' || 'cbbrowne.com'; http://cbbrowne.com/info/internet.html MS apparently now has a team dedicated to tracking problems with Linux and publicizing them. I guess eventually they'll figure out this back fires... ;) -- William Burrow aa...@delete.fan.nb.ca -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Wed, Jun 2, 2010 at 6:45 PM, Chris Browne cbbro...@acm.org wrote: It would make it easy to conclude: This next transaction did 8328194 updates. Maybe we should do some kind of checkpoint (e.g. - commit transaction or such) before working on it. versus This transaction we're thinking of working on had 7 updates. No big deal... I'm puzzled how you would define this value. How do you add 7 inserts, 7 deletes, and 7 updates? Is that 21 rows modified? Why are the 7 inserts and 7 deletes worth twice as much as the 7 updates when they're basically the same thing? What if the inserts fired triggers which inserted 7 more rows, is that 14? What if the 7 updates modified 2 TB of TOAST data but the 8238194 updates were all to the same record and they were all HOT updates so all it did was change 8kB? In any case you'll have all the actual data from your triggers or hooks or whatever so what value does having the system keep track of this add? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/28/2010 7:19 PM, Bruce Momjian wrote: Jan Wieck wrote: Reading the entire WAL just to find all COMMIT records, then go back to the origin database to get the actual replication log you're looking for is simpler and more efficient? I don't think so. Agreed, but I think I've not explained myself well enough. I proposed two completely separate ideas; the first one was this: If you must get commit order, get it from WAL on *origin*, using exact same code that current WALSender provides, plus some logic to read through the WAL records and extract commit/aborts. That seems much simpler than the proposal you outlined and as SR shows, its low latency as well since commits write to WAL. No need to generate event ticks either, just use XLogRecPtrs as WALSender already does. I see no problem with integrating that into core, technically or philosophically. Which means that if I want to allow a consumer of that commit order data to go offline for three days or so to replicate the 5 requested, low volume tables, the origin needs to hang on to the entire WAL log from all 100 other high volume tables? I suggest writing an external tool that strips out what you need that can be run at any time, rather than creating a new data format and overhead for this usecase. Stripping it out from what? Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Jan Wieck wrote: I see no problem with integrating that into core, technically or philosophically. Which means that if I want to allow a consumer of that commit order data to go offline for three days or so to replicate the 5 requested, low volume tables, the origin needs to hang on to the entire WAL log from all 100 other high volume tables? I suggest writing an external tool that strips out what you need that can be run at any time, rather than creating a new data format and overhead for this usecase. Stripping it out from what? Stripping it from the WAL. Your system seems to require double-writes on a commit, which is something we have avoided in the past. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 6/1/2010 11:09 AM, Bruce Momjian wrote: Jan Wieck wrote: I see no problem with integrating that into core, technically or philosophically. Which means that if I want to allow a consumer of that commit order data to go offline for three days or so to replicate the 5 requested, low volume tables, the origin needs to hang on to the entire WAL log from all 100 other high volume tables? I suggest writing an external tool that strips out what you need that can be run at any time, rather than creating a new data format and overhead for this usecase. Stripping it out from what? Stripping it from the WAL. Your system seems to require double-writes on a commit, which is something we have avoided in the past. Your suggestion seems is based on several false assumptions. This does neither require additional physical writes on commit, nor is consuming the entire WAL just to filter out commit records anything even remotely desirable for systems like Londiste or Slony. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Jan Wieck wrote: Reading the entire WAL just to find all COMMIT records, then go back to the origin database to get the actual replication log you're looking for is simpler and more efficient? I don't think so. Agreed, but I think I've not explained myself well enough. I proposed two completely separate ideas; the first one was this: If you must get commit order, get it from WAL on *origin*, using exact same code that current WALSender provides, plus some logic to read through the WAL records and extract commit/aborts. That seems much simpler than the proposal you outlined and as SR shows, its low latency as well since commits write to WAL. No need to generate event ticks either, just use XLogRecPtrs as WALSender already does. I see no problem with integrating that into core, technically or philosophically. Which means that if I want to allow a consumer of that commit order data to go offline for three days or so to replicate the 5 requested, low volume tables, the origin needs to hang on to the entire WAL log from all 100 other high volume tables? I suggest writing an external tool that strips out what you need that can be run at any time, rather than creating a new data format and overhead for this usecase. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On May 28, 2010, at 7:19 PM, Bruce Momjian br...@momjian.us wrote: Jan Wieck wrote: Reading the entire WAL just to find all COMMIT records, then go back to the origin database to get the actual replication log you're looking for is simpler and more efficient? I don't think so. Agreed, but I think I've not explained myself well enough. I proposed two completely separate ideas; the first one was this: If you must get commit order, get it from WAL on *origin*, using exact same code that current WALSender provides, plus some logic to read through the WAL records and extract commit/aborts. That seems much simpler than the proposal you outlined and as SR shows, its low latency as well since commits write to WAL. No need to generate event ticks either, just use XLogRecPtrs as WALSender already does. I see no problem with integrating that into core, technically or philosophically. Which means that if I want to allow a consumer of that commit order data to go offline for three days or so to replicate the 5 requested, low volume tables, the origin needs to hang on to the entire WAL log from all 100 other high volume tables? I suggest writing an external tool that strips out what you need that can be run at any time, rather than creating a new data format and overhead for this usecase. That would be FAR more complex, less robust, and less performant - whereas doing what Jan has proposed is pretty straightforward and should have minimal impact on performance - or none when not enabled. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Jan Wieck janwi...@yahoo.com wrote: On 5/26/2010 4:34 PM, Kevin Grittner wrote: My latest idea for handling this in WAL-based replication involves WAL-logging information about the transaction through which a the committing transaction makes it safe to view. There are a few options here at the detail level that I'm still thinking through. The idea would be that the xmin from read-only queries on the slaves might be somewhere behind where you would expect based on transactions committed. (The details involve such things as where non-serializable transactions fall into the plan on both sides, and whether it's worth the effort to special-case read-only transactions on the master.) I can't say that I'm 100% sure that some lurking detail won't shoot this technique down for HS, but it seems good to me at a conceptual level. Without simulating multiple simultaneous transactions during playback, how are you going to manage that the tuples, already inserted on behalf of the ongoing master transactions, disappear when they abort on the master? When do writes ever become visible to a snapshot without having been committed? I'm not talking about changing that in any way. I'm talking about deferring visibility of committed transactions until they can be viewed without risking serialization anomalies. This requires, at a minimum, that any concurrent serializable transactions which are not read-only have completed. (Perhaps I'm not understanding your question) -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Wed, May 26, 2010 at 5:38 PM, Greg Stark gsst...@mit.edu wrote: How about just API generalities? Like, where do you need this data, on the master or on the slave? Would PGXC like it on the transaction coordinator? What question do you need to answer, do you need to pull out sets of commits in certain ranges or look up specific transaction ids and find out when they committed? Or do you only need to answer which of two transaction ids committed first? This thread has been hard to follow for me. Were any of these questions answered? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/27/2010 9:59 AM, Greg Stark wrote: On Wed, May 26, 2010 at 5:38 PM, Greg Stark gsst...@mit.edu wrote: How about just API generalities? Like, where do you need this data, on the master or on the slave? Would PGXC like it on the transaction coordinator? What question do you need to answer, do you need to pull out sets of commits in certain ranges or look up specific transaction ids and find out when they committed? Or do you only need to answer which of two transaction ids committed first? This thread has been hard to follow for me. Were any of these questions answered? Yes. On 5/26/2010 4:49 PM, Jan Wieck wrote: On 5/26/2010 12:38 PM, Greg Stark wrote: On Wed, May 26, 2010 at 5:10 PM, Jan Wieck janwi...@yahoo.com wrote: ... but to answer that request, actually I don't even think we should be discussing API specifics. How about just API generalities? Like, where do you need this data, on the master or on the slave? Would PGXC like it on the transaction coordinator? What question do you need to answer, do you need to pull out sets of commits in certain ranges or look up specific transaction ids and find out when they committed? Or do you only need to answer which of two transaction ids committed first? The question I want answered is what was the order and xid of the next 0..n transactions, that committed after transaction X? Preferably I would avoid scanning the entire available WAL just to get the next n xid's to process. The proposal assigned a unique serial number (file segment and position driven) to each xid and used that for the ordering as well as identification of the last known transaction. That is certainly a premature implementation detail. In this implementation it wouldn't even matter if a transaction that was recorded actually never made it because it crashed before the WAL flush. It would be reported by this commit order feature, but there would be no traces of whatever it did to be found inside the DB, so that anomaly is harmless. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/27/2010 12:01 PM, Jan Wieck wrote: On 5/27/2010 9:59 AM, Greg Stark wrote: This thread has been hard to follow for me. Were any of these questions answered? Yes. The thing missing is any sort of answer to that problem description. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/25/2010 3:18 PM, Kevin Grittner wrote: Jan Wieck janwi...@yahoo.com wrote: Have you ever looked at one of those queries, that Londiste or Slony issue against the provider DB in order to get all the log data that has been committed between two snapshots? Is that really the best you can think of? No, I admit I haven't. In fact, I was thinking primarily in terms of log-driven situations, like HS. What would be the best place for me to look to come up to speed on your use case? (I'm relatively sure that the issue isn't that there's no information to find, but that a sequential pass over all available information would take a *long* time.) I've been working through the issues on WAL-based replicas, and have some additional ideas and alternatives, but I'd like to see the big picture, including trigger-based replication, before posting. In short, what both systems are doing is as follows. An AFTER ROW trigger records the OLD PK and all changed columns, as well as the txid and a global, not cached serial number. Some background process periodically starts a serializable transaction and records the resulting snapshot. To replicate from one consistent state to the next, the replication system now selects all log rows between two snapshots. Between here means it simulates MVCC visibility in the sense of that the writing transaction was in progress when the first snapshot was taken and had committed at the second. The resulting WHERE clause looks something like WHERE (xid s1.xmax OR (xid = s1.xmin AND xid IN (s1.xip))) AND (xid s2.xmin OR (xid = s2.xmax AND xid NOT IN (s2.xip))) Note that xip here is a comma separated list of txid's. I think it is easy to see that this is not a cheap query. Anyhow, that set of log rows is now ordered by the serial number and applied to the replica. Without this logic, the replication system could not combine multiple origin sessions into one replication session without risking to never find a state, in which it can commit. It may be possible to work with two sessions on the replica and not require any reordering of the original actions at all. I need to think about that for a little longer since this idea just occurred to me a second ago. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/26/2010 7:03 AM, Jan Wieck wrote: To replicate from one consistent state to the next, the replication system now selects all log rows between two snapshots. Between here means it simulates MVCC visibility in the sense of that the writing transaction was in progress when the first snapshot was taken and had committed at the second. The resulting WHERE clause looks something like Or it entirely happened between the snapshots, obviously. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Sun, May 23, 2010 at 9:21 PM, Jan Wieck janwi...@yahoo.com wrote: Each record of the Transaction Commit Info consists of txid xci_transaction_id timestamptz xci_begin_timestamp timestamptz xci_commit_timestamp int64 xci_total_rowcount So I think you're going about this backwards. Instead of discussing implementation I think you should start with the API the replication system needs. In particular I'm not sure you really want a server-side query at all. I'm wondering if you wouldn't be better off with a public machine-parsable text format version of the WAL. Ie, at the same time as writing out all the nitty gritty to the binary wal we would write out a summary of public data to an xml version containing just parts of the data stream that we can promise won't change, such as transaction id, lsn, timestamp. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/26/2010 10:04 AM, Greg Stark wrote: On Sun, May 23, 2010 at 9:21 PM, Jan Wieck janwi...@yahoo.com wrote: Each record of the Transaction Commit Info consists of txid xci_transaction_id timestamptz xci_begin_timestamp timestamptz xci_commit_timestamp int64 xci_total_rowcount So I think you're going about this backward Instead of discussing implementation I think you should start with the API the replication system needs. In particular I'm not sure you really want a server-side query at all. I'm wondering if you wouldn't be better off with a public machine-parsable text format version of the WAL. Ie, at the same time as writing out all the nitty gritty to the binary wal we would write out a summary of public data to an xml version containing just parts of the data stream that we can promise won't change, such as transaction id, lsn, timestamp. Since the actual row level change information and other event data is found inside of regular tables, identified by TXID and sequence number, I am pretty sure I want that data in a server-side query. What you are proposing is to read the xid's and timestamps with an external process, that now forcibly needs to reside on the DB server itself (neither Londiste nor Slony have that requirement as of today), then bring it back into the DB at least inside the WHERE clause of a query. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Jan Wieck wrote: Since the actual row level change information and other event data is found inside of regular tables, identified by TXID and sequence number, I am pretty sure I want that data in a server-side query. What you are proposing is to read the xid's and timestamps with an external process, that now forcibly needs to reside on the DB server itself (neither Londiste nor Slony have that requirement as of today), then bring it back into the DB at least inside the WHERE clause of a query. It depends on how you approach the problem. If you had a process that could scan WAL files (or a platform/version independent representation of these WAL files) you could run that process on any server (the origin server, a replica, or some third server with the software installed). Where you run it involves making trade-offs on the costs of storing transferring and processing the files and would ideally be configurable. You could then have a process that transfers all of the data logged by the triggers to the replicas as soon as it is committed. Basically saying 'copy any rows in sl_log from the origin to the replica that we haven't already sent to that replica' You could then move the work of figuring out the commit order onto the replica where you would combine the output of the WAL scanning process with the transaction data that has been copied to the replica. Jan -- Steve Singer Afilias Canada Data Services Developer 416-673-1142 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Wed, May 26, 2010 at 11:43 AM, Steve Singer ssin...@ca.afilias.info wrote: Jan Wieck wrote: Since the actual row level change information and other event data is found inside of regular tables, identified by TXID and sequence number, I am pretty sure I want that data in a server-side query. What you are proposing is to read the xid's and timestamps with an external process, that now forcibly needs to reside on the DB server itself (neither Londiste nor Slony have that requirement as of today), then bring it back into the DB at least inside the WHERE clause of a query. It depends on how you approach the problem. If you had a process that could scan WAL files (or a platform/version independent representation of these WAL files) you could run that process on any server (the origin server, a replica, or some third server with the software installed). Where you run it involves making trade-offs on the costs of storing transferring and processing the files and would ideally be configurable. You could then have a process that transfers all of the data logged by the triggers to the replicas as soon as it is committed. Basically saying 'copy any rows in sl_log from the origin to the replica that we haven't already sent to that replica' You could then move the work of figuring out the commit order onto the replica where you would combine the output of the WAL scanning process with the transaction data that has been copied to the replica. I'm sure it's possible to make this work however you want to do it, but I don't really see what advantage Greg Stark's proposal has over Jan's original proposal. Recording the commits in one extra place at commit time is practically free, especially compared to the overall cost of replication. Rescanning the WAL seems likely to be much more expensive and potentially introduces more failure paths. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/26/2010 10:04 AM, Greg Stark wrote: Instead of discussing implementation I think you should start with the API the replication system needs. ... but to answer that request, actually I don't even think we should be discussing API specifics. During PGCon, Marco Kreen, Jim Nasby and I were discussing what the requirements of a unified message queue, shared by Londiste and Slony may look like. For some use cases of pgq, there isn't even any interest in user table changes. These are simply a reliable, database backed message passing system. Today both systems use an agreeable order of changes selected by rather expensive queries based on serializable snapshot information and a global, non cacheable serial number. This could be replaced with a logic based on the actual commit order of the transactions. This order does not need to be 100% accurate. As long as the order is recorded after all user actions have been performed (trigger queue shut down) and while the transaction is still holding onto its locks, that order is good enough. This will not allow a conflicting transaction, waiting on locks to be released, to appear having committed before the lock conflict winner. It is obvious that in cases where only small portions or even none of the user table changes are needed, holding on to or even parsing the ENTIRE WAL sounds suboptimal for this use case. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Wed, May 26, 2010 at 5:10 PM, Jan Wieck janwi...@yahoo.com wrote: ... but to answer that request, actually I don't even think we should be discussing API specifics. How about just API generalities? Like, where do you need this data, on the master or on the slave? Would PGXC like it on the transaction coordinator? What question do you need to answer, do you need to pull out sets of commits in certain ranges or look up specific transaction ids and find out when they committed? Or do you only need to answer which of two transaction ids committed first? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Could you generate the commit-order log by simply registering a commit hook (RegisterXactCallback(XACT_EVENT_COMMIT)) that writes such a log somewhere in the data directory? That would work with older versions too, no server changes required. It would not get called during recovery, but I believe that would be sufficient for Slony. You could always batch commits that you don't know when they committed as if they committed simultaneously. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/26/2010 1:17 PM, Heikki Linnakangas wrote: Could you generate the commit-order log by simply registering a commit hook (RegisterXactCallback(XACT_EVENT_COMMIT)) that writes such a log somewhere in the data directory? That would work with older versions too, no server changes required. That would work, as it seems that the backend keeps holding on to its locks until after calling the callbacks. It would not get called during recovery, but I believe that would be sufficient for Slony. You could always batch commits that you don't know when they committed as if they committed simultaneously. Here you are mistaken. If the origin crashes but can recover not yet flushed to xlog-commit-order transactions, then the consumer has no idea about the order of those commits, which throws us back to the point where we require a non cacheable global sequence to replay the individual actions of those now batched transactions in an agreeable order. The commit order data needs to be covered by crash recovery. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 26/05/10 21:43, Jan Wieck wrote: On 5/26/2010 1:17 PM, Heikki Linnakangas wrote: It would not get called during recovery, but I believe that would be sufficient for Slony. You could always batch commits that you don't know when they committed as if they committed simultaneously. Here you are mistaken. If the origin crashes but can recover not yet flushed to xlog-commit-order transactions, then the consumer has no idea about the order of those commits, which throws us back to the point where we require a non cacheable global sequence to replay the individual actions of those now batched transactions in an agreeable order. The commit order data needs to be covered by crash recovery. Perhaps I'm missing something, but I thought that Slony currently uses a heartbeat, and all transactions committed between two beats are banged together and committed as one in the slave so that their relative commit order doesn't matter. Can we not do the same for commits missing from the commit-order log? I'm thinking that the commit-order log would contain two kinds of records: a) Transaction with XID X committed b) All transactions with XID X committed During normal operation we write the 1st kind of record at every commit. After crash recovery (perhaps at the first commit after recovery or when the slon daemon first polls the server, as there's no hook for end-of-recovery), we write the 2nd kind of record. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Perhaps I'm missing something, but I thought that Slony currently uses a heartbeat, and all transactions committed between two beats are banged together and committed as one in the slave so that their relative commit order doesn't matter. I guess Slony does the same as pgq here: all events of all those transactions between two given ticks are batched together in the order of the event commits. (In fact the batches are made at the consumer request, so possibly spreading more than 2 ticks at a time). If you skip that event ordering (within transactions), you can't maintain foreign keys on the slaves, among other things. The idea of this proposal is to be able to get this commit order directly from where the information is maintained, rather than use some sort of user sequence for that. So even ordering the txid and txid_snapshots with respect to WAL commit time (LSN) won't be the whole story, for any given transaction containing more than one event we also need to have them in order. I know Jan didn't forget about it so it must either be in the proposal or easily derived, too tired to recheck. Regards, -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Wed, May 26, 2010 at 4:11 PM, Dimitri Fontaine dfonta...@hi-media.com wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Perhaps I'm missing something, but I thought that Slony currently uses a heartbeat, and all transactions committed between two beats are banged together and committed as one in the slave so that their relative commit order doesn't matter. I guess Slony does the same as pgq here: all events of all those transactions between two given ticks are batched together in the order of the event commits. (In fact the batches are made at the consumer request, so possibly spreading more than 2 ticks at a time). If you skip that event ordering (within transactions), you can't maintain foreign keys on the slaves, among other things. The idea of this proposal is to be able to get this commit order directly from where the information is maintained, rather than use some sort of user sequence for that. Exactly. So even ordering the txid and txid_snapshots with respect to WAL commit time (LSN) won't be the whole story, for any given transaction containing more than one event we also need to have them in order. I know Jan didn't forget about it so it must either be in the proposal or easily derived, too tired to recheck. Right, so the point is - with this proposal, he can switch to using a LOCAL sequence number to order events within the session and then order the sessions using the commit ordering. Right now, he has to use a GLOBAL sequence number because there's no way to know the commit order. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/26/2010 3:16 PM, Heikki Linnakangas wrote: On 26/05/10 21:43, Jan Wieck wrote: On 5/26/2010 1:17 PM, Heikki Linnakangas wrote: It would not get called during recovery, but I believe that would be sufficient for Slony. You could always batch commits that you don't know when they committed as if they committed simultaneously. Here you are mistaken. If the origin crashes but can recover not yet flushed to xlog-commit-order transactions, then the consumer has no idea about the order of those commits, which throws us back to the point where we require a non cacheable global sequence to replay the individual actions of those now batched transactions in an agreeable order. The commit order data needs to be covered by crash recovery. Perhaps I'm missing something, Apparently, more about that at the end. I'm thinking that the commit-order log would contain two kinds of records: a) Transaction with XID X committed b) All transactions with XID X committed If that was true then long running transactions would delay all commits for transactions that started after them. Do they? During normal operation we write the 1st kind of record at every commit. After crash recovery (perhaps at the first commit after recovery or when the slon daemon first polls the server, as there's no hook for end-of-recovery), we write the 2nd kind of record. I think the callback is also called during backend startup, which means that it could record the first XID to come which is known from the control file and in that case, all XID's are committed or aborted. Which leads us to your missing piece above, the need for the global non cacheable sequence. Consider two transactions A and B that due to transaction batching between snapshots get applied together. Let the order of actions be 1. A starts 2. B starts 3. B selects a row for update, then updates the row 4. A tries to do the same and blocks 5. B commits 6. A gets the lock, the row, does the update 7. A commits If Slony (or Londiste) would not record the exact order of those individual row actions, then it would not have any idea if within that batch the action of B (higher XID) actually came first. Without that knowledge there is a 50/50 chance of getting your replica out of sync with that simple conflict. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Jan Wieck janwi...@yahoo.com wrote: Without this logic, the replication system could not combine multiple origin sessions into one replication session without risking to never find a state, in which it can commit. My latest idea for handling this in WAL-based replication involves WAL-logging information about the transaction through which a the committing transaction makes it safe to view. There are a few options here at the detail level that I'm still thinking through. The idea would be that the xmin from read-only queries on the slaves might be somewhere behind where you would expect based on transactions committed. (The details involve such things as where non-serializable transactions fall into the plan on both sides, and whether it's worth the effort to special-case read-only transactions on the master.) I can't say that I'm 100% sure that some lurking detail won't shoot this technique down for HS, but it seems good to me at a conceptual level. I think, however, that this fails to work for systems like Slony and Londiste because there could be transactions writing to tables which are not replication targets, so the snapshot adjustments wouldn't be safe. True? (If not true, I think that adding some sort of xmin value, depending on the answers to the above questions, to Jan's proposed structure might support better transactional integrity, even to the level of full serializable support, at the cost of delaying visibility of committed data.) -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/26/2010 4:11 PM, Dimitri Fontaine wrote: So even ordering the txid and txid_snapshots with respect to WAL commit time (LSN) won't be the whole story, for any given transaction containing more than one event we also need to have them in order. I know Jan didn't forget about it so it must either be in the proposal or easily derived, too tired to recheck. No, that detail is actually not explained in the proposal. When applying all changes in transaction commit order, there is no need for a global sequence. A local counter per backend is sufficient because the total order of xact-commit-order, local-xact-seq yields a similarly agreeable order of actions. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 26/05/10 23:32, Jan Wieck wrote: Consider two transactions A and B that due to transaction batching between snapshots get applied together. Let the order of actions be 1. A starts 2. B starts 3. B selects a row for update, then updates the row 4. A tries to do the same and blocks 5. B commits 6. A gets the lock, the row, does the update 7. A commits If Slony (or Londiste) would not record the exact order of those individual row actions, then it would not have any idea if within that batch the action of B (higher XID) actually came first. Without that knowledge there is a 50/50 chance of getting your replica out of sync with that simple conflict. Hmm, I don't see how even a fully reliable WAL-logged commit-order log would save you then. It seems that you need to not only know the relative order of commits, but the order of commits relative to actions within the transactions. I.e. in the above example it's not enough to know that B committed before A, you also have to know that A updated the row only after B committed. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/26/2010 12:38 PM, Greg Stark wrote: On Wed, May 26, 2010 at 5:10 PM, Jan Wieck janwi...@yahoo.com wrote: ... but to answer that request, actually I don't even think we should be discussing API specifics. How about just API generalities? Like, where do you need this data, on the master or on the slave? Would PGXC like it on the transaction coordinator? What question do you need to answer, do you need to pull out sets of commits in certain ranges or look up specific transaction ids and find out when they committed? Or do you only need to answer which of two transaction ids committed first? The question I want answered is what was the order and xid of the next 0..n transactions, that committed after transaction X? Preferably I would avoid scanning the entire available WAL just to get the next n xid's to process. The proposal assigned a unique serial number (file segment and position driven) to each xid and used that for the ordering as well as identification of the last known transaction. That is certainly a premature implementation detail. In this implementation it wouldn't even matter if a transaction that was recorded actually never made it because it crashed before the WAL flush. It would be reported by this commit order feature, but there would be no traces of whatever it did to be found inside the DB, so that anomaly is harmless. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 26/05/10 23:45, Heikki Linnakangas wrote: On 26/05/10 23:32, Jan Wieck wrote: Consider two transactions A and B that due to transaction batching between snapshots get applied together. Let the order of actions be 1. A starts 2. B starts 3. B selects a row for update, then updates the row 4. A tries to do the same and blocks 5. B commits 6. A gets the lock, the row, does the update 7. A commits If Slony (or Londiste) would not record the exact order of those individual row actions, then it would not have any idea if within that batch the action of B (higher XID) actually came first. Without that knowledge there is a 50/50 chance of getting your replica out of sync with that simple conflict. Hmm, I don't see how even a fully reliable WAL-logged commit-order log would save you then. It seems that you need to not only know the relative order of commits, but the order of commits relative to actions within the transactions. I.e. in the above example it's not enough to know that B committed before A, you also have to know that A updated the row only after B committed. Ok, I think I understand it now. The commit order is enough, because replaying the actions in the order all actions of B, then all actions of A yields the same result. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 26/05/10 23:49, Jan Wieck wrote: In this implementation it wouldn't even matter if a transaction that was recorded actually never made it because it crashed before the WAL flush. It would be reported by this commit order feature, but there would be no traces of whatever it did to be found inside the DB, so that anomaly is harmless. Hmm, I think it would also not matter if the reported commit order doesn't match exactly the order of the commit records, as long as there's no dependency between the two transactions. What I'm after is that I think it would be enough to establish the commit order using deferred triggers that are fired during pre-commit processing. The trigger could get a number from a global sequence to establish the commit order, and write it to a table. So you still need a global sequence, but it's only needed once per commit. (you have to handle deferred triggers that fire after the commit-order trigger. perhaps by getting another number from the global sequence and replacing the previous number with it) -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/26/2010 4:52 PM, Heikki Linnakangas wrote: Ok, I think I understand it now. The commit order is enough, because replaying the actions in the order all actions of B, then all actions of A yields the same result. Precisely. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/26/2010 5:12 PM, Heikki Linnakangas wrote: On 26/05/10 23:49, Jan Wieck wrote: In this implementation it wouldn't even matter if a transaction that was recorded actually never made it because it crashed before the WAL flush. It would be reported by this commit order feature, but there would be no traces of whatever it did to be found inside the DB, so that anomaly is harmless. Hmm, I think it would also not matter if the reported commit order doesn't match exactly the order of the commit records, as long as there's no dependency between the two transactions. What I'm after is that I think it would be enough to establish the commit order using deferred triggers that are fired during pre-commit processing. The trigger could get a number from a global sequence to establish the commit order, and write it to a table. So you still need a global sequence, but it's only needed once per commit. You're not trying to derail this thread into yet another of our famous commit trigger battles, are you? (you have to handle deferred triggers that fire after the commit-order trigger. perhaps by getting another number from the global sequence and replacing the previous number with it) I could imagine a commit trigger as a special case that is fired AFTER the trigger queue was shut down, so any operation that causes any further triggers to fire would automatically abort the transaction. A draconian, but reasonable restriction. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/26/2010 4:34 PM, Kevin Grittner wrote: Jan Wieck janwi...@yahoo.com wrote: Without this logic, the replication system could not combine multiple origin sessions into one replication session without risking to never find a state, in which it can commit. My latest idea for handling this in WAL-based replication involves WAL-logging information about the transaction through which a the committing transaction makes it safe to view. There are a few options here at the detail level that I'm still thinking through. The idea would be that the xmin from read-only queries on the slaves might be somewhere behind where you would expect based on transactions committed. (The details involve such things as where non-serializable transactions fall into the plan on both sides, and whether it's worth the effort to special-case read-only transactions on the master.) I can't say that I'm 100% sure that some lurking detail won't shoot this technique down for HS, but it seems good to me at a conceptual level. Without simulating multiple simultaneous transactions during playback, how are you going to manage that the tuples, already inserted on behalf of the ongoing master transactions, disappear when they abort on the master? Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On May 25, 2010, at 3:21 , Tom Lane wrote: Florian Pflug f...@phlo.org writes: The subtle point here is whether you consider the view from the outside (in the sense of what a read-only transaction started at an arbitrary time can or cannot observe), or from the inside (what updating transactions can observe and might base their updates on). The former case is completely determined by the commit ordering of the transactions, while the latter is not - otherwise serializability wouldn't be such a hard problem. BTW, doesn't all this logic fall in a heap as soon as you consider read-committed transactions? Why would it? There's still a well defined point in time at which the transaction's effects become visible, and every other transaction commits either before that time or after that time. An observer started between two transactions sees the first's changes but not the second's. One replace observing read committed transactions by a series of smaller repeatable read transactions, since the observers are read-only anyway. This of course says nothing about what state the updating transactions themselves see as the current state. For e.g. replication that is adequate, since you'd not replay the original commands but rather the effects they had in terms of physical tuple updates. On replay, the effects of a transaction to therefor not depend on the state the transaction sees. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
2010/5/25 Dan Ports d...@csail.mit.edu: On Mon, May 24, 2010 at 10:24:07AM -0500, Kevin Grittner wrote: Replicating or recreating the whole predicate locking and conflict detection on slaves is not feasible for performance reasons. (I won't elaborate unless someone feels that's not intuitively obvious.) The only sane way I can see to have a slave database allow serializable behavior is to WAL-log the acquisition of a snapshot by a serializable transaction, and the rollback or commit, on the master, and to have the serializable snapshot build on a slave exclude any serializable transactions for which there are still concurrent serializable transactions. Yes, that does mean WAL- logging the snapshot acquisition even if the transaction doesn't yet have an xid, and WAL-logging the commit or rollback even if it never acquires an xid. One important observation is that any anomaly that occurs on the slave can be resolved by aborting a local read-only transaction. This is a good thing, because the alternatives are too horrible to consider. You could possibly cut the costs of predicate locking by having the master ship with each transaction the list of predicate locks it acquired. But you'd still have to track locks for read-only transactions, so maybe that's not a significant cost improvement. On the other hand, if you're willing to pay the price of serializability on the master, why not the slaves too? I don't understand the problem. According to me, in the context of SSI, a read-only slave can just map SERIALIZABLE to the technical implementation of REPEATABLE READ (i.e., the currently-existing SERIALIZABLE). The union of the transactions on the master and the slave(s) will still exhibit SERIALIZABLE behavior because the transactions on the slave cannot write anything and are therefore irrelevant. Is anything wrong with that reasoning? Nicolas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote: In some systems (data warehousing, replication), the order of commits is important, since that is the order in which changes have become visible. This information could theoretically be extracted from the WAL, but scanning the entire WAL just to extract this tidbit of information would be excruciatingly painful. I think it would be quite simple to read WAL. WALSender reads the WAL file after its been flushed, so it would be simple for it to read a blob of WAL and then extract the commit order from it. Overall though, it would be easier and more efficient to *add* info to WAL and then do all this processing *after* WAL has been transported elsewhere. Extracting info with DDL triggers, normal triggers, commit order and everything else seems like too much work to me. Every other RDBMS has moved away from trigger-based replication and we should give that serious consideration also. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Tue, May 25, 2010 at 02:00:42PM +0200, Nicolas Barbier wrote: I don't understand the problem. According to me, in the context of SSI, a read-only slave can just map SERIALIZABLE to the technical implementation of REPEATABLE READ (i.e., the currently-existing SERIALIZABLE). The union of the transactions on the master and the slave(s) will still exhibit SERIALIZABLE behavior because the transactions on the slave cannot write anything and are therefore irrelevant. This, unfortunately, isn't true in SSI. Consider read-only transactions on a single node SSI database -- the situation is the same for read-only transactions that run on a slave. These transactions can be part of anomalies, so they need to be checked for conflicts and potentially aborted. Consider Kevin's favorite example, where one table contains the current date and the other is a list of receipts (initially empty). T1 inserts (select current_date) into receipts, but doesn't commit T2 increments current_date and commits T3 reads both current_date and the receipt table T1 commits T3, which is a read-only transaction, sees the incremented date and an empty list of receipts. But T1 later commits a new entry in the receipts table with the old date. No serializable ordering allows this. However, if T3 hadn't performed its read, there'd be no problem; we'd just serialize T1 before T2 and no one would be the wiser. SSI would detect a potential conflict here, which we could resolve by aborting T3. (We could also abort T1, but if this is a replicated system this isn't always an option -- T3 might be running on the slave, so only the slave will know about the conflict, and it can't very well abort an update transaction on the master.) There's another example of a read-only transaction anomaly that could cause similar problems at http://portal.acm.org/citation.cfm?doid=1031570.1031573, but I think this one is easier to follow. Dan -- Dan R. K. Ports MIT CSAILhttp://drkp.net/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On May 25, 2010, at 20:18 , Dan Ports wrote: On Tue, May 25, 2010 at 02:00:42PM +0200, Nicolas Barbier wrote: I don't understand the problem. According to me, in the context of SSI, a read-only slave can just map SERIALIZABLE to the technical implementation of REPEATABLE READ (i.e., the currently-existing SERIALIZABLE). The union of the transactions on the master and the slave(s) will still exhibit SERIALIZABLE behavior because the transactions on the slave cannot write anything and are therefore irrelevant. This, unfortunately, isn't true in SSI. Consider read-only transactions on a single node SSI database -- the situation is the same for read-only transactions that run on a slave. These transactions can be part of anomalies, so they need to be checked for conflicts and potentially aborted. Consider Kevin's favorite example, where one table contains the current date and the other is a list of receipts (initially empty). T1 inserts (select current_date) into receipts, but doesn't commit T2 increments current_date and commits T3 reads both current_date and the receipt table T1 commits T3, which is a read-only transaction, sees the incremented date and an empty list of receipts. But T1 later commits a new entry in the receipts table with the old date. No serializable ordering allows this. However, if T3 hadn't performed its read, there'd be no problem; we'd just serialize T1 before T2 and no one would be the wiser. Hm, so in fact SSI sometimes allows the database to be inconsistent, but only as long as nobody tries to observe it? Btw, I still don't get how this follows from the Cahill paper. For a transaction to lie on a dangerous circle, it needs incoming and outgoing edges in the conflict graph, right? But I'd have though that conflicts are always between a reader and a writer or between two writers. So how can a read-only transaction have incoming and outgoing edges? best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Florian Pflug f...@phlo.org wrote: Hm, so in fact SSI sometimes allows the database to be inconsistent, but only as long as nobody tries to observe it? Not exactly. The eventually-persisted state is always consistent, but there can be a transitory committed state which would violate user-defined constraints or business rules *if viewed*. This is what I've been on about -- the commit sequence is not necessarily the same as the apparent order of execution. A read-only transaction, if run before the overlapping commits settle, can view a state which is not consistent with any serial order of execution, and might therefore break the rules. SSI detects that and rolls one of the transactions back if they're all running at serializable transaction isolation in a single SSI database, but the question is how to handle this when the read happens in a replica. Btw, I still don't get how this follows from the Cahill paper. For a transaction to lie on a dangerous circle, it needs incoming and outgoing edges in the conflict graph, right? At least one of the transactions participating in the cycle does. There's no requirement that they all do. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Tue, May 25, 2010 at 08:35:44PM +0200, Florian Pflug wrote: Hm, so in fact SSI sometimes allows the database to be inconsistent, but only as long as nobody tries to observe it? Yes. Note that even while it's in an inconsistent state, you can still perform any query that doesn't observe the inconsistency -- hopefully most queries fall into this category. Btw, I still don't get how this follows from the Cahill paper. For a transaction to lie on a dangerous circle, it needs incoming and outgoing edges in the conflict graph, right? But I'd have though that conflicts are always between a reader and a writer or between two writers. So how can a read-only transaction have incoming and outgoing edges? Right, the read-only transaction can't have incoming edges, but it can have outgoing edges. So it can't be the pivot itself (the transaction with both outgoing and incoming edges), but it can cause *another* transaction to be. In the example I gave, T3 (the r/o transaction) has an outgoing edge to T1, because it didn't see T1's concurrent update. T1 already had an outgoing edge to T2, so adding in this incoming edge from T3 creates the dangerous structure. Dan -- Dan R. K. Ports MIT CSAILhttp://drkp.net/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Jan Wieck janwi...@yahoo.com wrote: Have you ever looked at one of those queries, that Londiste or Slony issue against the provider DB in order to get all the log data that has been committed between two snapshots? Is that really the best you can think of? No, I admit I haven't. In fact, I was thinking primarily in terms of log-driven situations, like HS. What would be the best place for me to look to come up to speed on your use case? (I'm relatively sure that the issue isn't that there's no information to find, but that a sequential pass over all available information would take a *long* time.) I've been working through the issues on WAL-based replicas, and have some additional ideas and alternatives, but I'd like to see the big picture, including trigger-based replication, before posting. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
2010/5/25 Dan Ports d...@csail.mit.edu: On Tue, May 25, 2010 at 02:00:42PM +0200, Nicolas Barbier wrote: I don't understand the problem. According to me, in the context of SSI, a read-only slave can just map SERIALIZABLE to the technical implementation of REPEATABLE READ (i.e., the currently-existing SERIALIZABLE). The union of the transactions on the master and the slave(s) will still exhibit SERIALIZABLE behavior because the transactions on the slave cannot write anything and are therefore irrelevant. This, unfortunately, isn't true in SSI. Consider read-only transactions on a single node SSI database -- the situation is the same for read-only transactions that run on a slave. These transactions can be part of anomalies, so they need to be checked for conflicts and potentially aborted. Consider Kevin's favorite example, where one table contains the current date and the other is a list of receipts (initially empty). T1 inserts (select current_date) into receipts, but doesn't commit T2 increments current_date and commits T3 reads both current_date and the receipt table T1 commits T3, which is a read-only transaction, sees the incremented date and an empty list of receipts. But T1 later commits a new entry in the receipts table with the old date. No serializable ordering allows this. However, if T3 hadn't performed its read, there'd be no problem; we'd just serialize T1 before T2 and no one would be the wiser. SSI would detect a potential conflict here, which we could resolve by aborting T3. (We could also abort T1, but if this is a replicated system this isn't always an option -- T3 might be running on the slave, so only the slave will know about the conflict, and it can't very well abort an update transaction on the master.) Ah, indeed. I made the same reasoning mistake as Florian (presumably) did: I didn't think of the fact that the read-only transaction doesn't need to be the pivot. Nicolas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On May 25, 2010, at 20:48 , Dan Ports wrote: On Tue, May 25, 2010 at 08:35:44PM +0200, Florian Pflug wrote: Hm, so in fact SSI sometimes allows the database to be inconsistent, but only as long as nobody tries to observe it? Yes. Note that even while it's in an inconsistent state, you can still perform any query that doesn't observe the inconsistency -- hopefully most queries fall into this category. Yeah, as long as you just walk by without looking, the database is happy ;-) Btw, I still don't get how this follows from the Cahill paper. For a transaction to lie on a dangerous circle, it needs incoming and outgoing edges in the conflict graph, right? But I'd have though that conflicts are always between a reader and a writer or between two writers. So how can a read-only transaction have incoming and outgoing edges? Right, the read-only transaction can't have incoming edges, but it can have outgoing edges. So it can't be the pivot itself (the transaction with both outgoing and incoming edges), but it can cause *another* transaction to be. In the example I gave, T3 (the r/o transaction) has an outgoing edge to T1, because it didn't see T1's concurrent update. T1 already had an outgoing edge to T2, so adding in this incoming edge from T3 creates the dangerous structure. Hm, but for there to be an actual problem (and not a false positive), an actual dangerous circle has to exist in the dependency graph. The existence of a dangerous structure is just a necessary (but not sufficient) and easily checked-for condition for that, right? Now, if a read-only transaction only ever has outgoing edges, it cannot be part of a (dangerous or not) circle, and hence any dangerous structure it is part of is a false positive. I guess my line of reasoning is flawed somehow, but I cannot figure out why... best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
2010/5/25 Florian Pflug f...@phlo.org: On May 25, 2010, at 20:18 , Dan Ports wrote: T3, which is a read-only transaction, sees the incremented date and an empty list of receipts. But T1 later commits a new entry in the receipts table with the old date. No serializable ordering allows this. However, if T3 hadn't performed its read, there'd be no problem; we'd just serialize T1 before T2 and no one would be the wiser. Hm, so in fact SSI sometimes allows the database to be inconsistent, but only as long as nobody tries to observe it? I would not call this an inconsistent state: it would become inconsistent only after someone (e.g., T3) has observed it _and_ T1 commits. Nicolas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
2010/5/25 Florian Pflug f...@phlo.org: Hm, but for there to be an actual problem (and not a false positive), an actual dangerous circle has to exist in the dependency graph. The existence of a dangerous structure is just a necessary (but not sufficient) and easily checked-for condition for that, right? Now, if a read-only transaction only ever has outgoing edges, it cannot be part of a (dangerous or not) circle, and hence any dangerous structure it is part of is a false positive. I guess my line of reasoning is flawed somehow, but I cannot figure out why... In the general case, wr dependencies also create must be serialized before edges. It seems that those edges can be discarded when finding a pivot, but if you want to go back to basics: ( means must be serialized before.) * T1 T2, because T1 reads a version of a data element for which T2 later creates a newer version (rw between T1 and T2). * T3 T1, because T3 reads a version of a data element for which T1 later creates a newer version (rw between T3 and T1). * T2 T3, because T2 creates a version of a data element, which is then read by T3 (wr between T2 and T3). (As you can see, those 3 edges form a cycle.) Nicolas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Florian Pflug f...@phlo.org wrote: Hm, but for there to be an actual problem (and not a false positive), an actual dangerous circle has to exist in the dependency graph. The existence of a dangerous structure is just a necessary (but not sufficient) and easily checked-for condition for that, right? Now, if a read-only transaction only ever has outgoing edges, it cannot be part of a (dangerous or not) circle, and hence any dangerous structure it is part of is a false positive. I guess my line of reasoning is flawed somehow, but I cannot figure out why... Here's why: We're tracking rw-dependencies, where the time-arrow showing effective order of execution points from the reader to the writer (since the reader sees a state prior to the write, it effectively executes before it). These are important because there have to be two such dependencies, one in to the pivot and one out from the pivot, for a problem to exist. (See various works by Dr. Alan Fekete, et al, for details.) But other dependencies can imply an order of execution. In particular, a wr-dependency, where a transaction *can* see data committed by another transaction, implies that the *writer* came first in the order of execution. In this example, the transaction which lists the receipts successfully reads the control table update, but is not able to read the receipt insert. This completes the cycle, making it a real anomaly and not a false positive. Note that the wr-dependency can actually exist outside the database, making it pretty much impossible to accurately tell a false positive from a true anomaly when the pivot exists and the transaction writing data which the pivot can't read commits first. For example, let's say that the update to the control table is committed from an application which, seeing that its update came back without error, proceeds to list the receipts for the old date in a subsequent transaction. You have a wr-dependency which is, in reality, quite real and solid with no way to notice it within the database engine. That's why the techniques used in SSI are pretty hard to improve upon beyond more detailed and accurate tracking of rw-conflicts. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/24/2010 9:30 AM, Greg Sabino Mullane wrote: -BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 In light of the proposed purging scheme, how would it be able to distinguish between those two cases (nothing there yet vs. was there but purged)? There is a difference between an empty result set and an exception. No, I meant how will the *function* know, if a superuser and/or some background process can purge records at any time? The data contains timestamps which are supposedly taken in commit order. Checking the age of the last entry in the file should be simple enough to determine if the segment matches the max age configuration (if set). In the case of a superuser telling what to purge he would just call a function with a serial number (telling the obsolete segments). Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Jan Wieck janwi...@yahoo.com writes: No, I meant how will the *function* know, if a superuser and/or some background process can purge records at any time? The data contains timestamps which are supposedly taken in commit order. You can *not* rely on the commit timestamps to be in exact order. (Perhaps approximate ordering is good enough for what you want here, but just be careful to not fall into the trap of assuming that they're exactly ordered.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Robert Haas robertmh...@gmail.com wrote: maybe we should get serializable working and committed on one node first and then worry about how to distribute it. I think there might be other approaches to this problem Well, I've got two or three other ideas on how we can manage this for HS, but since I now realize that I've totally misunderstood the main use case for this (which is to support trigger-based replication), I'd like to be clear on something before letting it drop. The big question is, do such replicas need to support serializable access to the data modified by serializable transactions in the source database? That is, is there a need for such replicas to only see states which are possible in some serial order of execution of serializable transactions on the source database? Or to phrase the same question a third way, should there be a way to run queries on such replicas with confidence that what is viewed is consistent with user-defined constraints and business rules? If not, there's no intersection between this feature and SSI. If there is, I think we should think through at least a general strategy sooner, rather than later. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/25/2010 12:03 PM, Simon Riggs wrote: On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote: In some systems (data warehousing, replication), the order of commits is important, since that is the order in which changes have become visible. This information could theoretically be extracted from the WAL, but scanning the entire WAL just to extract this tidbit of information would be excruciatingly painful. I think it would be quite simple to read WAL. WALSender reads the WAL file after its been flushed, so it would be simple for it to read a blob of WAL and then extract the commit order from it. Overall though, it would be easier and more efficient to *add* info to WAL and then do all this processing *after* WAL has been transported elsewhere. Extracting info with DDL triggers, normal triggers, commit order and everything else seems like too much work to me. Every other RDBMS has moved away from trigger-based replication and we should give that serious consideration also. Reading the entire WAL just to find all COMMIT records, then go back to the origin database to get the actual replication log you're looking for is simpler and more efficient? I don't think so. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Tue, 2010-05-25 at 16:41 -0400, Jan Wieck wrote: On 5/25/2010 12:03 PM, Simon Riggs wrote: On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote: In some systems (data warehousing, replication), the order of commits is important, since that is the order in which changes have become visible. This information could theoretically be extracted from the WAL, but scanning the entire WAL just to extract this tidbit of information would be excruciatingly painful. I think it would be quite simple to read WAL. WALSender reads the WAL file after its been flushed, so it would be simple for it to read a blob of WAL and then extract the commit order from it. Overall though, it would be easier and more efficient to *add* info to WAL and then do all this processing *after* WAL has been transported elsewhere. Extracting info with DDL triggers, normal triggers, commit order and everything else seems like too much work to me. Every other RDBMS has moved away from trigger-based replication and we should give that serious consideration also. Reading the entire WAL just to find all COMMIT records, then go back to the origin database to get the actual replication log you're looking for is simpler and more efficient? I don't think so. Agreed, but I think I've not explained myself well enough. I proposed two completely separate ideas; the first one was this: If you must get commit order, get it from WAL on *origin*, using exact same code that current WALSender provides, plus some logic to read through the WAL records and extract commit/aborts. That seems much simpler than the proposal you outlined and as SR shows, its low latency as well since commits write to WAL. No need to generate event ticks either, just use XLogRecPtrs as WALSender already does. I see no problem with integrating that into core, technically or philosophically. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/25/2010 4:50 PM, Simon Riggs wrote: On Tue, 2010-05-25 at 16:41 -0400, Jan Wieck wrote: On 5/25/2010 12:03 PM, Simon Riggs wrote: On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote: In some systems (data warehousing, replication), the order of commits is important, since that is the order in which changes have become visible. This information could theoretically be extracted from the WAL, but scanning the entire WAL just to extract this tidbit of information would be excruciatingly painful. I think it would be quite simple to read WAL. WALSender reads the WAL file after its been flushed, so it would be simple for it to read a blob of WAL and then extract the commit order from it. Overall though, it would be easier and more efficient to *add* info to WAL and then do all this processing *after* WAL has been transported elsewhere. Extracting info with DDL triggers, normal triggers, commit order and everything else seems like too much work to me. Every other RDBMS has moved away from trigger-based replication and we should give that serious consideration also. Reading the entire WAL just to find all COMMIT records, then go back to the origin database to get the actual replication log you're looking for is simpler and more efficient? I don't think so. Agreed, but I think I've not explained myself well enough. I proposed two completely separate ideas; the first one was this: If you must get commit order, get it from WAL on *origin*, using exact same code that current WALSender provides, plus some logic to read through the WAL records and extract commit/aborts. That seems much simpler than the proposal you outlined and as SR shows, its low latency as well since commits write to WAL. No need to generate event ticks either, just use XLogRecPtrs as WALSender already does. I see no problem with integrating that into core, technically or philosophically. Which means that if I want to allow a consumer of that commit order data to go offline for three days or so to replicate the 5 requested, low volume tables, the origin needs to hang on to the entire WAL log from all 100 other high volume tables? Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/25/2010 4:16 PM, Tom Lane wrote: Jan Wieck janwi...@yahoo.com writes: No, I meant how will the *function* know, if a superuser and/or some background process can purge records at any time? The data contains timestamps which are supposedly taken in commit order. You can *not* rely on the commit timestamps to be in exact order. (Perhaps approximate ordering is good enough for what you want here, but just be careful to not fall into the trap of assuming that they're exactly ordered.) I am well aware of the fact that commit timestamps within the WAL can go backwards and that the serial numbers of this proposed implementation of commit order can even be different from what the timestamps AND the WAL are saying. As long as the serial number (record position inside of segment) is determined while the transaction still holds all its locks, this is going to be good enough for what async replication users today are used to. Again, it will not magically make it possible to determine a serializable order of actions, that happened from transactions running in read committed isolation level, post mortem. I don't even even think that is possible at all. And I don't think anyone proposed a solution for that problem anyways. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
-BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 In light of the proposed purging scheme, how would it be able to distinguish between those two cases (nothing there yet vs. was there but purged)? There is a difference between an empty result set and an exception. No, I meant how will the *function* know, if a superuser and/or some background process can purge records at any time? - -- Greg Sabino Mullane g...@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201005240928 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -BEGIN PGP SIGNATURE- iEYEAREDAAYFAkv6f0UACgkQvJuQZxSWSsh0xwCgmXLtKngoBBYX0TxDM2TlJRId AVIAoMHYa3c9Ej2vUJyFufxBR5vDPzQ+ =e1mh -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Sun, May 23, 2010 at 9:44 PM, Jan Wieck janwi...@yahoo.com wrote: I'm not sure the retention policies of the shared buffer cache, the WAL buffers, CLOG buffers and every other thing we try to cache are that easy to fold into one single set of logic. But I'm all ears. I'm not sure either, although it seems like LRU ought to be good enough for most things. I'm more worried about things like whether the BufferDesc abstraction is going to get in the way. CommitTransaction() inside of xact.c will call a function, that inserts a new record into this array. The operation will for most of the time be nothing than taking a spinlock and adding the record to shared memory. All the data for the record is readily available, does not require further locking and can be collected locally before taking the spinlock. What happens when you need to switch pages? Then the code will have to grab another free buffer or evict one. Hopefully not while holding a spin lock. :-) The function will return the sequence number which CommitTransaction() in turn will record in the WAL commit record together with the begin_timestamp. While both, the begin as well as the commit timestamp are crucial to determine what data a particular transaction should have seen, the row count is not and will not be recorded in WAL. It would certainly be better if we didn't to bloat the commit xlog records to do this. Is there any way to avoid that? If you can tell me how a crash recovering system can figure out what the exact sequence number of the WAL commit record at hand should be, let's rip it. Hmm... could we get away with WAL-logging the next sequence number just once per checkpoint? When you replay the checkpoint record, you update the control file with the sequence number. Then all the commits up through the next checkpoint just use consecutive numbers starting at that value. It is an option. Keep it until I tell you is a perfectly valid configuration option. One you probably don't want to forget about, but valid none the less. As Tom is fond of saying, if it breaks, you get to keep both pieces. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Jan Wieck wrote: In some systems (data warehousing, replication), the order of commits is important, since that is the order in which changes have become visible. This issue intersects with the serializable work I've been doing. While in database transactions using S2PL the above is true, in snapshot isolation and the SSI implementation of serializable transactions, it's not. In particular, the snapshot anomalies which can cause non-serializable behavior happen precisely because the apparent order of execution doesn't match anything so linear as order of commit. I'll raise that receipting example again. You have transactions which grab the current deposit data and insert it into receipts, as payments are received. At some point in the afternoon, the deposit date in a control table is changed to the next day, so that the receipts up to that point can be deposited during banking hours with the current date as their deposit date. A report is printed (and likely a transfer transaction recorded to move cash in drawer to cash in checking, but I'll ignore that aspect for this example). Some receipts may not be committed when the update to the date in the control table is committed. This is eventually consistent -- once all the receipts with the old date commit or roll back the database is OK, but until then you might be able to select the new date in the control table and the set of receipts matching the old date without the database telling you that you're missing data. The new serializable implementation fixes this, but there are open RD items (due to the need to discuss the issues) on the related Wiki page related to hot standby and other replication. Will we be able to support transactional integrity on slave machines? What if the update to the control table and the insert of receipts all happen on the master, but someone decides to move the (now happily working correctly with serializable transactions) reporting to a slave machine? (And by the way, don't get too hung up on this particular example, I could generate dozens more on demand -- the point is that order of commit doesn't always correspond to apparent order of execution; in this case the receipts *appear* to have executed first, because they are using a value later updated to something else by a different transaction, even though that other transaction *committed* first.) Replicating or recreating the whole predicate locking and conflict detection on slaves is not feasible for performance reasons. (I won't elaborate unless someone feels that's not intuitively obvious.) The only sane way I can see to have a slave database allow serializable behavior is to WAL-log the acquisition of a snapshot by a serializable transaction, and the rollback or commit, on the master, and to have the serializable snapshot build on a slave exclude any serializable transactions for which there are still concurrent serializable transactions. Yes, that does mean WAL- logging the snapshot acquisition even if the transaction doesn't yet have an xid, and WAL-logging the commit or rollback even if it never acquires an xid. I think this solve the issue Jan raises as long as serializable transactions are used; if they aren't there are no guarantees of transactional integrity no matter how you track commit sequence, unless it can be based on S2PL-type blocking locks. I'll have to leave that to someone else to sort out. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Mon, May 24, 2010 at 11:24 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Jan Wieck wrote: In some systems (data warehousing, replication), the order of commits is important, since that is the order in which changes have become visible. This issue intersects with the serializable work I've been doing. While in database transactions using S2PL the above is true, in snapshot isolation and the SSI implementation of serializable transactions, it's not. I think you're confusing two subtly different things. The way to prove that a set of transactions running under some implementation of serializability is actually serializable is to construct a serial order of execution consistent with the view of the database that each transaction saw. This may or may not match the commit order, as you say. But the commit order is still the order the effects of those transactions have become visible - if we inserted a new read-only transaction into the stream at some arbitrary point in time, it would see all the transactions which committed before it and none of those that committed afterward. So I think Jan's statement is correct. Having said that, I think your concerns about how things will look from a slave's point of view are possibly valid. A transaction running on a slave is essentially a read-only transaction that the master doesn't know about. It's not clear to me whether adding such a transaction to the timeline could result in either (a) that transaction being rolled back or (b) some impact on which other transactions got rolled back. If it did, that would obviously be a problem for serializability on slaves, though your proposed fix sounds like it would be prohibitively expensive for many users. But can this actually happen? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Robert Haas wrote: I think you're confusing two subtly different things. The only thing I'm confused about is what benefit anyone expects to get from looking at data between commits in some way other than our current snapshot mechanism. Can someone explain a use case where what Jan is proposing is better than snapshot isolation? It doesn't provide any additional integrity guarantees that I can see. But the commit order is still the order the effects of those transactions have become visible - if we inserted a new read-only transaction into the stream at some arbitrary point in time, it would see all the transactions which committed before it and none of those that committed afterward. Isn't that what a snapshot does already? your proposed fix sounds like it would be prohibitively expensive for many users. But can this actually happen? How so? The transaction start/end logging, or looking at that data when building a snapshot? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 24/05/10 19:51, Kevin Grittner wrote: The only thing I'm confused about is what benefit anyone expects to get from looking at data between commits in some way other than our current snapshot mechanism. Can someone explain a use case where what Jan is proposing is better than snapshot isolation? It doesn't provide any additional integrity guarantees that I can see. Right, it doesn't. What it provides is a way to reconstruct a snapshot at any point in time, after the fact. For example, after transactions A, C, D and B have committed in that order, it allows you to reconstruct a snapshot just like you would've gotten immediately after the commit of A, C, D and B respectively. That's useful replication tools like Slony that needs to commit the changes of those transactions in the slave in the same order as they were committed in the master. I don't know enough of Slony et al. to understand why that'd be better than the current heartbeat mechanism they use, taking a snapshot every few seconds, batching commits. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Mon, May 24, 2010 at 12:51 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas wrote: I think you're confusing two subtly different things. The only thing I'm confused about is what benefit anyone expects to get from looking at data between commits in some way other than our current snapshot mechanism. Can someone explain a use case where what Jan is proposing is better than snapshot isolation? It doesn't provide any additional integrity guarantees that I can see. It's a tool for replication solutions to use. But the commit order is still the order the effects of those transactions have become visible - if we inserted a new read-only transaction into the stream at some arbitrary point in time, it would see all the transactions which committed before it and none of those that committed afterward. Isn't that what a snapshot does already? Yes, for a particular transaction. But this is to allow transactions to be replayed (in order) on another node. your proposed fix sounds like it would be prohibitively expensive for many users. But can this actually happen? How so? The transaction start/end logging, or looking at that data when building a snapshot? I guess what I'm asking is - if the reconstructed transaction order inferred by SSI doesn't match the actual commit order, can we get a serialization anomaly on the standby by replaying transactions there in commit order? Can you give an example and explain how your proposal would solve it? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Excerpts from Robert Haas's message of dom may 23 20:38:14 -0400 2010: On Sun, May 23, 2010 at 4:21 PM, Jan Wieck janwi...@yahoo.com wrote: The system will have postgresql.conf options for enabling/disabling the whole shebang, how many shared buffers to allocate for managing access to the data and to define the retention period of the data based on data volume and/or age of the commit records. It would be nice if this could just be managed out of shared_buffers rather than needing to configure a separate pool just for this feature. FWIW we've talked about this for years -- see old discussions about how pg_subtrans becomes a bottleneck in certain cases and you want to enlarge the number of buffers allocated to it (probably easy to find by searching posts from Jignesh). I'm guessing the new notify code would benefit from this as well. It'd be nice to have as a side effect, but if not, IMHO this proposal could simply use a fixed buffer pool like all other slru.c callers until someone gets around to fixing that. Adding more GUC switches for this strikes me as overkill. -- Álvaro Herrera alvhe...@alvh.no-ip.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Mon, May 24, 2010 at 3:07 PM, Alvaro Herrera alvhe...@alvh.no-ip.org wrote: It'd be nice to have as a side effect, but if not, IMHO this proposal could simply use a fixed buffer pool like all other slru.c callers until someone gets around to fixing that. Adding more GUC switches for this strikes me as overkill. I agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Sun, May 23, 2010 at 04:21:58PM -0400, Jan Wieck wrote: In some systems (data warehousing, replication), the order of commits is important, since that is the order in which changes have become visible. This information could theoretically be extracted from the WAL, but scanning the entire WAL just to extract this tidbit of information would be excruciatingly painful. This is very interesting to me as I've been doing some (research -- nowhere near production-level) work on building a transactional application-level (i.e. memcached-like) cache atop Postgres. One of the features I needed to support it was basically what you describe. Without getting too far into the details of what I'm doing, I needed to make it clear to a higher layer which commits were visible to a given query. That is, I wanted to know both the order of commits and where particular snapshots fit into this ordering. (A SnapshotData struct obviously contains the visibility information, but a representation in terms of the commit ordering is both more succinct and allows for easy ordering comparisons). Something you might want to consider, then, is adding an interface to find out the timestamp of the current transaction's snapshot, i.e. the timestamp of the most recent committed transaction visible to it. I wouldn't expect this to be difficult to implement as transaction completion/visibility is already synchronized via ProcArrayLock. Each record of the Transaction Commit Info consists of txid xci_transaction_id timestamptz xci_begin_timestamp timestamptz xci_commit_timestamp int64 xci_total_rowcount Another piece of information that seems useful to provide here would be the logical timestamp of the transaction, i.e. a counter that's incremented by one for each transaction. But maybe that's implicit in the log ordering? I'm not clear on why the total rowcount is useful, but perhaps I'm missing something obvious. I've actually implemented some semblance of this on Postgres 8.2, but it sounds like what you're interested in is more sophisticated. In particular, I wasn't at all concerned with durability or WAL stuff, and I had some specific requirements about when it was OK to purge the data. Because of this (and very limited development time), I just threw something together with a simple shared buffer. I don't think I have any useful code to offer, but let me know if there's some way I can help out. Dan -- Dan R. K. Ports MIT CSAILhttp://drkp.net/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Robert Haas wrote: It's a tool for replication solutions to use. I was thrown by the original post referencing data warehousing. For replication I definitely see that it would be good to provide some facility to grab a coherent snapshot out of the transaction stream, but I'm still not clear on a use case where other solutions aren't better. If you want a *particular* past snapshot, something akin to the transactional caching that Dan Ports mentioned seems best. If you just want a coherent snapshot like snapshot isolation, the current mechanisms seem to work (unless I'm missing something?). If you want solid data integrity querying the most recent replicated data, the proposal I posted earlier in the thread is the best I can see, so far. if the reconstructed transaction order inferred by SSI doesn't match the actual commit order, can we get a serialization anomaly on the standby by replaying transactions there in commit order? Yes. If we don't do *something* to address it, the replicas (slaves) will operate as read-only snapshot isolation, not true serializable. Can you give an example and explain how your proposal would solve it? I gave an example (without rigorous proof accompanying it, granted) earlier in the thread. In that example, if you allow a selection against a snapshot which includes the earlier commit (the update of the control table) and before the later commits (the receipts which used the old deposit date) you have exactly the kind of serialization anomaly which the work in progress prevents on the source (master) database -- the receipts *appear* to run in earlier transactions because the see the pre-update deposit date, but they show up out of order. As far as I'm concerned this is only a problem if the user *requested* serializable behavior for all transactions involved. If we send the information I suggested in the WAL stream, then any slave using the WAL stream could build a snapshot for a serializable transaction which excluded serializable transactions from the source which overlap with still-pending serializable transactions on the source. In this example, the update of the control table would not be visible to a serializable transaction on the slave until any overlapping serializable transactions (which would include any receipts using the old date) had also committed, so you could never see the writes out of order. I don't think that passing detailed predicate locking information would be feasible from a performance perspective, but since the slaves are read-only, I think it is fine to pass just the minimal transaction-level information I described. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Mon, May 24, 2010 at 4:03 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas wrote: It's a tool for replication solutions to use. I was thrown by the original post referencing data warehousing. For replication I definitely see that it would be good to provide some facility to grab a coherent snapshot out of the transaction stream, but I'm still not clear on a use case where other solutions aren't better. If you want a *particular* past snapshot, something akin to the transactional caching that Dan Ports mentioned seems best. If you just want a coherent snapshot like snapshot isolation, the current mechanisms seem to work (unless I'm missing something?). If you want solid data integrity querying the most recent replicated data, the proposal I posted earlier in the thread is the best I can see, so far. Well, AIUI, what you're really trying to do is derive the delta between an old snapshot and a newer snapshot. Can you give an example and explain how your proposal would solve it? I gave an example (without rigorous proof accompanying it, granted) earlier in the thread. In that example, if you allow a selection against a snapshot which includes the earlier commit (the update of the control table) and before the later commits (the receipts which used the old deposit date) you have exactly the kind of serialization anomaly which the work in progress prevents on the source (master) database -- the receipts *appear* to run in earlier transactions because the see the pre-update deposit date, but they show up out of order. Yep, I see it now. As far as I'm concerned this is only a problem if the user *requested* serializable behavior for all transactions involved. Agreed. If we send the information I suggested in the WAL stream, then any slave using the WAL stream could build a snapshot for a serializable transaction which excluded serializable transactions from the source which overlap with still-pending serializable transactions on the source. In this example, the update of the control table would not be visible to a serializable transaction on the slave until any overlapping serializable transactions (which would include any receipts using the old date) had also committed, so you could never see the writes out of order. I don't think that passing detailed predicate locking information would be feasible from a performance perspective, but since the slaves are read-only, I think it is fine to pass just the minimal transaction-level information I described. I suspect that's still going to be sort of hard on performance, but maybe we should get serializable working and committed on one node first and then worry about how to distribute it. I think there might be other approaches to this problem (global transaction coordinator? standby requests snapshot from primary?). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/24/2010 12:51 PM, Kevin Grittner wrote: Robert Haas wrote: I think you're confusing two subtly different things. The only thing I'm confused about is what benefit anyone expects to get from looking at data between commits in some way other than our current snapshot mechanism. Can someone explain a use case where what Jan is proposing is better than snapshot isolation? It doesn't provide any additional integrity guarantees that I can see. But the commit order is still the order the effects of those transactions have become visible - if we inserted a new read-only transaction into the stream at some arbitrary point in time, it would see all the transactions which committed before it and none of those that committed afterward. Isn't that what a snapshot does already? It does and the proposed is a mere alternative serving the same purpose. Have you ever looked at one of those queries, that Londiste or Slony issue against the provider DB in order to get all the log data that has been committed between two snapshots? Is that really the best you can think of? Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/24/2010 3:10 PM, Dan Ports wrote: I'm not clear on why the total rowcount is useful, but perhaps I'm missing something obvious. It is a glimpse into the future. Several years of pain doing replication work has taught me that knowing approximately who much work the next chunk will be before you select it all is a really useful thing. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Mon, May 24, 2010 at 10:24:07AM -0500, Kevin Grittner wrote: Jan Wieck wrote: In some systems (data warehousing, replication), the order of commits is important, since that is the order in which changes have become visible. This issue intersects with the serializable work I've been doing. While in database transactions using S2PL the above is true, in snapshot isolation and the SSI implementation of serializable transactions, it's not. In particular, the snapshot anomalies which can cause non-serializable behavior happen precisely because the apparent order of execution doesn't match anything so linear as order of commit. All true, but this doesn't pose a problem in snapshot isolation. Maybe this is obvious to everyone else, but just to be clear: a transaction's snapshot is determined entirely by which transactions committed before it snapshotted (and hence are visible to it). Thus, replaying update transactions in the sae order on a slave makes the same sequence of states visible to it. Of course (as in your example) some of these states could expose snapshot isolation anomalies. But that's true on a single-replica system too. Now, stepping into the SSI world... Replicating or recreating the whole predicate locking and conflict detection on slaves is not feasible for performance reasons. (I won't elaborate unless someone feels that's not intuitively obvious.) The only sane way I can see to have a slave database allow serializable behavior is to WAL-log the acquisition of a snapshot by a serializable transaction, and the rollback or commit, on the master, and to have the serializable snapshot build on a slave exclude any serializable transactions for which there are still concurrent serializable transactions. Yes, that does mean WAL- logging the snapshot acquisition even if the transaction doesn't yet have an xid, and WAL-logging the commit or rollback even if it never acquires an xid. One important observation is that any anomaly that occurs on the slave can be resolved by aborting a local read-only transaction. This is a good thing, because the alternatives are too horrible to consider. You could possibly cut the costs of predicate locking by having the master ship with each transaction the list of predicate locks it acquired. But you'd still have to track locks for read-only transactions, so maybe that's not a significant cost improvement. On the other hand, if you're willing to pay the price of serializability on the master, why not the slaves too? Dan -- Dan R. K. Ports MIT CSAILhttp://drkp.net/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On May 25, 2010, at 0:42 , Dan Ports wrote: On Mon, May 24, 2010 at 10:24:07AM -0500, Kevin Grittner wrote: Jan Wieck wrote: In some systems (data warehousing, replication), the order of commits is important, since that is the order in which changes have become visible. This issue intersects with the serializable work I've been doing. While in database transactions using S2PL the above is true, in snapshot isolation and the SSI implementation of serializable transactions, it's not. In particular, the snapshot anomalies which can cause non-serializable behavior happen precisely because the apparent order of execution doesn't match anything so linear as order of commit. All true, but this doesn't pose a problem in snapshot isolation. Maybe this is obvious to everyone else, but just to be clear: a transaction's snapshot is determined entirely by which transactions committed before it snapshotted (and hence are visible to it). Thus, replaying update transactions in the sae order on a slave makes the same sequence of states visible to it. The subtle point here is whether you consider the view from the outside (in the sense of what a read-only transaction started at an arbitrary time can or cannot observe), or from the inside (what updating transactions can observe and might base their updates on). The former case is completely determined by the commit ordering of the transactions, while the latter is not - otherwise serializability wouldn't be such a hard problem. For some problems, like replication, the former (outside) view is what matters - if slave synthesizes transactions that insert/update/delete the very same tuples as the original transaction did, and commits them in the same order, no read-only transaction can observe the difference. But that is *not* a serial schedule of the original transactions, since the transactions are *not* the same - the merely touch the same tuples. In fact, if you try replaying the original SQL, you *will* get different results on the slave, and not only because of now() and the like. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Florian Pflug f...@phlo.org writes: The subtle point here is whether you consider the view from the outside (in the sense of what a read-only transaction started at an arbitrary time can or cannot observe), or from the inside (what updating transactions can observe and might base their updates on). The former case is completely determined by the commit ordering of the transactions, while the latter is not - otherwise serializability wouldn't be such a hard problem. BTW, doesn't all this logic fall in a heap as soon as you consider read-committed transactions? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
-BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 Exposing the data will be done via a set returning function. The SRF takes two arguments. The maximum number of rows to return and the last serial number processed by the reader. The advantage of such SRF is that the result can be used in a query that right away delivers audit or replication log information in transaction commit order. The SRF can return an empty set if no further transactions have committed since, or an error if data segments needed to answer the request have already been purged. In light of the proposed purging scheme, how would it be able to distinguish between those two cases (nothing there yet vs. was there but purged)? - -- Greg Sabino Mullane g...@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201005231646 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -BEGIN PGP SIGNATURE- iEYEAREDAAYFAkv5lIAACgkQvJuQZxSWSsiR3gCgvyK/NPd6WmKGUqdo/3fdWIR7 LAQAoJqk3gYpEgtjw10gINDKFXTAnWO5 =sSvK -END PGP SIGNATURE- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Sun, May 23, 2010 at 4:21 PM, Jan Wieck janwi...@yahoo.com wrote: The system will have postgresql.conf options for enabling/disabling the whole shebang, how many shared buffers to allocate for managing access to the data and to define the retention period of the data based on data volume and/or age of the commit records. It would be nice if this could just be managed out of shared_buffers rather than needing to configure a separate pool just for this feature. But, I'm not sure how much work that is, and if it turns out to be too ugly then I'd say it's not a hard requirement. In general, I think we talked during the meeting about the desirability of folding specific pools into shared_buffers rather than managing them separately, but I'm not aware that we have any cases where we do that today so it might be hard (or not). Each record of the Transaction Commit Info consists of txid xci_transaction_id timestamptz xci_begin_timestamp timestamptz xci_commit_timestamp int64 xci_total_rowcount 32 bytes total. Are we sure it's worth including the row count? I wonder if we ought to leave that out and let individual clients of the mechanism track that if they're so inclined, especially since it won't be reliable anyway. CommitTransaction() inside of xact.c will call a function, that inserts a new record into this array. The operation will for most of the time be nothing than taking a spinlock and adding the record to shared memory. All the data for the record is readily available, does not require further locking and can be collected locally before taking the spinlock. What happens when you need to switch pages? The function will return the sequence number which CommitTransaction() in turn will record in the WAL commit record together with the begin_timestamp. While both, the begin as well as the commit timestamp are crucial to determine what data a particular transaction should have seen, the row count is not and will not be recorded in WAL. It would certainly be better if we didn't to bloat the commit xlog records to do this. Is there any way to avoid that? Checkpoint handling will call a function to flush the shared buffers. Together with this, the information from WAL records will be sufficient to recover this data (except for row counts) during crash recovery. Right. Exposing the data will be done via a set returning function. The SRF takes two arguments. The maximum number of rows to return and the last serial number processed by the reader. The advantage of such SRF is that the result can be used in a query that right away delivers audit or replication log information in transaction commit order. The SRF can return an empty set if no further transactions have committed since, or an error if data segments needed to answer the request have already been purged. Purging of the data will be possible in several different ways. Autovacuum will call a function that drops segments of the data that are outside the postgresql.conf configuration with respect to maximum age or data volume. There will also be a function reserved for superusers to explicitly purge the data up to a certain serial number. Dunno if autovacuuming this is the right way to go. Seems like that could leave to replication breaks, and it's also more work than not doing that. I'd just say that if you turn this on you're responsible for pruning it, full stop. Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/23/2010 4:48 PM, Greg Sabino Mullane wrote: -BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 Exposing the data will be done via a set returning function. The SRF takes two arguments. The maximum number of rows to return and the last serial number processed by the reader. The advantage of such SRF is that the result can be used in a query that right away delivers audit or replication log information in transaction commit order. The SRF can return an empty set if no further transactions have committed since, or an error if data segments needed to answer the request have already been purged. In light of the proposed purging scheme, how would it be able to distinguish between those two cases (nothing there yet vs. was there but purged)? There is a difference between an empty result set and an exception. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers