Re: [HACKERS] SSI and Hot Standby
On Fri, 2011-01-21 at 02:32 -0500, Dan Ports wrote:
> On Fri, Jan 21, 2011 at 08:44:59AM +0200, Heikki Linnakangas wrote:
> > We have enough information in the standby to reconstruct all writes done in the master. I gather that's not enough; in order to roll back read-only transaction T3 on the standby which would see an anomaly, we'd also need to know what reads T1 and T2 did in the master. Is that correct?
>
> That's some of the information we need, but it's not enough... The problem is that the conflict might not be discovered until after T3 (the reader) commits. In that case, it's too late to abort T3, so you'd need to roll back T2 instead. But that means a read-only transaction on the slave has to be able to cause a concurrent read-write transaction on the master to abort, which brings with it no end of problems.

So T1 and T2 are already potentially unserialized, and the presence of T3 causes the sequence to be caught out, at which point we must abort T2. The witness does not create the offence; they just report the crime.

So any xid that commits in a different sequence to the order in which the xid was assigned creates a potential for unserialization? Or?

On HS we know the order of arrival of xids, and we know the order of commits, so we should be able to work out which are the potentially unserializable snapshots. That would allow us to make the standby independent of the master, thereby avoiding all this messy information flow.

--
Simon Riggs    http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
On 21.01.2011 11:10, Simon Riggs wrote:
> So any xid that commits in a different sequence to the order in which the xid was assigned creates a potential for unserialization? Or?

It's not the order in which the xid was assigned that matters, but the order the transactions started and got their snapshots. The xids might be assigned a lot later, after the transactions have already read data.

> On HS we know the order of arrival of xids, and we know the order of commits, so we should be able to work out which are the potentially unserializable snapshots. That would allow us to make the standby independent of the master, thereby avoiding all this messy information flow.

Unfortunately, although we know the order of arrival of xids, it doesn't tell us the order the transactions started.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] SSI and Hot Standby
On Fri, 2011-01-21 at 11:19 +0200, Heikki Linnakangas wrote:
> On 21.01.2011 11:10, Simon Riggs wrote:
> > So any xid that commits in a different sequence to the order in which the xid was assigned creates a potential for unserialization? Or?
>
> It's not the order in which the xid was assigned that matters, but the order the transactions started and got their snapshots. The xids might be assigned a lot later, after the transactions have already read data.

So if a read-write transaction assigns an xid before it takes a snapshot then we'll be OK? That seems much easier to arrange than passing chunks of snapshot data backwards and forwards. Optionally.

--
Simon Riggs    http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] SSI and Hot Standby
On 01/21/2011 03:25 AM, Florian Pflug wrote:
> The COMMIT order in the actual, concurrent schedule doesn't necessarily represent the order of the transactions in an equivalent serial schedule. Here's an example:
>
> T1: BEGIN SERIALIZABLE; -- (Assume snapshot is set here)
> T1: UPDATE D1 ... ;
> T2: BEGIN SERIALIZABLE; -- (Assume snapshot is set here)
> T2: SELECT * FROM D1 ... ;
> T2: UPDATE D2 ... ;
> T1: COMMIT;
> T3: SELECT * FROM D1, D2;
> T2: COMMIT;
>
> Now, the COMMIT order is T1, T3, T2. Let's check if there is an equivalent serial schedule. In any such schedule:
>
> T2 must run before T1 because T2 didn't see T1's changes to D1
> T3 must run after T1 because T3 did see T1's changes to D1
> T3 must run before T2 because T3 didn't see T2's changes to D2
>
> This is obviously impossible - if T3 runs before T2 and T2 runs before T1 then T3 runs before T1, contradicting the second requirement. There is thus no equivalent serial schedule, and we must abort one of these transactions with a serialization error. Note that aborting T3 is sufficient, even though T3 is READ ONLY! With T3 gone, an equivalent serial schedule is T2,T1!

Sorry for bothering all of you, but I just don't get this. What if T2 rolls back instead of committing? Then the snapshot of T3 would have been valid, right? Now, for the snapshot of T3 it doesn't matter if T2 commits or if it doesn't, because it can't see the changes of T2 in any case. Thus, it would seem that the snapshot is valid. On the other hand I can't see anything wrong in the logic in your post. What am I missing? I am feeling stupid...

At least for dumps I don't see how T2 can matter (assuming T3 is the pg_dump's snapshot). Because if you reload from the dump, T2 never happened in that dump. In the reloaded database it just did not exist at all.

 - Anssi
Re: [HACKERS] SSI and Hot Standby
On Jan21, 2011, at 12:55 , Anssi Kääriäinen wrote:
> On 01/21/2011 03:25 AM, Florian Pflug wrote:
> > The COMMIT order in the actual, concurrent schedule doesn't necessarily represent the order of the transactions in an equivalent serial schedule. Here's an example:
> >
> > T1: BEGIN SERIALIZABLE; -- (Assume snapshot is set here)
> > T1: UPDATE D1 ... ;
> > T2: BEGIN SERIALIZABLE; -- (Assume snapshot is set here)
> > T2: SELECT * FROM D1 ... ;
> > T2: UPDATE D2 ... ;
> > T1: COMMIT;
> > T3: SELECT * FROM D1, D2;
> > T2: COMMIT;
> >
> > Now, the COMMIT order is T1, T3, T2. Let's check if there is an equivalent serial schedule. In any such schedule:
> >
> > T2 must run before T1 because T2 didn't see T1's changes to D1
> > T3 must run after T1 because T3 did see T1's changes to D1
> > T3 must run before T2 because T3 didn't see T2's changes to D2
> >
> > This is obviously impossible - if T3 runs before T2 and T2 runs before T1 then T3 runs before T1, contradicting the second requirement. There is thus no equivalent serial schedule, and we must abort one of these transactions with a serialization error. Note that aborting T3 is sufficient, even though T3 is READ ONLY! With T3 gone, an equivalent serial schedule is T2,T1!
>
> Sorry for bothering all of you, but I just don't get this. What if T2 rolls back instead of committing? Then the snapshot of T3 would have been valid, right?

Yeah. If T2 is removed from the picture, the only remaining ordering constraint is "T3 must run after T1 because T3 did see T1's changes to D1", and thus T1,T3 is an equivalent serial schedule.

> Now, for the snapshot of T3 it doesn't matter if T2 commits or if it doesn't, because it can't see the changes of T2 in any case. Thus, it would seem that the snapshot is valid. On the other hand I can't see anything wrong in the logic in your post. What am I missing? I am feeling stupid...

The problem is that T3 sees the effects of T1 but not those of T2. Since T2 must run *before* T1 in any equivalent serial schedule, that is impossible.
In other words, if you look at an equivalent serial schedule of the *writing* transactions T1 and T2, you won't find a place to insert T3 such that it gives the same answer as in the concurrent schedule. It isn't really T3's snapshot that is invalid, it's the interleaving of T1,T2,T3, because there is no equivalent serial schedule (a serial schedule that produces the same results).

If, for example, T3 reads only *one* of the tables D1,D2, then the whole thing suddenly *is* serializable! If T3 reads only D1, an equivalent serial schedule must run T3 after T1, and if it reads only D2 then it must run before T2.

The validity of snapshots comes into play if you attempt to distinguish safe and unsafe interleaved schedules *without* taking the dataset inspected by T3 into account. So you simply assume that T3 reads the *whole* database (since that's the worst case), and it must thus run *after* all transactions it saw as COMMITTED in any serial schedule. The root of the whole issue is that this might not be possible! Some not-yet-committed transaction (T2 in the example) may have to be placed *before* some transaction seen as COMMITTED by T3 (T1 in our example). Since T3 needs to run *after* T1 (since it saw it as committed), it'd thus also run after T2, and hence see T2's effects, in any serial schedule. But since it didn't see T2 in the interleaved schedule, we're hosed.

> At least for dumps I don't see how T2 can matter (assuming T3 is the pg_dump's snapshot). Because if you reload from the dump, T2 never happened in that dump. In the reloaded database it just did not exist at all.

Still, the dump would reflect a database state that *logically* never existed (i.e. not in any serial schedule). If you dump for disaster recovery, you might not care. If you dump to copy the data onto some reporting server, you might.

best regards,
Florian Pflug
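Florian's three "must run before" constraints can be checked mechanically: treat each one as an edge in a directed graph and look for a cycle. Here is an illustrative sketch in Python (not part of PostgreSQL; the helper is invented for this post):

```python
# Ordering constraints from Florian's example, expressed as a directed
# graph: an edge (a, b) means "a must run before b in any equivalent
# serial schedule".
edges = {
    ("T2", "T1"),  # T2 did not see T1's update to D1
    ("T1", "T3"),  # T3 did see T1's update to D1
    ("T3", "T2"),  # T3 did not see T2's update to D2
}

def has_cycle(edges):
    """Detect a cycle in the constraint graph with a depth-first search."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    visiting, done = set(), set()
    def visit(node):
        if node in visiting:
            return True          # back-edge: cycle found
        if node in done:
            return False
        visiting.add(node)
        if any(visit(n) for n in graph.get(node, [])):
            return True
        visiting.remove(node)
        done.add(node)
        return False
    return any(visit(n) for n in list(graph))

print(has_cycle(edges))                   # True: no serial schedule exists
print(has_cycle(edges - {("T3", "T2")}))  # False: drop T3's read of D2 and
                                          # T2,T1,T3 is a valid serial order
```

A cycle in this graph is exactly the "dangerous structure" SSI must break by aborting one of the participating transactions.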
Re: [HACKERS] SSI and Hot Standby
On Jan21, 2011, at 10:19 , Heikki Linnakangas wrote:
> On 21.01.2011 11:10, Simon Riggs wrote:
> > So any xid that commits in a different sequence to the order in which the xid was assigned creates a potential for unserialization? Or?
>
> It's not the order in which the xid was assigned that matters, but the order the transactions started and got their snapshots. The xids might be assigned a lot later, after the transactions have already read data.

Any pair of concurrent transactions on the master between which r/w-dependencies exist are a potential risk. If their order in every equivalent serial schedule doesn't match their commit order, any snapshot taken between the two commits doesn't represent a fully consistent view of the database.

Hm, wait a minute... Thinks... Since transactions on the slave don't write, they really don't need to be aware of any SIREAD locks taken on the master, right? We'd still need to detect conflicts between SIREAD locks taken on the slaves and writes by transactions on the master, but that could be handled by the recovery process, without having to report anything back to the master, and without logging SIREAD lock acquisitions.

So, how about the following:

A) We log r/w-dependencies between transactions committed on the master in the WAL, probably in the COMMIT record

B) SERIALIZABLE queries on the slave use the SIREAD lock machinery like they'd do on the master. The recovery process does the necessary conflict flagging in case the write happens (in wall clock time) after the read on the slave, mimicking what the writing transaction would have done on the master had it seen the SIREAD lock

C) By using the r/w-dependency information from the WAL plus the r/w-dependency information generated on the slave, we can detect dangerous situations on the slave, and abort the offending query on the slave.

(A) and (B) seem quite straight-forward. Regarding (C), I'm not familiar enough with the inner workings of the SSI patch to judge that.
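Step (B) might look something like the following sketch. This is a hypothetical Python model, not the SSI patch's actual code; the lock targets, function names, and data structures are all invented for illustration:

```python
# Hypothetical model of step (B): the recovery process compares each
# write it replays against SIREAD locks held by standby queries and
# records a rw-conflict (reader must precede writer in any equivalent
# serial schedule).

siread_locks = {}      # lock target -> set of standby transaction ids
rw_conflicts = set()   # (standby_reader, master_writer_xid) pairs

def acquire_siread(standby_xact, target):
    """A SERIALIZABLE standby query records what it has read."""
    siread_locks.setdefault(target, set()).add(standby_xact)

def replay_write(writer_xid, target):
    """Recovery applies a master write; any standby reader of the same
    target gains a rw-antidependency out to the writing transaction."""
    for reader in siread_locks.get(target, ()):
        rw_conflicts.add((reader, writer_xid))

# A standby query reads a row, then recovery replays a later write to it:
acquire_siread("standby_q1", ("D2", "row1"))
replay_write(1001, ("D2", "row1"))
print(rw_conflicts)    # {('standby_q1', 1001)}
```

Combining these standby-side edges with the r/w-dependency edges logged in the WAL per (A) would give the graph needed for the dangerous-structure check in (C).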
best regards,
Florian Pflug
Re: [HACKERS] SSI and Hot Standby
2011/1/21 Anssi Kääriäinen anssi.kaariai...@thl.fi:
> Sorry for bothering all of you, but I just don't get this. What if T2 rolls back instead of committing? Then the snapshot of T3 would have been valid, right? Now, for the snapshot of T3 it doesn't matter if T2 commits or if it doesn't, because it can't see the changes of T2 in any case. Thus, it would seem that the snapshot is valid. On the other hand I can't see anything wrong in the logic in your post. What am I missing? I am feeling stupid...
>
> At least for dumps I don't see how T2 can matter (assuming T3 is the pg_dump's snapshot). Because if you reload from the dump, T2 never happened in that dump. In the reloaded database it just did not exist at all.

This has been discussed before; in [1] I summarized: "IOW, one could say that the backup is consistent only if it were never compared against the system as it continued running after the dump took place."

Nicolas

[1] http://archives.postgresql.org/pgsql-hackers/2010-09/msg01763.php
Re: [HACKERS] SSI and Hot Standby
Heikki Linnakangas wrote:
> It's not the order in which the xid was assigned that matters, but the order the transactions started and got their snapshots. The xids might be assigned a lot later, after the transactions have already read data.

From the "Apparent Serial Order of Execution" sub-section of the Overview section of the Wiki page:

http://wiki.postgresql.org/wiki/Serializable#Apparent_Serial_Order_of_Execution

| The order in which transactions appear to have executed is
| determined by something more subtle than in S2PL: read/write
| dependencies. If a transaction attempts to read data which is not
| visible to it because the transaction which wrote it (or will later
| write it) is concurrent (one of them was running when the other
| acquired its snapshot), then the reading transaction appears to
| have executed first, regardless of the actual sequence of
| transaction starts or commits (since it sees a database state prior
| to that in which the other transaction leaves it).

It is the fact that it generates results consistent with some serial order of execution which makes this truly serializable. With SSI that order isn't tied to transaction start or transaction commit when transactions overlap -- the one which *appears* to have run first might have started later *and* committed later. And as Dan pointed out with a repeatable example, a read-only transaction can contribute to a cycle in the apparent order of execution, even if it commits ahead of one of the writers.

There really are only two ways a read-only transaction can be guaranteed to see consistent data under SSI:

(1) A read-write transaction might need to be cancelled to prevent the view of the data a committed read-only transaction has already seen from becoming inconsistent. (Dan's example)

(2) The read-only transaction needs to get a snapshot which the master has determined is safe.

There really aren't any other alternatives under SSI.
The question is which we support on hot standbys and how we achieve it. Or whether we just don't, I guess; but that doesn't seem very satisfactory as a long term solution.

-Kevin
Re: [HACKERS] SSI and Hot Standby
On Fri, Jan 21, 2011 at 8:05 AM, Nicolas Barbier nicolas.barb...@gmail.com wrote:
> 2011/1/21 Anssi Kääriäinen anssi.kaariai...@thl.fi:
> > Sorry for bothering all of you, but I just don't get this. What if T2 rolls back instead of committing? Then the snapshot of T3 would have been valid, right? Now, for the snapshot of T3 it doesn't matter if T2 commits or if it doesn't, because it can't see the changes of T2 in any case. Thus, it would seem that the snapshot is valid. On the other hand I can't see anything wrong in the logic in your post. What am I missing? I am feeling stupid...
> >
> > At least for dumps I don't see how T2 can matter (assuming T3 is the pg_dump's snapshot). Because if you reload from the dump, T2 never happened in that dump. In the reloaded database it just did not exist at all.
>
> This has been discussed before; in [1] I summarized: "IOW, one could say that the backup is consistent only if it were never compared against the system as it continued running after the dump took place."

But that's a pretty fair way to look at it, isn't it? I mean, I guess it's a question of what you plan to use that backup for, but if it's disaster recovery, everything that happened after the dump is gone, so no such comparison will occur. And that's probably the most common reason for taking a dump.

It occurs to me that focusing on how this is going to work on Hot Standby might be looking at the question too narrowly. The general issue is - does this technique generalize to a distributed computing environment, with distributed transactions across multiple PostgreSQL databases? For example, what if the control record in Kevin's example is stored in another database, or on another server? Or what if some tables are replicated via Slony? I realize this is all outside the scope of the patch, but that's exactly the point: making this stuff work across multiple databases (even if they are replicas of each other) is much more complex than getting it to work on just one machine.
Even if we could agree on how to do it, coming up with some hack that can only ever possibly work in the Hot Standby case might not be the best thing to do.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] SSI and Hot Standby
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
> (1) A read write transaction might need to be canceled to prevent the view of the data a committed read only transaction has already seen from becoming inconsistent. (Dan's example)

And this one seems entirely a theoretical possibility. I spent a little time looking it over, and I don't see how it could be made to work from hot standbys without an unbounded flow of predicate lock information from all standbys to the master *plus* blocking commits on the master for the duration of the longest round trip latency to any standby. I think we can call this one dead on arrival.

-Kevin
Re: [HACKERS] SSI and Hot Standby
On 01/21/2011 02:21 PM, Florian Pflug wrote:
> Still, the dump would reflect a database state that *logically* never existed (i.e. not in any serial schedule). If you dump for disaster recovery, you might not care. If you dump to copy the data onto some reporting server, you might.

I am beginning to understand the problem. If you don't mind, here is a complete example if somebody else is having trouble understanding this.

Let's say we have tables D1 and D2. Both contain a single column, id, and a single row. The data in the beginning is as follows:

D1: id = 1
D2: id = 1

The constraints: D1.id can only be incremented. Whenever D2.id is updated, it must be updated to D1.id + 1.

The transactions:

T1: begin; update D1 set id = id + 1;
T2: begin; update D2 set id = (select id+1 from D1);
T1: commit;
T3: begin; select id from D1; select id from D2; commit;
    Data seen: (2, 1) -- this is a possible state
T2: commit;
T4: begin; select id from D1; select id from D2; commit;
    Data seen: (2, 2)

This is again a possible state. But if we compare it to the state seen by T3 it is not valid: from state (2, 1) we can not get to state (2, 2) without breaking one of the constraints. Thus, the state seen by T3 is not valid in the database.

So, I finally got it! :-) I hope this example will help somebody else understand the problem.

The problem I had understanding this was that the state in T3 is in fact perfectly valid. I thought that there must be some problem with that state alone. There isn't, unless you compare it to the state after T2 has committed.

Thanks to all explaining this to me,
 - Anssi
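The reachability argument in Anssi's example can be checked with a few lines of code. A hypothetical sketch in Python (the `reachable` helper is invented here, standing in for the SQL sessions above):

```python
def reachable(before, after):
    """Can the database evolve from state `before` to state `after`
    (each a (D1.id, D2.id) pair) while honouring the two constraints?
      - D1.id may only be incremented.
      - An update of D2.id must set it to the current D1.id + 1."""
    d1a, d2a = before
    d1b, d2b = after
    if d1b < d1a:
        return False                 # D1 went backwards
    if d2b == d2a:
        return True                  # D2 untouched, only D1 incremented
    # D2 was updated: at that moment D1 held some value d1' with
    # d1a <= d1' <= d1b, and the update set D2 to d1' + 1.
    return d1a <= d2b - 1 <= d1b

print(reachable((1, 1), (2, 1)))   # True:  the state T3 saw
print(reachable((1, 1), (2, 2)))   # True:  the state T4 saw
print(reachable((2, 1), (2, 2)))   # False: T3's state cannot precede T4's
```

Both visible states are individually valid, but no constraint-respecting history passes through (2, 1) and then (2, 2), which is exactly Anssi's point.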
Re: [HACKERS] SSI and Hot Standby
Robert Haas robertmh...@gmail.com wrote:
> On Fri, Jan 21, 2011 at 8:05 AM, Nicolas Barbier wrote:
> > This has been discussed before; in [1] I summarized: "IOW, one could say that the backup is consistent only if it were never compared against the system as it continued running after the dump took place."
>
> But that's a pretty fair way to look at it, isn't it? I mean, I guess it's a question of what you plan to use that backup for, but if it's disaster recovery, everything that happened after the dump is gone, so no such comparison will occur. And that's probably the most common reason for taking a dump.

It's not, however, a reason for having a hot standby (versus a warm standby or PITR backup).

> It occurs to me that focusing on how this is going to work on Hot Standby might be looking at the question too narrowly. The general issue is - does this technique generalize to a distributed computing environment, with distributed transactions across multiple PostgreSQL databases?

No, and I can pretty much guarantee that you can't have such a solution without blocking on all masters at commit time. What you're suggesting goes *way* beyond two phase commit, which just guarantees the integrity rules of each database are honored and that all transactions either commit or don't. You're talking about sharing lock information across high-latency communication links which in SSI is communicated via LW locking. Expect any such generalized "and world peace!" solution to be rather slow.

> For example, what if the control record in Kevin's example is stored in another database, or on another server. Or what if some tables are replicated via Slony? I realize this is all outside the scope of the patch

Yep. Again, the patch achieves true serializability with minimal cost and *no blocking*. Spend a few minutes thinking about how you might coordinate what you propose, and you'll see it's going to involve blocking based on waiting for messages from across the wire.
> but that's exactly the point: making this stuff work across multiple databases (even if they are replicas of each other) is much more complex than getting it to work on just one machine. Even if we could agree on how to do it, coming up with some hack that can only ever possibly work in the Hot Standby case might not be the best thing to do.

I don't see it as a hack. It's the logical extension of SSI onto read-only replicas. If you're looking for something more than that (as the above suggests), it's not a good fit; but I suspect that there are people besides me who would want to use hot standby for reporting and read-only web access who would want a serializable view.

What this proposal does is to say that there are two time streams to look at on the standby -- how far along you are for purposes of recovery, and how far along you are for purposes of seeing a view of the data sure to be consistent with the later state of the master. With SSI they can't be the same. If someone wants them to be, they could implement a traditional S2PL serializable mode, complete with blocking and deadlocks, and then you'd have it automatically on the replicas, because with S2PL the apparent order of execution matches the commit order.

-Kevin
Re: [HACKERS] SSI and Hot Standby
2011/1/21 Robert Haas robertmh...@gmail.com:
> On Fri, Jan 21, 2011 at 8:05 AM, Nicolas Barbier nicolas.barb...@gmail.com wrote:
> > 2011/1/21 Anssi Kääriäinen anssi.kaariai...@thl.fi:
> > > Sorry for bothering all of you, but I just don't get this. What if T2 rolls back instead of committing? Then the snapshot of T3 would have been valid, right? Now, for the snapshot of T3 it doesn't matter if T2 commits or if it doesn't, because it can't see the changes of T2 in any case. Thus, it would seem that the snapshot is valid. On the other hand I can't see anything wrong in the logic in your post. What am I missing? I am feeling stupid...
> > >
> > > At least for dumps I don't see how T2 can matter (assuming T3 is the pg_dump's snapshot). Because if you reload from the dump, T2 never happened in that dump. In the reloaded database it just did not exist at all.
> >
> > This has been discussed before; in [1] I summarized: "IOW, one could say that the backup is consistent only if it were never compared against the system as it continued running after the dump took place."
>
> But that's a pretty fair way to look at it, isn't it?

Indeed, I just wanted to point Anssi to the previous discussion.

> It occurs to me that focusing on how this is going to work on Hot Standby might be looking at the question too narrowly. The general issue is - does this technique generalize to a distributed computing environment, with distributed transactions across multiple PostgreSQL databases? For example, what if the control record in Kevin's example is stored in another database, or on another server. Or what if some tables are replicated via Slony? I realize this is all outside the scope of the patch, but that's exactly the point: making this stuff work across multiple databases (even if they are replicas of each other) is much more complex than getting it to work on just one machine.
> Even if we could agree on how to do it, coming up with some hack that can only ever possibly work in the Hot Standby case might not be the best thing to do.

You seem to be questioning whether the normal (?) way of using 2PC on multiple DBs (just start transactions, and let 2PC coordinate the commits) that all use SERIALIZABLE isolation mode always results in globally serializable behavior. I must say that I don't immediately see the answer (my gut feeling says "nope"), but it sure is an interesting question.

In the special case where all databases use SS2PL as their concurrency mechanism, the answer is yes. (Because any possibly conflicting local transactions of any global transactions that touch the same tables will necessarily commit in some globally consistent order, which in the case of SS2PL is consistent with the must-have-happened-before-in-any-serialized-order relation; i.e., using the terminology from the literature: because SS2PL is CO.)

I don't know whether it is generally considered a requirement for a 2PC environment to guarantee global serializability.

Nicolas
Re: [HACKERS] SSI and Hot Standby
Simon Riggs si...@2ndquadrant.com writes:
> On Fri, 2011-01-21 at 11:19 +0200, Heikki Linnakangas wrote:
> > It's not the order in which the xid was assigned that matters, but the order the transactions started and got their snapshots. The xids might be assigned a lot later, after the transactions have already read data.
>
> So if a read-write transaction assigns an xid before it takes a snapshot then we'll be OK? That seems much easier to arrange than passing chunks of snapshot data backwards and forwards. Optionally.

No, that idea is DOA from a performance standpoint. We sweated blood to avoid having to assign XIDs to read-only transactions, and we're not going back. If SSI requires that, SSI is not getting committed.

			regards, tom lane
Re: [HACKERS] SSI and Hot Standby
On Fri, Jan 21, 2011 at 10:32 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Simon Riggs si...@2ndquadrant.com writes:
> > So if a read-write transaction assigns an xid before it takes a snapshot then we'll be OK? That seems much easier to arrange than passing chunks of snapshot data backwards and forwards. Optionally.
>
> No, that idea is DOA from a performance standpoint. We sweated blood to avoid having to assign XIDs to read-only transactions, and we're not going back. If SSI requires that, SSI is not getting committed.

So far I think all of the ideas proposed for generalizing this across the master-standby connection seem likely to be DOA from a performance perspective. But I think we have a pretty broad consensus that it's OK to punt this issue for 9.1. We can always add this in 9.2 if it can be demonstrated to work well, but it's all vapor-ware right now anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] SSI and Hot Standby
Tom Lane t...@sss.pgh.pa.us wrote:
> Simon Riggs si...@2ndquadrant.com writes:
> > So if a read-write transaction assigns an xid before it takes a snapshot then we'll be OK? That seems much easier to arrange than passing chunks of snapshot data backwards and forwards.

We're not talking about passing them backwards. I'm suggesting that we probably don't even need to pass them forward, but that suggestion has been pretty handwavy so far. I guess I should fill it out, because everyone's been ignoring it so far.

> No, that idea is DOA from a performance standpoint. We sweated blood to avoid having to assign XIDs to read-only transactions, and we're not going back. If SSI requires that, SSI is not getting committed.

SSI doesn't require that. The suggestion that it would in *any* way help with the interaction with hot standby is off-base.

-Kevin
Re: [HACKERS] SSI and Hot Standby
Anssi Kääriäinen anssi.kaariai...@thl.fi wrote:
> I am beginning to understand the problem. If you don't mind, here is a complete example if somebody else is having trouble understanding this.
>
> Let's say we have tables D1 and D2. Both contain a single column, id, and a single row. The data in the beginning is as follows:
>
> D1: id = 1
> D2: id = 1
>
> The constraints: D1.id can only be incremented. Whenever D2.id is updated, it must be updated to D1.id + 1.
>
> The transactions:
>
> T1: begin; update D1 set id = id + 1;
> T2: begin; update D2 set id = (select id+1 from D1);
> T1: commit;
> T3: begin; select id from D1; select id from D2; commit;
>     Data seen: (2, 1) -- this is a possible state
> T2: commit;
> T4: begin; select id from D1; select id from D2; commit;
>     Data seen: (2, 2)
>
> This is again a possible state. But if we compare it to the state seen by T3 it is not valid: from state (2, 1) we can not get to state (2, 2) without breaking one of the constraints. Thus, the state seen by T3 is not valid in the database.
>
> So, I finally got it! :-) I hope this example will help somebody else understand the problem.

Yeah, interesting example. Under SSI, once T3 selects from D2 you have a dangerous structure, and either T2 or T3 must fail to prevent the possibility of the sort of anomaly your example demonstrates. We would prefer to see T2 fail, because if T3 fails it will continue to fail on retry until T2 completes. We're trying to avoid that kind of thrashing. If T2 fails and is retried, it will immediately succeed and generate results consistent with what T3 saw.

When I test your example, though, I'm getting the serialization failure on T3 rather than T2, so I'd call that a bug. Will investigate. Thanks again for your tests! You seem to be able to shake out issues better than anyone else! Once found, fixing them is not usually very hard; it's coming up with that creative usage pattern to *find* the problem which is the hard part.

OK if I add this one to our dcheck test suite, too?
-Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
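[Editor's note: for readers tracing Anssi's example, here is a small Python sketch encoding the two constraints. The `reachable` helper is purely illustrative (nothing like it exists in the patch); it checks whether one database state can follow another under the stated rules, and confirms that the state T4 saw cannot follow the state T3 saw.]

```python
def reachable(a, b):
    """Can the database evolve from state a = (d1, d2) to state b under
    Anssi's constraints? D1.id may only be incremented; any update to
    D2.id must set it to D1.id + 1 at that instant."""
    d1a, d2a = a
    d1b, d2b = b
    if d1b < d1a:        # D1 can only grow
        return False
    if d2b == d2a:       # D2 untouched: always fine
        return True
    # D2 was updated to d2b, so D1 must have been d2b - 1 at that moment;
    # since D1 only grows, that value must lie between d1a and d1b.
    return d1a <= d2b - 1 <= d1b

print(reachable((1, 1), (2, 1)))  # True  -- T3's view is a possible state
print(reachable((1, 1), (2, 2)))  # True  -- T4's view is also possible...
print(reachable((2, 1), (2, 2)))  # False -- ...but not after T3's view
```

Both observed states are individually valid, but no legal history contains them in that order, which is exactly the anomaly T3 witnesses.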
Re: [HACKERS] SSI and Hot Standby
Kevin Grittner kevin.gritt...@wicourts.gov wrote: When I test your example, though, I'm getting the serialization failure on T3 rather than T2, so I'd call that a bug. Will investigate. Thanks again for your tests! Fixed with this: http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=b91460812396b68362c812d6e4fb67799fc6147e Thanks again! -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
I wrote: We're not talking about passing the snapshots backwards. I'm suggesting that we probably don't even need to pass them forward, but that suggestion has been pretty handwavy so far. I guess I should fill it out, because everyone's been ignoring it so far. It's been too hectic today to flesh this out very well, but I can at least do a better brain dump -- you know, wave my hands a little less vaguely. The idea of communicating about a safe snapshot through the WAL without actually *sending* snapshot XIDs through the WAL might work something like this:

(1) We communicate when we are starting to consider a snapshot. This would always be related to the commit or rollback of a serializable read-write transaction, so perhaps we could include the information in an existing WAL record. We would need to find one free bit somewhere, or make room for it. Alternatively, we could send a new WAL record type to communicate this. At the point that a standby processes such a WAL record, it would grab a snapshot in effect after the commit, and save it as the latest candidate, releasing the previous candidate, if any.

(2) If a snapshot fails to make it to safe status on the master, it will pick a new candidate and repeat (1) -- there's no need to explicitly quash a failed candidate.

(3) We communicate when we find that the last candidate made it to safe status. Again, this would be related to the commit or rollback of a serializable read-write transaction. Same issues about needing (another) bit or using a new record type. When a standby receives this, it promotes the latest candidate to the new safe snapshot to be used when a serializable transaction asks for a snapshot, replacing the previous value, if any. Any transactions waiting for a snapshot (either because there previously wasn't a safe snapshot on record or because they requested DEFERRABLE) could be provided the new snapshot and turned loose. 
(4) It's not inconceivable that we might want to send both (1) and (3) with the same commit.

(5) Obviously, we can pick our heuristics for how often we try to refresh this, limiting it to avoid too much overhead, at the cost of less frequent snapshot updates for serializable transactions on the standbys.

My assumption is that when we have a safe snapshot (which should be pretty close to all the time), we immediately provide it to any serializable transaction requesting a snapshot, except it seems to make sense to use the new DEFERRABLE mode to mean that you want to use the *next* one to arrive. This would effectively cause the point in time which was visible to serializable transactions to lag behind what is visible to other transactions by a variable amount, but would ensure that a serializable transaction couldn't see any serialization anomalies. It would also be immune to serialization failures from SSI logic; but obviously, standby-related cancellations would be in play. I don't know whether the older snapshots would tend to increase the standby-related cancellations, but it wouldn't surprise me. Hopefully this is enough for people to make something of it. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
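[Editor's note: the standby-side half of steps (1)-(3) amounts to a small state machine, sketched below in Python. All names are hypothetical; this is an illustration of the candidate/promote protocol, not proposed PostgreSQL code.]

```python
class StandbySafeSnapshots:
    """Standby-side state machine for the proposed WAL protocol:
    (1) a flagged commit record nominates a candidate snapshot,
    (2) a failed candidate is simply superseded by the next nomination,
    (3) a second flagged record promotes the latest candidate to safe."""

    def __init__(self):
        self.candidate = None   # latest snapshot under consideration
        self.safe = None        # last snapshot proven anomaly-free

    def wal_candidate(self, snapshot):
        # Step (1): save as latest candidate, releasing the previous one.
        self.candidate = snapshot

    def wal_safe(self):
        # Step (3): the last candidate reached safe status on the master;
        # serializable transactions on the standby may now use it.
        if self.candidate is not None:
            self.safe = self.candidate

mach = StandbySafeSnapshots()
mach.wal_candidate("S1")   # (1): S1 nominated
mach.wal_candidate("S2")   # (2): S1 failed on the master, quietly replaced
mach.wal_safe()            # (3): S2 promoted
print(mach.safe)           # -> S2
```

Step (2) needs no explicit WAL traffic at all, which is part of why the scheme stays cheap.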
Re: [HACKERS] SSI and Hot Standby
On Fri, 2011-01-21 at 18:52 -0600, Kevin Grittner wrote: My assumption is that when we have a safe snapshot (which should be pretty close to all the time), we immediately provide it to any serializable transaction requesting a snapshot, except it seems to make sense to use the new DEFERRABLE mode to mean that you want to use the *next* one to arrive. How would it handle this situation:

1. Standby has safe snapshot S1
2. Primary does a VACUUM which removes some stuff visible in S1
3. Standby can't replay the VACUUM because it still has S1, but also can't get a new S2 because the WAL needed for that is behind the VACUUM

So, S1 needs to be discarded. What do we do on the standby while there is no safe snapshot? I suppose throw errors -- I can't think of anything else. This would effectively cause the point in time which was visible to serializable transactions to lag behind what is visible to other transactions by a variable amount, but would ensure that a serializable transaction couldn't see any serialization anomalies. It would also be immune to serialization failures from SSI logic; but obviously, standby-related cancellations would be in play. I don't know whether the older snapshots would tend to increase the standby-related cancellations, but it wouldn't surprise me. I'm also a little concerned about the user-understandability here. Is it possible to make the following guarantees in this approach:

1. If transactions are completing on the primary, new snapshots will be taken on the standby; and
2. If no write transactions are in progress on the primary, then the standby will get a snapshot that represents the exact same data as on the primary?

That would be fairly easy to explain to users. If there is a visibility lag, then we just say finish the write transactions, and progress will be made. And if the system is idle, they should see identical data. 
Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
Jeff Davis wrote: On Fri, 2011-01-21 at 18:52 -0600, Kevin Grittner wrote: My assumption is that when we have a safe snapshot (which should be pretty close to all the time), we immediately provide it to any serializable transaction requesting a snapshot, except it seems to make sense to use the new DEFERRABLE mode to mean that you want to use the *next* one to arrive. How would it handle this situation:

1. Standby has safe snapshot S1
2. Primary does a VACUUM which removes some stuff visible in S1
3. Standby can't replay the VACUUM because it still has S1, but also can't get a new S2 because the WAL needed for that is behind the VACUUM

So, S1 needs to be discarded. What do we do on the standby while there is no safe snapshot? I suppose throw errors -- I can't think of anything else. We could wait for the next safe snapshot to arrive. I don't know how often that combination would occur, particularly in a situation where there were long-running serializable read-write transactions on the master which would prevent a new safe snapshot from being generated. It seems as though a long-running transaction on the master would also block vacuum activity. I'm not sure how we can *really* know the frequency without field experience. This would effectively cause the point in time which was visible to serializable transactions to lag behind what is visible to other transactions by a variable amount, but would ensure that a serializable transaction couldn't see any serialization anomalies. It would also be immune to serialization failures from SSI logic; but obviously, standby-related cancellations would be in play. I don't know whether the older snapshots would tend to increase the standby-related cancellations, but it wouldn't surprise me. I'm also a little concerned about the user-understandability here. Is it possible to make the following guarantees in this approach: 1. 
If transactions are completing on the primary, new snapshots will be taken on the standby; and The rules there are rather complicated. Based on previous responses to posts where I've gotten into that detail, I fear that specifying it with complete accuracy would cause so many eyes to glaze over that nobody would get to the end of the description. I will do it if anybody asks, but without that I'll just say that the conditions for blocking a safe snapshot in a mix of short-lived read-write transactions are so esoteric that I expect that they would be uncommon in practical use. On the other hand, one long-running read-write transaction could block generation of a new safe snapshot indefinitely. Transactions declared as read-only or running at an isolation level other than serializable would have no impact on generation of a safe snapshot. 2. If no write transactions are in progress on the primary, then the standby will get a snapshot that represents the exact same data as on the primary? A snapshot taken while there are no serializable read write transactions active can immediately be declared safe. Whether such a snapshot is always available on the standby depends on what sort of throttling, if any, is used. That would be fairly easy to explain to users. If there is a visibility lag, then we just say finish the write transactions, and progress will be made. And if the system is idle, they should see identical data. Well, unless it's sync rep, you'll always have some latency between the master and the standby. And any throttling to control resource utilization could also cause latency between other transactions and serializable ones. But other than that, you're exactly on target. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
When I test your example, though, I'm getting the serialization failure on T3 rather than T2, so I'd call that a bug. Will investigate. Thanks again for your tests! You seem to be able to shake out issues better than anyone else! Once found, fixing them is not usually very hard; it's coming up with that creative usage pattern to *find* the problem which is the hard part. Thank you very much, but I do not deserve this honor. I was just constructing an example for myself so that I could understand why a read-only transaction might pose a problem. I posted it to help other people see a concrete example of the problem. I had no idea this would show an actual bug in the code. OK if I add this one to our dcheck test suite, too? It is of course OK. And if you want to add this as an example in the documentation, it would be great. This is a simple, but concrete example of why a read-only serializable transaction might cause an anomaly. If I am not mistaken, there isn't any complete example in the documentation. It was hard for me to grasp why there might be a problem, and I don't think I am alone. - Anssi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 20.01.2011 03:05, Kevin Grittner wrote: If we don't do something like this, do we just provide REPEATABLE READ on the standby as the strictest level of transaction isolation? If so, do we generate an error on a request for SERIALIZABLE, warn and provide degraded behavior, or just quietly give them REPEATABLE READ behavior? +1 for generating an error. Before I go do that, I want to be sure everyone is clear about the state of things. If SSI is used to provide data integrity on the master, it will prevent any serialization anomalies from being persisted on any hot standby *long term*. For example, at any point where the standby is at a point in the transaction stream where there were no read/write transaction active, no anomalies can be observed. (That isn't the *only* time; it's just the simplest one to describe as an example.) Queries on the standby can, however, see *transient* anomalies when they run queries which would cause a serialization failure if run on the master at the same point in the transaction stream. This can only occur when, of two concurrent transactions, the one which *appears* to run second because the other can't read what it wrote, *commits* first. The most common and alarming situation where this occurs, in my opinion, is batch processing. This is extremely common in financial applications, and tends to show up in a lot of other places, too. (The receipting query set is an instance of this type of problem, but I'm going to keep it more general in hopes that people can see where it impacts them.) Imagine an application which has some small control record in a table, and inserts to some other table are assigned to a batch based on the control record. The batches are normally identified by ascending dates or serial numbers. Periodically a new batch is opened and the old batch is closed by updating a current batch id column in the control table. 
If the batch ID is updated and the transaction in which that update was executed commits while a transaction which read the old batch ID is still in flight, a read of the database will show that the batch is closed, but if you look at the detail of the batch, it will not yet be complete. Under SSI, one of these transactions will be canceled to prevent this. Our implementation will always allow the update which closes the batch to complete, and either the insert or the select of the detail will be rolled back with a serialization failure, depending on the timing of the actions inside those transactions. If the insert fails, it can be retried, and will land in the new batch -- making the listing of the batch which omits it OK. If the listing of the batch details is canceled, it will be because the insert into the old batch committed before it recognized the problem, so an immediate retry of the select will see the complete batch contents. A hot standby can't really take part in the predicate locking and transaction cancellation on the master. Dan and I have both come to the conclusion that the only reasonable way to allow hot standby to work with SSI is for the WAL (when wal_level = hot_standby) to carry information about which snapshots, as they develop, won't see such a state. In the above example, barring some throttling mechanism skipping these particular snapshots, or other problematic conflicts around the same time, the master would tell the standby that the snapshot before either of the two problem transactions was OK, and then it would tell them that the snapshot after both had committed was OK. It would not suggest using the snapshot available between the commit of the control record update and the commit of the insert into the batch. This seems to me to be not completely unrelated to the snapshot synchronization patch. 
It is clearly closely related to the READ ONLY DEFERRABLE mode, which also looks for a snapshot which is immune to serialization anomalies without predicate locking, conflict detection, transaction cancellation, etc. Melding these two things with hot standby seems to be beyond what can reasonably happen for 9.1 without delaying the release. If someone is using one feature and not the other, they really don't have a problem. Like anyone else, if a hot standby user has been using SERIALIZABLE mode under 9.0 or earlier, they will need to switch to REPEATABLE READ. A SERIALIZABLE user who doesn't set up hot standby has no issue. Opinions so far seem to be in favor of reporting an error on the standby if SERIALIZABLE is requested, so that people don't silently get less protection than they expect. The most annoying thing about that is that if the user would *like* to use truly serializable transactions on the standby, and will do so when they get it in 9.2, they must switch to REPEATABLE READ now, and switch back to SERIALIZABLE with the next release. So, based on a more complete description of the issues, any more opinions on whether to generate the error, as suggested by Heikki? Does anyone think this justifies the compatibility GUC as suggested by Jeff? -Kevin
Re: [HACKERS] SSI and Hot Standby
Kevin, So, based on a more complete description of the issues, any more opinions on whether to generate the error, as suggested by Heikki? If it's a choice between generating an error and letting users see inconsistent data, I'll take the former. Does anyone think this justifies the compatibility GUC as suggested by Jeff? I think it might, yes. Since someone could simply turn on the backwards compatibility flag for 9.1 and turn it off for 9.2, rather than trying to mess with transaction states which might be set in application code. Unfortunately, people have not responded to our survey :-( http://www.postgresql.org/community/survey.77 -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote: The idea is that whenever we see a valid snapshot which would yield a truly serializable view of the data for a READ ONLY transaction, we add a WAL record with that snapshot information. You haven't explained why this approach is the way forwards. What other options have been ruled out, and why. The above approach doesn't sound particularly viable to me. It's not clear to me what the reason is that this doesn't just work on HS already. If you started there it might help. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
Simon Riggs si...@2ndquadrant.com wrote: On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote: The idea is that whenever we see a valid snapshot which would yield a truly serializable view of the data for a READ ONLY transaction, we add a WAL record with that snapshot information. You haven't explained why this approach is the way forwards. What other options have been ruled out, and why. The above approach doesn't sound particularly viable to me. Why not? We already generate appropriate snapshots for this in SSI, so is the problem in getting the appropriate information into the WAL stream or in having a request for a snapshot within a serializable transaction while running in hot standby the problem? It's not clear to me what the reason is that this doesn't just work on HS already. If you started there it might help. Because the standby would need to bombard the server with a stream of predicate lock information, we would need to allow transactions on the master to be canceled due in part to activity on the standby, and I don't even know how you would begin to track read/write conflicts between transactions on two different clusters. If any of that didn't make sense, it would probably be more efficient for everyone involved if those interested browsed the Overview section of the Wiki page than to have me duplicate its contents in a post. http://wiki.postgresql.org/wiki/Serializable -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
On Jan21, 2011, at 00:11 , Simon Riggs wrote: It's not clear to me what the reason is that this doesn't just work on HS already. If you started there it might help. The problem is that snapshots taken on the master sometimes represent a state of the database which cannot occur under any (valid) serial schedule. Hence, if you use that snapshot to read the *whole* database, you've surely violated serializability. If you read only parts of the database, things may or may not be fine, depending on the parts you read. To have the same stringent guarantees that SERIALIZABLE provides on the master also for queries run against the slave, you somehow need to prevent this. The easiest way is to only use snapshots on the slave which *cannot* produce such anomalies. We already know how to generate such snapshots - SERIALIZABLE READ ONLY DEFERRABLE does exactly that. So the open question is mainly how to transfer such snapshots to the slave, and how often we transmit a new one. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
I wrote: Why not? We already generate appropriate snapshots for this in SSI, so is the problem in getting the appropriate information into the WAL stream or in having a request for a snapshot within a serializable transaction while running in hot standby the problem? I dropped a few words. That was supposed to ask whether the problem was in getting hot standby to *use such a snapshot*. I'm open to other suggestions on how else we might do this. I don't see any alternatives, but maybe you're seeing some possibility that eludes me. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
Simon Riggs si...@2ndquadrant.com writes: On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote: The idea is that whenever we see a valid snapshot which would yield a truly serializable view of the data for a READ ONLY transaction, we add a WAL record with that snapshot information. You haven't explained why this approach is the way forwards. What other options have been ruled out, and why. The above approach doesn't sound particularly viable to me. I'm pretty concerned about the performance implications, too. In particular that sounds like you could get an unbounded amount of WAL emitted from a *purely read only* transaction flow. Which is not going to fly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
On Fri, 2011-01-21 at 00:26 +0100, Florian Pflug wrote: On Jan21, 2011, at 00:11 , Simon Riggs wrote: It's not clear to me what the reason is that this doesn't just work on HS already. If you started there it might help. The problem is that snapshots taken on the master sometimes represent a state of the database which cannot occur under any (valid) serial schedule. Hence, if you use that snapshot to read the *whole* database, you've surely violated serializability. If you read only parts of the database, things may or may not be fine, depending on the parts you read. To have the same stringent guarantees that SERIALIZABLE provides on the master also for queries run against the slave, you somehow need to prevent this. The easiest way is to only use snapshots on the slave which *cannot* produce such anomalies. We already know how to generate such snapshots - SERIALIZABLE READ ONLY DEFERRABLE does exactly that. So the open question is mainly how to transfer such snapshots to the slave, and how often we transmit a new one. Thank you for explaining a little more. What I'm still not clear on is why HS is different. Whatever rules apply on the master must also apply on the standby, immutably. Why is it we need to pass explicit snapshot information from master to standby? We don't do that, except at startup for normal HS. Why do we need that? I hear, but do not yet understand, that the SSI transaction sequence on the master may differ from the WAL transaction sequence. Is it important that the ordering on the master would differ from the standby? -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
Tom Lane wrote: I'm pretty concerned about the performance implications, too. In particular that sounds like you could get an unbounded amount of WAL emitted from a *purely read only* transaction flow. No. Read only transactions wouldn't create any flow at all. And I suggested that we might want some kind of throttle on how often we generate snapshots even from the read write transactions. I'm not at all clear on how you got to the concerns you have. Is there something in particular I could clear up for you that isn't already mentioned in the previous emails? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
What I'm still not clear on is why HS is different. Whatever rules apply on the master must also apply on the standby, immutably. Why is it we need to pass explicit snapshot information from master to standby? We don't do that, except at startup for normal HS. Why do we need that? I hear, but do not yet understand, that the SSI transaction sequence on the master may differ from the WAL transaction sequence. Is it important that the ordering on the master would differ from the standby? The logical serializable ordering of transactions in SSI doesn't necessarily match the commit time ordering (i.e. the WAL sequence). For example, with two concurrent transactions, T1 might commit after T2, even though it didn't see the changes made by T2 and thus has to be considered earlier. It doesn't matter whether T1 committed before T2 or the other way around, as long as no other transaction can tell the difference. If someone saw the changes made by T2 but not those made by T1, they'd see T2 as happening before T1, violating serializability. Our SSI code ensures that doesn't happen by tracking read dependencies. If it detects that such a read is happening, it rolls back one of the transactions involved. Now, if we extend this to hot standby, if T2 commits before T1 on the master, it obviously will on the slave too. A transaction run on the slave at the right time might be able to see that T2 has happened but not T1, which is unserializable. If that transaction had been run on the master, then it would have been detected and something would have been rolled back, but the master has no way to know what data is being read on the slave. What Kevin is suggesting is that we already have a mechanism for identifying snapshots where serialization failures like these will never happen. If we pass that information to the slave and allow it to run transactions only on those snapshots, serializability is safe. Hopefully that made some more sense... Dan -- Dan R. K. 
Ports MIT CSAILhttp://drkp.net/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
On Jan21, 2011, at 01:28 , Simon Riggs wrote: What I'm still not clear on is why HS is different. Whatever rules apply on the master must also apply on the standby, immutably. Why is it we need to pass explicit snapshot information from master to standby? We don't do that, except at startup for normal HS. Why do we need that? I hear, but do not yet understand, that the SSI transaction sequence on the master may differ from the WAL transaction sequence. Is it important that the ordering on the master would differ from the standby? The COMMIT order in the actual, concurrent, schedule does not necessarily represent the order of the transactions in an equivalent serial schedule. Here's an example:

T1: BEGIN SERIALIZABLE; -- (Assume snapshot is set here)
T1: UPDATE D1 ... ;
T2: BEGIN SERIALIZABLE; -- (Assume snapshot is set here)
T2: SELECT * FROM D1 ... ;
T2: UPDATE D2 ... ;
T1: COMMIT;
T3: SELECT * FROM D1, D2;
T2: COMMIT;

Now, the COMMIT order is T1, T3, T2. Let's check if there is an equivalent serial schedule. In any such schedule:

T2 must run before T1, because T2 didn't see T1's changes to D1;
T3 must run after T1, because T3 did see T1's changes to D1;
T3 must run before T2, because T3 didn't see T2's changes to D2.

This is obviously impossible -- if T3 runs before T2 and T2 runs before T1, then T3 runs before T1, contradicting the second requirement. There is thus no equivalent serial schedule, and we must abort one of these transactions with a serialization error. Note that aborting T3 is sufficient, even though T3 is READ ONLY! With T3 gone, an equivalent serial schedule is T2, T1! On the master, these run-before requirements are tracked by remembering which transaction read which parts of the data via the SIREAD-lock mechanism (these are more flags than locks, since nobody ever blocks on them). Since we do not want to report SIREAD locks back to the master, the slave has to prevent this another way. 
Kevin's proposed solution does that by only using those snapshots on the slave for which reading the *whole* database is safe. The downside is that whether or not a snapshot is safe can only be decided after all concurrent transactions have finished. The snapshot is thus always a bit outdated, but shows a state that is known to be possible in some serial schedule. The very same mechanism can be used on the master also, by setting the isolation level to SERIALIZABLE READ ONLY DEFERRABLE. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
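[Editor's note: Florian's three "must run before" constraints can be checked mechanically. This illustrative Python sketch (not part of any patch) builds the dependency graph and shows that the cycle disappears once T3 is aborted:]

```python
def has_cycle(edges, nodes):
    """DFS cycle check over 'a must run before b' edges restricted to
    the given set of transactions; a cycle means no serial order exists."""
    graph = {n: [] for n in nodes}
    for a, b in edges:
        if a in graph and b in graph:   # ignore edges to aborted xacts
            graph[a].append(b)
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}
    def dfs(n):
        color[n] = GREY
        for m in graph[n]:
            if color[m] == GREY or (color[m] == WHITE and dfs(m)):
                return True
        color[n] = BLACK
        return False
    return any(color[n] == WHITE and dfs(n) for n in nodes)

# The 'must run before' constraints from the example schedule:
edges = [("T2", "T1"),  # T2 didn't see T1's changes to D1
         ("T1", "T3"),  # T3 did see T1's changes to D1
         ("T3", "T2")]  # T3 didn't see T2's changes to D2
print(has_cycle(edges, {"T1", "T2", "T3"}))  # True: no serial order exists
print(has_cycle(edges, {"T1", "T2"}))        # False: with T3 aborted, T2, T1 works
```

This is the essence of what the SIREAD-lock bookkeeping lets the master detect; the slave has no such graph, which is the whole difficulty.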
Re: [HACKERS] SSI and Hot Standby
Tom Lane wrote: Simon Riggs writes: On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote: The idea is that whenever we see a valid snapshot which would yield a truly serializable view of the data for a READ ONLY transaction, we add a WAL record with that snapshot information. You haven't explained why this approach is the way forwards. What other options have been ruled out, and why. The above approach doesn't sound particularly viable to me. I'm pretty concerned about the performance implications, too. In particular that sounds like you could get an unbounded amount of WAL emitted from a *purely read only* transaction flow. Which is not going to fly. Ah, coming back to this and re-reading, I think I see the point of confusion. The technique we're suggesting is based on the fact that the *standby* is read only. The flow of information about snapshots (which might be done as actual snapshots with xid values, or possibly as marker records saying when a candidate snapshot is being considered and when the last one has been found acceptable) would be from the master *to* the standby, based on *read-write transactions on the master*. This would inform the slave of a snapshot guaranteed not to show a serialization anomaly to a read-only transaction. As I mentioned in another email, we might want to throttle this. My thinking was that we could start a timer on capturing a snapshot, and continue to gather new ones as they become available. When you hit the timer limit (maybe 100ms?) you send the latest snapshot, if you have a new one; otherwise you keep trying and send one as soon as you get it. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
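[Editor's note: the throttle Kevin sketches can be modeled deterministically. The Python below is purely illustrative (hypothetical names, integer milliseconds instead of a real timer): each send opens a quiet period; snapshots arriving inside it are held, newer ones superseding older, and the newest is flushed when the period ends.]

```python
def throttled_sends(events, interval_ms=100):
    """events: sorted (ms, snapshot) pairs as safe snapshots become
    available on the master. Returns the (ms, snapshot) pairs actually
    sent under the proposed ~interval_ms throttle."""
    sent = []
    last = None      # time of the last send
    pending = None   # (due_time, newest unsent snapshot)
    for t, snap in events:
        if pending is not None and pending[0] <= t:
            last = pending[0]           # timer expired: flush the newest
            sent.append(pending)
            pending = None
        if last is None or t - last >= interval_ms:
            sent.append((t, snap))      # quiet period over: send at once
            last = t
        else:
            pending = (last + interval_ms, snap)  # hold; newer supersedes
    if pending is not None:
        sent.append(pending)
    return sent

# Four snapshots become available; S2 is superseded, so only three go out.
print(throttled_sends([(0, "S1"), (30, "S2"), (50, "S3"), (200, "S4")]))
# -> [(0, 'S1'), (100, 'S3'), (200, 'S4')]
```

At a 100 ms interval this caps the extra WAL traffic at roughly ten records per second, and only while read-write serializable transactions are actually committing.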
Re: [HACKERS] SSI and Hot Standby
On Thu, Jan 20, 2011 at 8:54 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: As I mentioned in another email, we might want to throttle this. My thinking was that we could start a timer on capturing a snapshot, and continue to gather new ones as they become available. When you hit the timer limit (maybe 100ms?) you send the latest snapshot, if you have a new one; otherwise you keep trying and send one as soon as you get it. I think this is likely to suck. That's introducing 10 not-small XLOG records per second just in case someone happens to try to start a serializable transaction on a standby server. A possibly-viable alternative would be to build something into the SR protocol to allow the standby to request a workable snapshot from the master, and the master to send it (out-of-band with respect to the WAL stream) when so requested. Then it wouldn't work if you lose the connection to the master, but maybe that's OK. Even with that, it seems like there could be starvation problems - is there an upper bound on the length of time it would take the master to generate a safe snapshot for the standby to use? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
Robert Haas wrote:
> Kevin Grittner wrote:
>> As I mentioned in another email, we might want to throttle this. My
>> thinking was that we could start a timer on capturing a snapshot, and
>> continue to gather new ones as they become available. When you hit
>> the timer limit (maybe 100ms?) you send the latest snapshot, if you
>> have a new one; otherwise you keep trying and send one as soon as you
>> get it.
>
> I think this is likely to suck. That's introducing 10 not-small XLOG
> records per second just in case someone happens to try to start a
> serializable transaction on a standby server.

That depends on whether we can pull off the idea I mentioned for not
sending the snapshot itself. But that idea is pretty sketchy at the
moment. I can't swear we can make it work, but if we can, it should use
a lot less WAL space.

> A possibly-viable alternative would be to build something into the SR
> protocol to allow the standby to request a workable snapshot from the
> master, and the master to send it (out-of-band with respect to the WAL
> stream) when so requested.

If we can make that work, that has advantages.

> it seems like there could be starvation problems - is there an upper
> bound on the length of time it would take the master to generate a
> safe snapshot for the standby to use?

Unfortunately, to get a safe snapshot you need to grab a candidate
snapshot and wait, at a minimum, for all serializable read write
transactions which were active at the time to complete. If any of them
develop the wrong pattern of rw-dependencies, you have to discard the
candidate, grab a new snapshot, and try again. I suspect that most of
the time you will succeed on the first snapshot, and so will be able to
call it safe when the last concurrent serializable read write
transaction completes, but that *could* be a long time, and there is no
upper bound. That's why I was suggesting that we try to keep a fairly
current safe snapshot sitting on the standby and use it when a
serializable transaction is requested.

Unless you request DEFERRABLE, in which case you would wait for the
*next* one to arrive.

I keep looking for another angle on this, but I'm not finding it. I've
*thought* I had something a couple of times this evening while tossing
it around in my head, but the ideas fall apart on closer inspection.
:-(

-Kevin
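The safe-snapshot procedure Kevin describes (grab a candidate snapshot,
wait out the concurrent serializable read-write transactions, discard
and retry if any of them developed a dangerous rw-dependency pattern)
can be sketched as a retry loop. This is a toy model, not the actual
SSI implementation; `take_snapshot`, `concurrent_rw_txns`, and the
transaction attributes are all hypothetical stand-ins.

```python
def wait_for_safe_snapshot(take_snapshot, concurrent_rw_txns,
                           max_attempts=None):
    """Sketch of the safe-snapshot procedure described above.
    take_snapshot() returns a candidate snapshot; concurrent_rw_txns(s)
    returns the serializable read-write transactions active when s was
    taken. There is no upper bound on how long this can take, which is
    the starvation concern raised in the thread."""
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        snap = take_snapshot()
        txns = concurrent_rw_txns(snap)
        # The candidate cannot be declared safe until every concurrent
        # serializable read-write transaction has completed.
        for t in txns:
            t.wait_for_completion()
        # If none of them developed a dangerous rw-dependency pattern,
        # the candidate is safe; otherwise discard it and start over.
        if not any(t.has_dangerous_rw_dependency for t in txns):
            return snap
    return None
```

Most of the time the first candidate should succeed, but a single
long-running read-write transaction delays the verdict, and a dangerous
dependency forces a fresh candidate, so the loop has no guaranteed
termination time.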
Re: [HACKERS] SSI and Hot Standby
On 21.01.2011 03:19, Dan Ports wrote:
> What I'm still not clear on is why HS is different. Whatever rules
> apply on the master must also apply on the standby, immutably. Why is
> it we need to pass explicit snapshot information from master to
> standby? We don't do that, except at startup, for normal HS. Why do we
> need that? I hear, but do not yet understand, that the SSI transaction
> sequence on the master may differ from the WAL transaction sequence.
> Is it important that the ordering on the master would differ from the
> standby?

The logical serializable ordering of transactions in SSI doesn't
necessarily match the commit-time ordering (i.e. the WAL sequence). For
example, with two concurrent transactions, T1 might commit after T2,
even though it didn't see the changes made by T2 and thus has to be
considered earlier.

It doesn't matter whether T1 committed before T2 or the other way
around, as long as no other transaction can tell the difference. If
someone saw the changes made by T2 but not those made by T1, they'd see
T2 as happening before T1, violating serializability. Our SSI code
ensures that doesn't happen by tracking read dependencies. If it
detects that such a read is happening, it rolls back one of the
transactions involved.

Now, if we extend this to hot standby: if T2 commits before T1 on the
master, it obviously will on the slave too. A transaction run on the
slave at the right time might be able to see that T2 has happened but
not T1, which is unserializable. If that transaction had been run on
the master, it would have been detected and something would have been
rolled back, but the master has no way to know what data is being read
on the slave.

We have enough information in the standby to reconstruct all writes
done in the master. I gather that's not enough: in order to roll back
read-only transaction T3 on the standby, which would see an anomaly,
we'd also need to know what reads T1 and T2 did in the master. Is that
correct?

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] SSI and Hot Standby
On Fri, Jan 21, 2011 at 08:44:59AM +0200, Heikki Linnakangas wrote:
> We have enough information in the standby to reconstruct all writes
> done in the master. I gather that's not enough: in order to roll back
> read-only transaction T3 on the standby, which would see an anomaly,
> we'd also need to know what reads T1 and T2 did in the master. Is that
> correct?

That's some of the information we need, but it's not enough. The
problem is that the conflict might not be discovered until after T3
(the reader) commits. In that case, it's too late to abort T3, so you'd
need to roll back T2 instead. But that means a read-only transaction on
the slave has to be able to cause a concurrent read-write transaction
on the master to abort, which brings with it no end of problems.

To make that a little more concrete, let me borrow Kevin's favorite
batch processing example:

  [master] T2: BEGIN
  [master] T2: SELECT FROM control
  [master] T1: BEGIN
  [master] T1: UPDATE control
  [master] T1: COMMIT
  [slave]  T3: BEGIN
  [slave]  T3: SELECT FROM control, receipt
  [slave]  T3: COMMIT
  [master] T2: INSERT INTO receipt
  [master] T2: COMMIT

If this all happened at the master, T2 would get rolled back when it
tries to do its INSERT. (I just tried it.) But if T3 happened on the
slave, the master doesn't know that it read both tables, nor does the
slave know at the time it's executing T3 that it's going to conflict
with T2.

Dan

-- 
Dan R. K. Ports        MIT CSAIL        http://drkp.net/
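Dan's schedule can be checked mechanically: write down the
serialization-order constraints each read and write imposes and look
for a cycle in the resulting precedence graph. The sketch below is an
illustration of that reasoning, not the actual SSI conflict-detection
code; the edges are transcribed by hand from the schedule above.

```python
# Precedence edges implied by the schedule above:
# - T2 read `control` before T1's committed update of it, so T2 must
#   precede T1 in any serial order (rw-antidependency).
# - T3's snapshot on the slave includes T1's commit, so T1 must precede
#   T3 (wr-dependency).
# - T3 read `receipt` before T2's insert became visible, so T3 must
#   precede T2 (rw-antidependency).
edges = {
    "T2": {"T1"},
    "T1": {"T3"},
    "T3": {"T2"},
}


def has_cycle(graph):
    """Depth-first search for a cycle in the precedence graph."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}

    def visit(n):
        color[n] = GRAY
        for m in graph.get(n, ()):
            if color.get(m, WHITE) == GRAY:
                return True    # back edge: cycle found
            if color.get(m, WHITE) == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)


print(has_cycle(edges))  # True: T2 -> T1 -> T3 -> T2, not serializable
```

The cycle is exactly why no single node can resolve this: the edge into
T2 is only created by T3's reads on the slave, which the master never
sees, and T2's final write happens after T3 has already committed.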
Re: [HACKERS] SSI and Hot Standby
On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote:
> Here's an issue for feedback from the community -- do we want to
> support truly serializable transactions on hot standby machines?

In this release? Maybe? In later releases? Yes.

If it blocks your excellent contribution in this release, then from me,
no. If you can achieve this in this release, yes.

However, if this is difficult or complex, then I would rather say "not
yet" quickly now, than spend months working out the weirdnesses and
possibly still get them wrong.

-- 
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] SSI and Hot Standby
Simon Riggs si...@2ndquadrant.com wrote:
> In this release? Maybe? In later releases? Yes.
>
> If it blocks your excellent contribution in this release, then from
> me, no. If you can achieve this in this release, yes.
>
> However, if this is difficult or complex, then I would rather say "not
> yet" quickly now, than spend months working out the weirdnesses and
> possibly still get them wrong.

We already have a mechanism for generating a good snapshot; the hard
part (for me at least) would be to get that snapshot over to the hot
standby and have it use the latest one on a request for a serializable
transaction. I have no experience with WAL file output, and don't know
what it would take for hot standby to use it as I describe.

I agree it's pretty late in the cycle, but I'm going through all the
loose ends and found this one -- which has been hanging out on the Wiki
page as an R&D item for over a full year without discussion. :-( If we
provide the snapshots (which we can safely and easily do), can someone
else who knows what they're doing with WAL and HS get the rest of it
safely into the release? That seems to me to be the only way it can
still happen for 9.1.

If not, I agree this can be 9.2 material. We just have to decide how to
document it and answer the questions near the bottom of my initial post
of the thread.

-Kevin
Re: [HACKERS] SSI and Hot Standby
On Wed, Jan 19, 2011 at 8:34 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
> I agree it's pretty late in the cycle, but I'm going through all the
> loose ends and found this one -- which has been hanging out on the
> Wiki page as an R&D item for over a full year without discussion. :-(
> If we provide the snapshots (which we can safely and easily do), can
> someone else who knows what they're doing with WAL and HS get the rest
> of it safely into the release? That seems to me to be the only way it
> can still happen for 9.1.

I think it's way too late to be embarking on what will probably turn
out to be a reasonably complex and possibly controversial new
development arc. I don't have a strong position on what we should do
instead, but let's NOT do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] SSI and Hot Standby
Kevin's suggestion seems eminently reasonable to me, and probably the
best approach one can take for SSI and hot standby. Pulling it off in
time for 9.1 would be a stretch; 9.2 seems quite doable.

It's worth noting that one way or another, the semantics of
SERIALIZABLE transactions on hot standby replicas could be surprising
to some. There's no getting around this; serializability in distributed
systems is just a hard problem in general. Either we go with Kevin's
suggestion of treating SERIALIZABLE transactions as DEFERRABLE (whether
now or for 9.2), causing them to have to use an older snapshot or block
until an acceptable snapshot becomes available, or we require them to
be downgraded to REPEATABLE READ, either implicitly or explicitly.

Now, neither of these is as alarming as it might sound, given that
replication lag is a fact of life for hot standby systems and
REPEATABLE READ is exactly the same as the current (9.0) SERIALIZABLE
behavior. But it's definitely something that should be addressed in the
documentation.

Dan

-- 
Dan R. K. Ports        MIT CSAIL        http://drkp.net/
Re: [HACKERS] SSI and Hot Standby
On Wed, 2011-01-19 at 19:34 -0600, Kevin Grittner wrote:
> I agree it's pretty late in the cycle, but I'm going through all the
> loose ends and found this one -- which has been hanging out on the
> Wiki page as an R&D item for over a full year without discussion. :-(
> If we provide the snapshots (which we can safely and easily do), can
> someone else who knows what they're doing with WAL and HS get the rest
> of it safely into the release? That seems to me to be the only way it
> can still happen for 9.1.

I gave you a quick response to let you know that HS need not be a
blocker for this release.

If you are saying you have knowingly ignored a requirement for a whole
year, then I am shocked. How exactly did you think this would ever be
committed?

-- 
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] SSI and Hot Standby
Robert Haas wrote:
> Kevin Grittner wrote:
>> I agree it's pretty late in the cycle, but I'm going through all the
>> loose ends and found this one -- which has been hanging out on the
>> Wiki page as an R&D item for over a full year without discussion. :-(
>> If we provide the snapshots (which we can safely and easily do), can
>> someone else who knows what they're doing with WAL and HS get the
>> rest of it safely into the release? That seems to me to be the only
>> way it can still happen for 9.1.
>
> I think it's way too late to be embarking on what will probably turn
> out to be a reasonably complex and possibly controversial new
> development arc. I don't have a strong position on what we should do
> instead, but let's NOT do that.

If that can't reasonably be done for 9.1, well, my next sentence was:

> If not, I agree this can be 9.2 material.

It'd be sweet if it could still happen in 9.1, but hardly a shock if it
can't. I didn't want to presume to make the call. Like I said at the
start, the alternative is to decide how noisy we want to be about
providing snapshot isolation on hot standbys when SERIALIZABLE is
requested, and figuring out where to document it.

-Kevin
Re: [HACKERS] SSI and Hot Standby
* Simon Riggs (si...@2ndquadrant.com) wrote:
> I gave you a quick response to let you know that HS need not be a
> blocker for this release.
>
> If you are saying you have knowingly ignored a requirement for a whole
> year, then I am shocked. How exactly did you think this would ever be
> committed?

Erm, to be perfectly honest, I think the answer is probably "I was
busy", and no one provided any feedback on *how* to deal with it. Given
the amount of work that Kevin's put into this patch (which has been
beyond impressive, imv), I have a hard time finding fault with him not
getting time to implement a solution for Hot Standby for this.

As you say, it's not a blocker; I agree completely with that,
regardless of when it was identified as an issue. What we're talking
about is right now, and right now is too late to fix it for HS. And to
be perfectly frank, fixing it for HS isn't required or even a terribly
important factor in whether it should be committed.

I'll refrain from casting stones about issues brought up nearly a year
ago on certain other patches which are apparently not going to include
what I, at least, consider extremely important to PG acceptance by
others.

Thanks,

Stephen
Re: [HACKERS] SSI and Hot Standby
Simon Riggs wrote:
> I gave you a quick response to let you know that HS need not be a
> blocker for this release.
>
> If you are saying you have knowingly ignored a requirement for a whole
> year, then I am shocked. How exactly did you think this would ever be
> committed?

I was asked not to discuss this effort on list for most of that time,
and while it was on the Wiki page, I just lost track of it -- not
maliciously or intentionally. I really apologize. By the time the 9.0
release was out and it was deemed OK for me to discuss things, I
started getting feedback on problems which needed response, and I got
into the mode of reacting to that rather than ticking through my issues
list.

-Kevin
Re: [HACKERS] SSI and Hot Standby
On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote:
> If we don't do something like this, do we just provide REPEATABLE READ
> on the standby as the strictest level of transaction isolation? If so,
> do we generate an error on a request for SERIALIZABLE, warn and
> provide degraded behavior, or just quietly give them REPEATABLE READ
> behavior? Thoughts?

Hopefully there is a better option available. We don't want to silently
give wrong results.

Maybe we should bring back the compatibility GUC? It could throw an
error unless the user sets the compatibility GUC to turn SERIALIZABLE
into REPEATABLE READ.

Regards,
Jeff Davis
Re: [HACKERS] SSI and Hot Standby
On 20.01.2011 03:05, Kevin Grittner wrote:
> If we don't do something like this, do we just provide REPEATABLE READ
> on the standby as the strictest level of transaction isolation? If so,
> do we generate an error on a request for SERIALIZABLE, warn and
> provide degraded behavior, or just quietly give them REPEATABLE READ
> behavior?

+1 for generating an error.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com