Re: In-place updates and serializable transactions
On Fri, Nov 16, 2018 at 4:07 AM Kevin Grittner wrote:
>
> On Thu, Nov 15, 2018 at 3:03 AM Kuntal Ghosh wrote:
> >
> > The test multiple-row-versions is failing because of the
> > above-discussed scenario. I've attached the regression diff file and
> > the result output file for the same. Here is a brief summary of the
> > test w.r.t. heap:
> >
> > Step 1: T1-> BEGIN; Read FROM t where id=100;
> > Step 2: T2-> BEGIN; UPDATE t where id=100; COMMIT; (creates T1->T2)
> > Step 3: T3-> BEGIN; UPDATE t where id=100; Read FROM t where id=50;
> > Step 4: T4-> BEGIN; UPDATE t where id=50; Read FROM t where id=1;
> > COMMIT; (creates T3->T4)
> > Step 5: T3-> COMMIT;
> > Step 6: T1-> UPDATE t where id=1; COMMIT; (creates T4->T1)
> >
> > At step 6, when the update statement is executed, T1 is rolled back
> > because of T3->T4->T1.
> >
> > But for zheap, step 3 also creates a dependency T1->T3 because of the
> > in-place update. When T4 commits in step 4, it marks T3 as doomed
> > because of T1 --> T3 --> T4. Hence, in step 5, T3 is rolled back.
>
> If I understand this, no permutation (order of execution of the
> statements in a set of concurrent transactions vulnerable to
> serialization anomalies) which succeeded with the old storage engine
> now fails with zheap; what we have with zheap is an earlier failure in
> one case. More importantly, zheap doesn't create any false negatives
> (cases where a serialization anomaly is missed).

Your understanding is correct. Thanks for sharing your feedback.

> I would say this should be considered a resounding success. We should
> probably add an alternative result file to cover this case, but
> otherwise I don't see anything which requires action.
>
> Congratulations on making this work so well!

Thanks.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: In-place updates and serializable transactions
On Thu, Nov 15, 2018 at 3:03 AM Kuntal Ghosh wrote:
>
> The test multiple-row-versions is failing because of the
> above-discussed scenario. I've attached the regression diff file and
> the result output file for the same. Here is a brief summary of the
> test w.r.t. heap:
>
> Step 1: T1-> BEGIN; Read FROM t where id=100;
> Step 2: T2-> BEGIN; UPDATE t where id=100; COMMIT; (creates T1->T2)
> Step 3: T3-> BEGIN; UPDATE t where id=100; Read FROM t where id=50;
> Step 4: T4-> BEGIN; UPDATE t where id=50; Read FROM t where id=1;
> COMMIT; (creates T3->T4)
> Step 5: T3-> COMMIT;
> Step 6: T1-> UPDATE t where id=1; COMMIT; (creates T4->T1)
>
> At step 6, when the update statement is executed, T1 is rolled back
> because of T3->T4->T1.
>
> But for zheap, step 3 also creates a dependency T1->T3 because of the
> in-place update. When T4 commits in step 4, it marks T3 as doomed
> because of T1 --> T3 --> T4. Hence, in step 5, T3 is rolled back.

If I understand this, no permutation (order of execution of the
statements in a set of concurrent transactions vulnerable to
serialization anomalies) which succeeded with the old storage engine
now fails with zheap; what we have with zheap is an earlier failure in
one case. More importantly, zheap doesn't create any false negatives
(cases where a serialization anomaly is missed).

I would say this should be considered a resounding success. We should
probably add an alternative result file to cover this case, but
otherwise I don't see anything which requires action.

Congratulations on making this work so well!

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: In-place updates and serializable transactions
On Thu, Nov 15, 2018 at 3:09 AM Kevin Grittner wrote:
>
> On Wed, Nov 14, 2018 at 5:43 AM Joshua Yanovski wrote:
> >
> > This is only a personal anecdote, but from my own experience with
> > serializability, this sort of blind update isn't often contended in
> > realistic workloads.
> >
> > So, if this only affects transactions with blind updates, I doubt it
> > will cause much pain in real workloads (even though it might look bad
> > in benchmarks which include a mix of blind writes and rmw operations).
> > Particularly if it only happens if you explicitly opt into zheap
> > storage.
>
> I agree with all of that, but will be very interested in what
> failures, if any, kick out from the "isolation" test set when all
> tables are created using zheap. I added all the common failure
> patterns I had seen to that set, and others have filled in some corner
> cases I missed since then, so if everything there passes I would not
> worry about it at all. If we do see some failures, we can take
> another look to see whether any action is needed.

Thanks, Kevin, for your explanation. The isolation test suites are
really helpful for testing serializable scenarios for zheap.

The test multiple-row-versions is failing because of the
above-discussed scenario. I've attached the regression diff file and
the result output file for the same. Here is a brief summary of the
test w.r.t. heap:

Step 1: T1-> BEGIN; Read FROM t where id=100;
Step 2: T2-> BEGIN; UPDATE t where id=100; COMMIT; (creates T1->T2)
Step 3: T3-> BEGIN; UPDATE t where id=100; Read FROM t where id=50;
Step 4: T4-> BEGIN; UPDATE t where id=50; Read FROM t where id=1;
COMMIT; (creates T3->T4)
Step 5: T3-> COMMIT;
Step 6: T1-> UPDATE t where id=1; COMMIT; (creates T4->T1)

At step 6, when the update statement is executed, T1 is rolled back
because of T3->T4->T1.

But for zheap, step 3 also creates a dependency T1->T3 because of the
in-place update. When T4 commits in step 4, it marks T3 as doomed
because of T1 --> T3 --> T4. Hence, in step 5, T3 is rolled back.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

multiple-row-versions.out
Description: Binary data

regression.diffs
Description: Binary data
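The two outcomes above can be reproduced with a small simulation of the rw-antidependency edges named in this permutation. This is an illustrative sketch, not PostgreSQL's SSI code: the event list, the replay() helper, and the two detection rules are simplifications invented here, just detailed enough to show why heap rolls back T1 at step 6 while zheap dooms T3 as soon as T4 commits.

```python
# Toy replay of the six steps above (illustrative only; not PostgreSQL's
# SSI implementation). Each "edge" event is an rw-antidependency named in
# the thread; the one tagged "zheap" is the extra T1 --rw--> T3 dependency
# that only the in-place update at step 3 creates.
EVENTS = [
    ("edge",   "T1", "T2", "both"),   # step 2: T2 updates the row T1 read
    ("commit", "T2", None, "both"),   # step 2
    ("edge",   "T1", "T3", "zheap"),  # step 3: in-place update, zheap only
    ("edge",   "T3", "T4", "both"),   # step 4: T4 updates the row T3 read
    ("commit", "T4", None, "both"),   # step 4
    ("commit", "T3", None, "both"),   # step 5
    ("edge",   "T4", "T1", "both"),   # step 6: T1 updates the row T4 read
    ("commit", "T1", None, "both"),   # step 6
]

def replay(engine):
    """Return the first transaction rolled back, or None."""
    edges, committed = set(), set()
    for kind, a, b, scope in EVENTS:
        if scope not in ("both", engine):
            continue
        if kind == "edge":
            edges.add((a, b))
            # The writer completes t_in -> pivot -> writer after the other
            # two members already committed: the writer is rolled back.
            for t_in, pivot in edges:
                if (pivot, b) in edges and {t_in, pivot} <= committed:
                    return b
        else:
            committed.add(a)
            # 'a' is the out-side of a dangerous structure and commits
            # first: the still-active pivot is doomed (rolled back later).
            for t_in, pivot in edges:
                if (pivot, a) in edges and pivot not in committed:
                    return pivot
    return None
```

With the heap edge set the victim is T1 (at the step-6 update); adding the single zheap-only edge shifts the victim to T3 (doomed at T4's commit), matching the regression diff described above.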
Re: In-place updates and serializable transactions
On Wed, Nov 14, 2018 at 5:13 PM Joshua Yanovski wrote:
>
> This is only a personal anecdote, but from my own experience with
> serializability, this sort of blind update isn't often contended in
> realistic workloads. The reason is that (again, IME) most blind writes
> are either insertions, or "read-writes in disguise" (the client read an
> old value in a different transaction); in the latter case, the data in
> question are often logically "owned" by the client, and will therefore
> rarely be contended. I think there are two major exceptions to this:
> transactions that perform certain kinds of monotonic updates (for
> instance, marking a row complete in a worklist irrespective of whether
> it was already completed), and automatic bulk updates. However, these
> were exactly the classes of transactions that we already ran under a
> lower isolation level than serializability, since they have tightly
> constrained shapes and don't benefit much from the additional
> guarantees.
>
> So, if this only affects transactions with blind updates, I doubt it
> will cause much pain in real workloads (even though it might look bad
> in benchmarks which include a mix of blind writes and rmw operations).
> Particularly if it only happens if you explicitly opt into zheap
> storage.

Thanks, Joshua, for sharing your input on this. I'm not familiar with
realistic workloads for serializable transactions, so this is really
helpful.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
Re: In-place updates and serializable transactions
On Wed, Nov 14, 2018 at 5:43 AM Joshua Yanovski wrote:
>
> This is only a personal anecdote, but from my own experience with
> serializability, this sort of blind update isn't often contended in
> realistic workloads.
>
> So, if this only affects transactions with blind updates, I doubt it
> will cause much pain in real workloads (even though it might look bad
> in benchmarks which include a mix of blind writes and rmw operations).
> Particularly if it only happens if you explicitly opt into zheap
> storage.

I agree with all of that, but will be very interested in what failures,
if any, kick out from the "isolation" test set when all tables are
created using zheap. I added all the common failure patterns I had seen
to that set, and others have filled in some corner cases I missed since
then, so if everything there passes I would not worry about it at all.
If we do see some failures, we can take another look to see whether any
action is needed.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
Re: In-place updates and serializable transactions
On Tue, Nov 13, 2018 at 10:45 PM Kuntal Ghosh wrote:
>
> Currently, we're working on the serializable implementation for
> zheap.

Great!

> If transaction T1 reads a row version (thus acquiring a predicate lock
> on it) and a second transaction T2 updates that row version (thus
> creating a rw-conflict graph edge from T1 to T2), must a third
> transaction T3 which re-updates the new version of the row also have a
> rw-conflict in from T1 to prevent anomalies? In other words, does it
> matter whether we recognize the edge T1 --rw--> T3?

No. Keep in mind that extensive study has shown that snapshot
isolation can only be non-serializable if there is a cycle in the
apparent order of execution, and that this can only occur if there is
a "dangerous structure" of two adjacent read-write antidependencies
(a/k/a read-write dependencies, a/k/a rw-conflicts) *AND* the
transaction you identify as T3 in that structure *IS THE FIRST
TRANSACTION IN THE CYCLE TO COMMIT*.

Looking at the implied T1/T3 relationship and looking for a T4 to
complete the structure is not necessary, because there are proofs that
three *ADJACENT* transactions are necessary for a serialization
anomaly to be seen.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
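The criteria above can be restated as a small predicate over the rw-antidependency graph. The function below is a hypothetical sketch (the names, the edge representation, and the commit-order list are all invented for illustration, and it deliberately ignores many refinements of the real implementation): it looks for two adjacent rw-conflicts t_in --rw--> pivot --rw--> t_out where t_out is the first of the three to commit.

```python
def dangerous_structure(rw_edges, commit_order):
    """Find a (t_in, pivot, t_out) triple matching the criteria described
    above, or return None. Illustrative sketch, not PostgreSQL code.

    rw_edges     -- set of (reader, writer) rw-antidependency pairs
    commit_order -- committed transactions, earliest first
    """
    pos = {t: i for i, t in enumerate(commit_order)}
    for t_in, pivot in rw_edges:
        for p, t_out in rw_edges:
            if p != pivot:
                continue
            # t_out must be the first of the three to commit; transactions
            # still active are treated as committing later.
            if t_out in pos and all(
                pos[t_out] < pos[t] for t in (t_in, pivot) if t in pos
            ):
                return (t_in, pivot, t_out)
    return None
```

On the scenario discussed elsewhere in the thread, dangerous_structure({("T1", "T3"), ("T3", "T4")}, ["T4"]) reports ("T1", "T3", "T4") as soon as T4 commits, which is exactly the point at which zheap dooms T3; with only the single edge T1 --rw--> T2 no structure is found, no matter who commits first.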
Re: In-place updates and serializable transactions
This is only a personal anecdote, but from my own experience with
serializability, this sort of blind update isn't often contended in
realistic workloads. The reason is that (again, IME) most blind writes
are either insertions, or "read-writes in disguise" (the client read an
old value in a different transaction); in the latter case, the data in
question are often logically "owned" by the client, and will therefore
rarely be contended. I think there are two major exceptions to this:
transactions that perform certain kinds of monotonic updates (for
instance, marking a row complete in a worklist irrespective of whether
it was already completed), and automatic bulk updates. However, these
were exactly the classes of transactions that we already ran under a
lower isolation level than serializability, since they have tightly
constrained shapes and don't benefit much from the additional
guarantees.

So, if this only affects transactions with blind updates, I doubt it
will cause much pain in real workloads (even though it might look bad
in benchmarks which include a mix of blind writes and rmw operations).
Particularly if it only happens if you explicitly opt into zheap
storage.

On Wed, Nov 14, 2018 at 5:46 AM Kuntal Ghosh wrote:
>
> In brief, due to in-place updates, in some cases, the false positives
> may increase for serializable transactions. Any thoughts?
>
> [1] src/backend/storage/lmgr/README-SSI
> [2] src/test/isolation/specs/multiple-row-versions.spec
>
> --
> Thanks & Regards,
> Kuntal Ghosh
> EnterpriseDB: http://www.enterprisedb.com
In-place updates and serializable transactions
Hello hackers,

Currently, we're working on the serializable implementation for zheap.
As mentioned in the README-SSI documentation [1], there is one
difference in PostgreSQL's SSI implementation that can make its
conflict-detection behaviour differ from that of storage engines that
support in-place updates.

The heap storage in PostgreSQL does not use "update in place" with a
rollback log for its MVCC implementation. Where possible it uses "HOT"
updates on the same page (if there is room and no indexed value is
changed). For non-HOT updates the old tuple is expired in place and a
new tuple is inserted at a new location. Because of this difference, a
tuple lock in PostgreSQL doesn't automatically lock any other versions
of a row.

We can take the following example from the doc to understand the
situation in more detail:

T1 --rw--> T2 --ww--> T3

If transaction T1 reads a row version (thus acquiring a predicate lock
on it) and a second transaction T2 updates that row version (thus
creating a rw-conflict graph edge from T1 to T2), must a third
transaction T3 which re-updates the new version of the row also have a
rw-conflict in from T1 to prevent anomalies? In other words, does it
matter whether we recognize the edge T1 --rw--> T3?

The document also includes a nice proof of why we don't try to copy or
expand a tuple lock to any other versions of the row, i.e. why we
don't have to explicitly recognize the edge T1 --rw--> T3.

In PostgreSQL, predicate locking is implemented using the tuple id. In
zheap, since we perform updates in-place, we don't change the tuple
id. So, in the above example, we easily recognize the edge
T1 --rw--> T3. This may increase the number of false positives in
certain cases. In the above example, if we introduce another
transaction T4 such that T3 --rw--> T4 and T4 gets committed first,
then for zheap T3 will be rolled back because of the dangerous
structure T1 --rw--> T3 --rw--> T4, whereas for heap T3 can be
committed (isolation test case [2]). IMHO, this seems to be acceptable
behavior.

In brief, due to in-place updates, in some cases the false positives
may increase for serializable transactions. Any thoughts?

[1] src/backend/storage/lmgr/README-SSI
[2] src/test/isolation/specs/multiple-row-versions.spec

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
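The tuple-id point above can be made concrete with a toy model. Everything here is hypothetical (the Engine class and its methods are invented for illustration, not the actual PostgreSQL or zheap data structures): predicate locks are keyed by tuple id, a heap-style non-HOT update moves the row to a fresh tuple id, and an in-place update keeps the same tuple id, which is what creates the extra T1 --rw--> T3 edge.

```python
class Engine:
    """Toy model of predicate locking keyed by tuple id (hypothetical;
    not the actual PostgreSQL or zheap data structures)."""

    def __init__(self, in_place):
        self.in_place = in_place    # True models zheap, False models heap
        self.next_tid = 0
        self.visible_tid = {}       # row key -> tuple id of visible version
        self.pred_locks = {}        # tuple id -> set of reader transactions

    def _new_tid(self):
        self.next_tid += 1
        return self.next_tid

    def read(self, txn, key):
        # SIREAD-style lock: remember who read this tuple id.
        tid = self.visible_tid.setdefault(key, self._new_tid())
        self.pred_locks.setdefault(tid, set()).add(txn)

    def update(self, txn, key):
        """Return the rw-conflict edges (reader, txn) this write creates."""
        tid = self.visible_tid.setdefault(key, self._new_tid())
        edges = {(r, txn) for r in self.pred_locks.get(tid, set()) if r != txn}
        if not self.in_place:
            # Heap non-HOT update: the new version gets a fresh tuple id,
            # so existing predicate locks stay behind on the old version.
            self.visible_tid[key] = self._new_tid()
        return edges
```

Replaying steps 1-3 of the example: after T1 reads id=100 and T2 updates it, T3's update of the same row creates no edge under the heap model (T1's lock is attached to the superseded tuple id) but creates (T1, T3) under the in-place model.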