Re: In-place updates and serializable transactions

2018-11-18 Thread Amit Kapila
On Fri, Nov 16, 2018 at 4:07 AM Kevin Grittner  wrote:
>
> On Thu, Nov 15, 2018 at 3:03 AM Kuntal Ghosh  
> wrote:
>
> > The test multiple-row-versions is failing because of the
> > above-discussed scenario. I've attached the regression diff file and
> > the result output file for the same. Here is a brief summary of the
> > test w.r.t. heap:
> >
> > Step 1: T1-> BEGIN; Read FROM t where id=100;
> > Step 2: T2-> BEGIN; UPDATE t where id=100; COMMIT; (creates T1->T2)
> > Step 3: T3-> BEGIN; UPDATE t where id=100; Read FROM t where id=50;
> > Step 4: T4-> BEGIN; UPDATE t where id=50; Read FROM t where id=1;
> > COMMIT;  (creates T3->T4)
> > Step 5: T3-> COMMIT;
> > Step 6: T1-> UPDATE t where id=1; COMMIT;  (creates T4->T1)
> >
> > At step 6, when the update statement is executed, T1 is rolled back
> > because of T3->T4->T1.
> >
> > But for zheap, step 3 also creates a dependency T1->T3 because of the
> > in-place update. When T4 commits in step 4, it marks T3 as doomed
> > because of T1 --> T3 --> T4. Hence, in step 5, T3 is rolled back.
>
> If I understand this, no permutation (order of execution of the
> statements in a set of concurrent transactions vulnerable to
> serialization anomalies) that succeeded with the old storage engine
> now fails with zheap; what we have with zheap is an earlier failure
> in one case.  More importantly, zheap doesn't create any false
> negatives (cases where a serialization anomaly is missed).
>

Your understanding is correct.  Thanks for sharing your feedback.

> I would say this should be considered a resounding success.  We should
> probably add an alternative result file to cover this case, but
> otherwise I don't see anything which requires action.
>
> Congratulations on making this work so well!
>

Thanks.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: In-place updates and serializable transactions

2018-11-15 Thread Kevin Grittner
On Thu, Nov 15, 2018 at 3:03 AM Kuntal Ghosh  wrote:

> The test multiple-row-versions is failing because of the
> above-discussed scenario. I've attached the regression diff file and
> the result output file for the same. Here is a brief summary of the
> test w.r.t. heap:
>
> Step 1: T1-> BEGIN; Read FROM t where id=100;
> Step 2: T2-> BEGIN; UPDATE t where id=100; COMMIT; (creates T1->T2)
> Step 3: T3-> BEGIN; UPDATE t where id=100; Read FROM t where id=50;
> Step 4: T4-> BEGIN; UPDATE t where id=50; Read FROM t where id=1;
> COMMIT;  (creates T3->T4)
> Step 5: T3-> COMMIT;
> Step 6: T1-> UPDATE t where id=1; COMMIT;  (creates T4->T1)
>
> At step 6, when the update statement is executed, T1 is rolled back
> because of T3->T4->T1.
>
> But for zheap, step 3 also creates a dependency T1->T3 because of the
> in-place update. When T4 commits in step 4, it marks T3 as doomed
> because of T1 --> T3 --> T4. Hence, in step 5, T3 is rolled back.

If I understand this, no permutation (order of execution of the
statements in a set of concurrent transactions vulnerable to
serialization anomalies) that succeeded with the old storage engine
now fails with zheap; what we have with zheap is an earlier failure in
one case.  More importantly, zheap doesn't create any false negatives
(cases where a serialization anomaly is missed).

I would say this should be considered a resounding success.  We should
probably add an alternative result file to cover this case, but
otherwise I don't see anything which requires action.
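
(For the archives: if the usual pg_regress convention for numbered
alternative expected files also applies to the isolation suite, the
zheap ordering would presumably live in a variant file alongside the
existing one, e.g.

src/test/isolation/expected/multiple-row-versions.out     existing heap output
src/test/isolation/expected/multiple-row-versions_1.out   hypothetical zheap variant

though the exact naming should be checked against the harness.)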

Congratulations on making this work so well!

-- 
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



Re: In-place updates and serializable transactions

2018-11-15 Thread Kuntal Ghosh
On Thu, Nov 15, 2018 at 3:09 AM Kevin Grittner  wrote:
>
> On Wed, Nov 14, 2018 at 5:43 AM Joshua Yanovski
>  wrote:
>
> > This is only a personal anecdote, but from my own experience with 
> > serializability, this sort of blind update isn't often contended in 
> > realistic workloads.
>
> > So, if this only affects transactions with blind updates, I doubt it will 
> > cause much pain in real workloads (even though it might look bad in 
> > benchmarks which include a mix of blind writes and rmw operations).  
> > Particularly if it only happens if you explicitly opt into zheap storage.
>
> I agree with all of that, but will be very interested in what
> failures, if any, kick out from the "isolation" test set when all
> tables are created using zheap.  I added all the common failure
> patterns I had seen to that set, and others have filled in some corner
> cases I missed since then, so if everything there passes I would not
> worry about it at all.  If we do see some failures, we can take
> another look to see whether any action is needed.
Thanks, Kevin, for your explanation. The isolation test suites are
really helpful for testing serializable scenarios for zheap.

The test multiple-row-versions is failing because of the
above-discussed scenario. I've attached the regression diff file and
the result output file for the same. Here is a brief summary of the
test w.r.t. heap:

Step 1: T1-> BEGIN; Read FROM t where id=100;
Step 2: T2-> BEGIN; UPDATE t where id=100; COMMIT; (creates T1->T2)
Step 3: T3-> BEGIN; UPDATE t where id=100; Read FROM t where id=50;
Step 4: T4-> BEGIN; UPDATE t where id=50; Read FROM t where id=1;
COMMIT;  (creates T3->T4)
Step 5: T3-> COMMIT;
Step 6: T1-> UPDATE t where id=1; COMMIT;  (creates T4->T1)

At step 6, when the update statement is executed, T1 is rolled back
because of T3->T4->T1.

But for zheap, step 3 also creates a dependency T1->T3 because of the
in-place update. When T4 commits in step 4, it marks T3 as doomed
because of T1 --> T3 --> T4. Hence, in step 5, T3 is rolled back.
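
For anyone who wants to replay this by hand, here is a rough psql-level
sketch of the same permutation. It assumes a pre-populated table
t(id int primary key, txt text) containing ids 1, 50 and 100, with all
four sessions at SERIALIZABLE; the names and literals are illustrative
and the actual spec may differ in details:

-- session T1
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM t WHERE id = 100;           -- step 1: SIREAD lock on id = 100

-- session T2
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE t SET txt = 'a' WHERE id = 100;    -- step 2: creates T1->T2
COMMIT;

-- session T3
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE t SET txt = 'b' WHERE id = 100;    -- step 3: zheap also sees T1->T3 here
SELECT * FROM t WHERE id = 50;

-- session T4
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE t SET txt = 'c' WHERE id = 50;     -- step 4: creates T3->T4
SELECT * FROM t WHERE id = 1;
COMMIT;

-- session T3
COMMIT;                                   -- step 5: succeeds on heap, aborts on zheap

-- session T1
UPDATE t SET txt = 'd' WHERE id = 1;      -- step 6: creates T4->T1; on heap the
                                          -- serialization failure is raised here
COMMIT;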


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


multiple-row-versions.out
Description: Binary data


regression.diffs
Description: Binary data


Re: In-place updates and serializable transactions

2018-11-15 Thread Kuntal Ghosh
On Wed, Nov 14, 2018 at 5:13 PM Joshua Yanovski
 wrote:
>
> This is only a personal anecdote, but from my own experience with 
> serializability, this sort of blind update isn't often contended in realistic 
> workloads.  The reason is that (again, IME), most blind writes are either 
> insertions, or "read-writes in disguise" (the client read an old value in a 
> different transaction); in the latter case, the data in question are often 
> logically "owned" by the client, and will therefore rarely be contended.  I 
> think there are two major exceptions to this: transactions that perform 
> certain kinds of monotonic updates (for instance, marking a row complete in a 
> worklist irrespective of whether it was already completed), and automatic 
> bulk updates.  However, these were exactly the classes of transactions that 
> we already ran under a lower isolation level than serializability, since they 
> have tightly constrained shapes and don't benefit much from the additional 
> guarantees.
>
> So, if this only affects transactions with blind updates, I doubt it will 
> cause much pain in real workloads (even though it might look bad in 
> benchmarks which include a mix of blind writes and rmw operations).  
> Particularly if it only happens if you explicitly opt into zheap storage.
>
Thanks, Joshua, for sharing your input on this. I'm not aware of any
realistic workloads for serializable transactions, so this is really
helpful.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: In-place updates and serializable transactions

2018-11-14 Thread Kevin Grittner
On Wed, Nov 14, 2018 at 5:43 AM Joshua Yanovski
 wrote:

> This is only a personal anecdote, but from my own experience with 
> serializability, this sort of blind update isn't often contended in realistic 
> workloads.

> So, if this only affects transactions with blind updates, I doubt it will 
> cause much pain in real workloads (even though it might look bad in 
> benchmarks which include a mix of blind writes and rmw operations).  
> Particularly if it only happens if you explicitly opt into zheap storage.

I agree with all of that, but will be very interested in what
failures, if any, kick out from the "isolation" test set when all
tables are created using zheap.  I added all the common failure
patterns I had seen to that set, and others have filled in some corner
cases I missed since then, so if everything there passes I would not
worry about it at all.  If we do see some failures, we can take
another look to see whether any action is needed.

-- 
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



Re: In-place updates and serializable transactions

2018-11-14 Thread Kevin Grittner
On Tue, Nov 13, 2018 at 10:45 PM Kuntal Ghosh
 wrote:

> Currently, we're working on the serializable implementation for
> zheap.

Great!

> If transaction T1 reads a row version (thus acquiring a predicate lock
> on it) and a second transaction T2 updates that row version (thus
> creating a rw-conflict graph edge from T1 to T2), must a third
> transaction T3 which re-updates the new version of the row also have a
> rw-conflict in from T1 to prevent anomalies?  In other words,  does it
> matter whether we recognize the edge T1 --rw--> T3?

No.  Keep in mind that extensive study has shown that snapshot
isolation can only be non-serializable if there is a cycle in the
apparent order of execution and that this can only occur if there is a
"dangerous structure" of two adjacent read-write antidependencies
(a/k/a read-write dependencies, a/k/a rw-conflicts) *AND* the
transaction you identify as T3 in that structure *IS THE FIRST
TRANSACTION IN THE CYCLE TO COMMIT*.  Looking at the implied T1/T3
relationship and looking for a T4 to complete the structure is not
necessary, because there are proofs that three *ADJACENT* transactions
are necessary for a serialization anomaly to be seen.
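
For anyone without the paper at hand, a condensed sketch of the classic
receipting example (in the spirit of the receipt-report isolation spec;
the schema below is simplified and illustrative) shows why the first
committer matters.  Here the report plays the T1 role, the receipt
insert is the pivot T2, and the batch close is T3, the first
transaction in the cycle to commit:

-- assumed setup: control holds a single row (deposit_no = 2);
-- receipts(receipt_no, deposit_no, amount) holds the receipts so far

-- T2 (pivot): record a receipt against the current batch
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT deposit_no FROM control;               -- reads 2; T3 will overwrite it
INSERT INTO receipts VALUES (101, 2, 10.00);  -- T1 will fail to see this row
-- ...T2 stays open for now...

-- T3: close batch 2; the first transaction in the cycle to commit
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE control SET deposit_no = 3;            -- creates T2 --rw--> T3
COMMIT;

-- T1: report the batch that just closed
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT deposit_no FROM control;               -- sees 3, so batch 2 looks closed
SELECT * FROM receipts WHERE deposit_no = 2;  -- misses receipt 101: T1 --rw--> T2
COMMIT;

-- T2
COMMIT;  -- under plain snapshot isolation all three commit and the batch-2
         -- report is silently wrong; under SSI one of them is rolled back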

-- 
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/



Re: In-place updates and serializable transactions

2018-11-14 Thread Joshua Yanovski
This is only a personal anecdote, but from my own experience with
serializability, this sort of blind update isn't often contended in
realistic workloads.  The reason is that (again, IME), most blind writes
are either insertions, or "read-writes in disguise" (the client read an old
value in a different transaction); in the latter case, the data in question
are often logically "owned" by the client, and will therefore rarely be
contended.  I think there are two major exceptions to this: transactions
that perform certain kinds of monotonic updates (for instance, marking a
row complete in a worklist irrespective of whether it was already
completed), and automatic bulk updates.  However, these were exactly the
classes of transactions that we already ran under a lower isolation level
than serializability, since they have tightly constrained shapes and don't
benefit much from the additional guarantees.
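
To make the distinction concrete, here is a minimal sketch (table and
column names are invented) of a blind write versus a read-modify-write
of the same row:

-- Blind write: nothing in this transaction read the old value first.
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE worklist SET done = true WHERE id = 42;
COMMIT;

-- Read-modify-write: the prior SELECT takes a predicate (SIREAD) lock,
-- so a concurrent update of this row by another transaction creates a
-- rw-antidependency edge out of this transaction.
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT done FROM worklist WHERE id = 42;
UPDATE worklist SET done = true WHERE id = 42;
COMMIT;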

So, if this only affects transactions with blind updates, I doubt it will
cause much pain in real workloads (even though it might look bad in
benchmarks which include a mix of blind writes and rmw operations).
Particularly if it only happens if you explicitly opt into zheap storage.

On Wed, Nov 14, 2018 at 5:46 AM Kuntal Ghosh 
wrote:

> In brief, due to in-place updates, in some cases, the false positives
> may increase for serializable transactions. Any thoughts?
>
> [1] src/backend/storage/lmgr/README-SSI
> [2] src/test/isolation/specs/multiple-row-versions.spec
> --
> Thanks & Regards,
> Kuntal Ghosh
> EnterpriseDB: http://www.enterprisedb.com
>
>


In-place updates and serializable transactions

2018-11-13 Thread Kuntal Ghosh
Hello hackers,

Currently, we're working on the serializable implementation for
zheap. As mentioned in the README-SSI documentation [1], there is one
aspect of PostgreSQL's SSI implementation that can make its conflict
detection behave differently from that of storage engines that
support in-place updates.

The heap storage in PostgreSQL does not use "update in place" with a
rollback log for its MVCC implementation.  Where possible it uses
"HOT" updates on the same page (if there is room and no indexed value
is changed). For non-HOT updates the old tuple is expired in place and
a new tuple is inserted at a new location.  Because of this
difference, a tuple lock in PostgreSQL doesn't automatically lock any
other versions of a row. We can take the following example from the
doc to understand the situation in more detail:

T1 ---rw---> T2 ---ww--->T3

If transaction T1 reads a row version (thus acquiring a predicate lock
on it) and a second transaction T2 updates that row version (thus
creating a rw-conflict graph edge from T1 to T2), must a third
transaction T3 which re-updates the new version of the row also have a
rw-conflict in from T1 to prevent anomalies?  In other words, does it
matter whether we recognize the edge T1 --rw--> T3? The document also
includes a nice proof of why we don't try to copy or expand a tuple
lock to any other versions of the row, i.e., why we don't have to
explicitly recognize the edge T1 --rw--> T3.

In PostgreSQL, predicate locking is implemented using the tuple id.
In zheap, since we perform updates in-place, we don't change the
tuple id. So, in the above example, we easily recognize the edge
T1 --rw--> T3. This may increase the number of false positives in
certain cases. For instance, if we introduce another transaction T4
such that T3 --rw--> T4 and T4 gets committed first, then for zheap,
T3 will be rolled back because of the dangerous structure
T1 --rw--> T3 --rw--> T4, whereas for heap, T3 can be committed
(isolation test case [2]). IMHO, this seems to be acceptable behavior.
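
The tuple id point is easy to see from psql on heap (ctid values below
are just illustrative, and a non-HOT update is assumed):

SELECT ctid FROM t WHERE id = 100;      -- e.g. (0,1)
UPDATE t SET txt = 'x' WHERE id = 100;  -- old tuple expired, new tuple inserted
SELECT ctid FROM t WHERE id = 100;      -- e.g. (0,2): a predicate lock taken on
                                        -- (0,1) does not cover the new version;
                                        -- with an in-place update the tid would
                                        -- still be (0,1)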

In brief, due to in-place updates, in some cases, the false positives
may increase for serializable transactions. Any thoughts?

[1] src/backend/storage/lmgr/README-SSI
[2] src/test/isolation/specs/multiple-row-versions.spec
-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com