Re: [HACKERS] Serializable snapshot isolation error logging

2010-09-21 Thread Dan S
A starvation scenario is what worries me:

Let's say we have a slow, complex transaction with many tables involved.
Concurrently, smaller transactions begin and commit.

Wouldn't it be possible for a starvation scenario where the slower
transaction never runs to completion but gives a serialization failure
over and over again on retry?

If I know at which SQL statement the serialization failure occurs, can I
then conclude that some of the tables in that exact query were involved
in the conflict?

If the serialization failure occurs at commit time, what can I conclude
then?
They can occur at commit time, right?

What is the likelihood that there exists an update pattern that always
gives the failure in the slow transaction?

How would one break such a recurring pattern ?
You could maybe try to lock each table used in the slow transaction, but
that would be prohibitively costly for concurrency.
But what else can you do if there is no way of knowing what the slow
transaction conflicts against?

As things involving concurrency have a tendency to pop up in production
and not in test, I think it is important to start thinking about them as
soon as possible.

Best Regards
Dan S


Re: [HACKERS] Serializable snapshot isolation error logging

2010-09-21 Thread Kevin Grittner
Dan S <strd...@gmail.com> wrote:
 
 A starvation scenario is what worries me:
 
 Let's say we have a slow, complex transaction with many tables
 involved.  Concurrently, smaller transactions begin and commit.
 
 Wouldn't it be possible for a starvation scenario where the slower
 transaction never runs to completion but gives a serialization
 failure over and over again on retry?
 
At least theoretically, yes.  One of the reasons I want to try
converting the single conflict reference to a list is to make for a
better worst-case situation.  Since anomalies can only occur when
the TN transaction (convention used in earlier post) commits first,
and by definition TN has done writes, with a list of conflicts you
could make sure that some transaction which writes has successfully
committed before any transaction rolls back.  So progress with
writes would be guaranteed.  There would also be a guarantee that if
you restart a canceled transaction, it would not immediately fail
again on conflicts *with the same transactions*.  Unfortunately,
with the single field for tracking conflicts, the self-reference on
multiple conflicting transactions loses detail, and you lose these
guarantees.
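 
As a toy sketch of that progress rule (the names are invented; the
real bookkeeping would live in the patch's shared memory):

    def may_cancel(victim):
        # Cancel only if some writing transaction in the victim's
        # conflict list has already committed; then every rollback is
        # matched by a committed write, so forward progress is assured.
        return any(t.did_write and t.committed for t in victim.conflicts)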
 
Now, could the large, long-running transaction still be the
transaction canceled?  Yes.  Are there ways to ensure it can
complete?  Yes.  Some are prettier than others.  I've already come
up with some techniques to avoid some classes of rollbacks with
transactions flagged as READ ONLY, and with the conflict lists there
would be a potential to recognize de facto read-only transactions
and apply similar logic, so a long-running transaction which didn't
write to any permanent tables (or at least not to ones which other
transactions were reading) would be pretty safe -- and with one of
our R&D points, you could guarantee its safety by blocking the
acquisition of its snapshot until certain conditions were met.
 
With conflict lists we would also always have two candidates for
cancellation at the point where we found something needed to be
canceled.  Right now I'm taking the coward's way out and always
canceling the transaction active in the process which detects the
need to roll something back.  As long as one process can cancel
another, we can use other heuristics for that.  Several possible
techniques come to mind to try to deal with the situation you raise.
 
If all else fails, the transaction could acquire explicit table
locks up front, but that sort of defeats the purpose of having an
isolation level which guarantees full serializable behavior without
adding any blocking to snapshot isolation.  :-(
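 
For completeness, a minimal client-side sketch of that fallback
(psycopg2; the table names and connection string are hypothetical --
SHARE mode blocks concurrent writers while still allowing reads):

    import psycopg2

    conn = psycopg2.connect("dbname=test")
    conn.set_session(isolation_level='SERIALIZABLE')

    with conn:  # commits on success, rolls back on error
        with conn.cursor() as cur:
            # Lock every table up front, before any reads, so that no
            # concurrent writer can create a rw-conflict with us.
            cur.execute("LOCK TABLE orders, order_lines IN SHARE MODE")
            cur.execute("SELECT count(*) FROM orders")  # long-running work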
 
 If I know at which SQL statement the serialization failure occurs,
 can I then conclude that some of the tables in that exact query
 were involved in the conflict?
 
No.  It could be related to any statements which had executed in the
transaction up to that point.
 
 If the serialization failure occurs at commit time, what can I
 conclude then?
 
That a dangerous combination of read-write dependencies occurred
which involved this transaction.
 
 They can occur at commit time, right?
 
Yes.  Depending on the heuristics chosen, it could happen while
idle in transaction.  (We can kill transactions in that state now,
right?)
 
 What is the likelihood that there exists an update pattern that
 always gives the failure in the slow transaction?
 
I don't know how to quantify that.  I haven't seen it yet in
testing, but many of my tests so far have been rather contrived.  We
desperately need more testing of this patch with realistic
workloads.
 
 How would one break such a recurring pattern ?
 
As mentioned above, the conflict list enhancement would help ensure
that *something* is making progress.  As mentioned above, we could
tweak the heuristics on *what* gets canceled to try to deal with
this.
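 
On the application side, one blunt way to break such a cycle is to
retry with an increasing delay, giving the competing transactions a
window in which to commit.  A hedged psycopg2 sketch (the names are
invented):

    import time
    import psycopg2

    def run_with_retry(conn, work, max_attempts=5):
        # Retry `work` on serialization failure (SQLSTATE 40001),
        # backing off so that rival transactions can commit first.
        for attempt in range(max_attempts):
            try:
                with conn:  # commits on success, rolls back on error
                    with conn.cursor() as cur:
                        return work(cur)
            except psycopg2.Error as e:
                if e.pgcode != '40001':
                    raise  # not a serialization failure
                time.sleep(0.1 * 2 ** attempt)
        raise RuntimeError("starved after %d attempts" % max_attempts)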
 
 You could maybe try to lock each table used in the slow
 transaction, but that would be prohibitively costly for
 concurrency.
 
Exactly.
 
 But what else can you do if there is no way of knowing what the
 slow transaction conflicts against?
 
Well, that is supposed to be the situation where this type of
approach is a good thing.  The trick is to get enough experience
with different loads to make sure we're using good heuristics to
deal with various loads well.  Ultimately, there may be some loads
for which this technique is just not appropriate.  Hopefully those
cases can be addressed with the techniques made possible with
Florian's patch.
 
 As things involving concurrency have a tendency to pop up in
 production and not in test, I think it is important to start
 thinking about them as soon as possible.
 
Oh, I've been thinking about it a great deal for quite a while.  The
problem is exactly as you state -- it is very hard to construct
tests which give a good idea of what the impact will be in
production loads.  I'm sure I could construct a test which would
make the patch look glorious.  I'm sure I could construct a test
which would make the patch look 

Re: [HACKERS] Serializable snapshot isolation error logging

2010-09-21 Thread Robert Haas
On Tue, Sep 21, 2010 at 12:57 PM, Kevin Grittner
<kevin.gritt...@wicourts.gov> wrote:
 What is the likelihood that there exists an update pattern that
 always gives the failure in the slow transaction?

 I don't know how to quantify that.  I haven't seen it yet in
 testing, but many of my tests so far have been rather contrived.  We
 desperately need more testing of this patch with realistic
 workloads.

I'm really hoping that Tom or Heikki will have a chance to take a
serious look at this patch soon with a view to committing it.  It
sounds like Kevin has done a great deal of testing on his own, but
we're not going to really get field experience with this until it's in
the tree.  It would be nice to get this in well before feature freeze
so that we have a chance to see what shakes out while there's still
time to adjust it.  Recall that Hot Standby was committed in December
and we were still adjusting the code in May.  It would be much nicer
to commit in September and finish up adjusting the code in February.
It helps get the release out on schedule.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



[HACKERS] Serializable snapshot isolation error logging

2010-09-20 Thread Dan S
Hi !

I wonder if the SSI implementation will give some way of detecting the
cause of a serialization failure.
Something like the deadlock detection, maybe, where you get the SQL
statements involved.

Best Regards
Dan S


Re: [HACKERS] Serializable snapshot isolation error logging

2010-09-20 Thread Kevin Grittner
Dan S <strd...@gmail.com> wrote:
 
 I wonder if the SSI implementation will give some way of detecting
 the cause of a serialization failure.
 Something like the deadlock detection, maybe, where you get the
 SQL statements involved.
 
I've been wondering what detail to try to include.  There will often
be three transactions involved in an SSI serialization failure,
although the algorithm we're using (based on the referenced papers)
may only know about one or two of them at the point of failure,
because conflicts with multiple other transactions get collapsed to
a self-reference.  (One optimization I want to try is to maintain
a list of conflicts rather than doing the above -- in which case we
could always show all three transactions; but we may run out of time
for that, and even if we don't, the decreased rollbacks might not
pay for the cost of maintaining such a list.)
 
The other information we would have would be the predicate locks
held by whatever transactions we know about at the point of
cancellation, based on what reads they've done; however, we wouldn't
know about the writes done by those transactions, or which of the
reads resulted in conflicts.
 
So, given the above, any thoughts on what we *should* show?
 
-Kevin



Re: [HACKERS] Serializable snapshot isolation error logging

2010-09-20 Thread Dan S
Well I guess one would like some way to find out which statements in the
involved transactions are the cause of the serialization failure and what
programs they reside in.

Also which relations were involved; the SQL statements may contain many
relations but just one or a few might be involved in the failure, right?

The tuples involved, if available.

I don't know how helpful knowing the pages involved might be; I
certainly wouldn't know what to do with that info.

All this is of course to be able to guess at which statements to modify,
change the execution order of, take an explicit lock on, and so on, to
reduce the serialization failure rate.

If holding a list of the involved transactions turns out to be expensive,
maybe one should be able to turn it on by a GUC only when you have a problem
and need the extra information to track it down.

Best Regards
Dan S


2010/9/20 Kevin Grittner <kevin.gritt...@wicourts.gov>

 Dan S strd...@gmail.com wrote:

  I wonder if the SSI implementation will give some way of detecting
  the cause of a serialization failure.
  Something like the deadlock detection, maybe, where you get the
  SQL statements involved.

 I've been wondering what detail to try to include.  There will often
 be three transactions involved in an SSI serialization failure,
 although the algorithm we're using (based on the referenced papers)
 may only know about one or two of them at the point of failure,
 because conflicts with multiple other transactions get collapsed to
 a self-reference.  (One optimization I want to try is to maintain
 a list of conflicts rather than doing the above -- in which case we
 could always show all three transactions; but we may run out of time
 for that, and even if we don't, the decreased rollbacks might not
 pay for the cost of maintaining such a list.)

 The other information we would have would be the predicate locks
 held by whatever transactions we know about at the point of
 cancellation, based on what reads they've done; however, we wouldn't
 know about the writes done by those transactions, or which of the
 reads resulted in conflicts.

 So, given the above, any thoughts on what we *should* show?

 -Kevin



Re: [HACKERS] Serializable snapshot isolation error logging

2010-09-20 Thread Kevin Grittner
Dan S <strd...@gmail.com> wrote:
 
 Well I guess one would like some way to find out which statements
 in the involved transactions are the cause of the serialization
 failure and what programs they reside in.
 
Unless we get the conflict list optimization added after the base
patch, you might get anywhere from one to three of the two to three
transactions involved in the serialization failure.  We can also
report the position they have in the dangerous structure and
mention that there are other, unidentified, transactions
participating in the conflict.  Once I get through with the issue
I'm working on based on Heikki's observations, I'll take a look at
this.
 
 Also which relations were involved; the SQL statements may contain
 many relations but just one or a few might be involved in the
 failure, right?
 
The conflicts would have occurred on specific relations, but we
don't store all that -- it would be prohibitively expensive.  What
we track is that transaction T0's read couldn't see the write from
transaction T1.  Once you know that, SSI doesn't require that you
know which or how many relations were involved in that -- you've
established that T0 must logically come before T1.  That in itself
is no problem, of course.  But if you also establish that T1 must
come before TN (where TN might be T0 or a third transaction), you've
got a pivot at T1.  You're still not dead in the water yet, but if
that third logical transaction actually *commits* first, you're
probably in trouble.  The only way out is that if T0 is not TN, T0
is read only, and TN did *not* commit before T0 got its snapshot,
you're OK.
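 
To make the case where TN is T0 concrete, here is a hedged sketch of
classic write skew driven from two connections (psycopg2; the schema
is hypothetical: oncall(doctor text, on_duty bool) with both rows
initially true):

    import psycopg2

    c1 = psycopg2.connect("dbname=test")  # hypothetical DSN
    c2 = psycopg2.connect("dbname=test")
    for c in (c1, c2):
        c.set_session(isolation_level='SERIALIZABLE')
    t1, t2 = c1.cursor(), c2.cursor()

    # Each transaction reads a predicate the other is about to
    # falsify: a rw-conflict in each direction, so each is a pivot.
    t1.execute("SELECT count(*) FROM oncall WHERE on_duty")
    t2.execute("SELECT count(*) FROM oncall WHERE on_duty")
    t1.execute("UPDATE oncall SET on_duty = false WHERE doctor = 'alice'")
    t2.execute("UPDATE oncall SET on_duty = false WHERE doctor = 'bob'")

    c1.commit()  # the first committer succeeds
    c2.commit()  # under SSI this (or an earlier statement) draws
                 # SQLSTATE 40001; under plain snapshot isolation both
                 # commit and the on-call invariant is silently broken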
 
Where it gets complicated is that in the algorithm in the paper,
which we are following for the initial commit attempt, each
transaction keeps one conflictIn and one conflictOut pointer
for checking all this.  If you already have a conflict with one
transaction and then detect a conflict of the same type with
another, you change the conflict pointer to a self-reference --
which means you conflict with *all* other concurrent transactions
in that direction.  You also have lost the ability to report all
transactions which are involved in the conflict.
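 
A toy sketch of that collapsing rule (the names are invented; the
real fields live in the patch's shared memory):

    class Xact:
        def __init__(self, name):
            self.name = name
            self.conflict_out = None  # xact whose write our read missed

    def record_conflict_out(me, other):
        if me.conflict_out is None or me.conflict_out is other:
            me.conflict_out = other  # first conflict: identity kept
        elif me.conflict_out is not me:
            # Second distinct conflict: collapse to a self-reference,
            # meaning "conflicts with all concurrent transactions in
            # this direction"; the identities are lost for reporting.
            me.conflict_out = me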
 
 The tuples involved, if available.
 
 I don't know how helpful knowing the pages involved might be; I
 certainly wouldn't know what to do with that info.
 
That information would only be available on the *read* side.  We
count on MVCC data on the *write* side, and I'm not aware of any way
for a transaction to list everything it's written.  Since we're not
recording the particular points of conflict between transactions,
there's probably not a lot of point in listing it anyway -- there
might be a conflict on any number of tuples out of a great many read
or written.
 
 All this is of course to be able to guess at which statements to
 modify, change the execution order of, take an explicit lock on,
 and so on, to reduce the serialization failure rate.
 
I understand the motivation, but the best this technique is likely
to be able to provide is the transactions involved, and that's not
always going to be complete unless we convert those single-
transaction conflict fields to lists.
 
 If holding a list of the involved transactions turns out to be
 expensive, maybe one should be able to turn it on by a GUC only
 when you have a problem and need the extra information to track it
 down.
 
That might be doable.  If we're going to add such a GUC, though, it
should probably be considered a tuning GUC, with the list setting
recommended for debugging problems.  Of course, if you change it
from field to list the problem might disappear.  Hmmm.  Unless
we also had a debug setting which kept track of the list but
ignored it for purposes of detecting the dangerous structures
described above.
 
Of course, you will always know what transaction was canceled.
That does give you something to look at.
 
-Kevin
