Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start

Álvaro Hernández Tortosa Sat, 08 Nov 2014 12:49:18 -0800


On 06/11/14 15:00, Kevin Grittner wrote:

Álvaro Hernández Tortosa <a...@8kdata.com> wrote:

     There has been two comments which seem to state that changing this
may introduce some performance problems and some limitations when you
need to take out some locks. I still believe, however, that current
behavior is confusing for the user. Sure, one option is to patch the
documentation, as I was suggesting.

Yeah, I thought that's what we were talking about, and in that
regard I agree that the docs could be more clear.  I'm not quite
sure what to say where to fix that, but I can see how someone could
be confused and have the expectation that once they have run BEGIN
TRANSACTION ISOLATION LEVEL SERIALIZABLE the transaction will not
see the work of transactions committing after that.  The fact that
this is possible is implied, if one reads carefully and thinks
about it, by the statement right near the start of the "Transaction
Isolation" section which says "any concurrent execution of a set of
Serializable transactions is guaranteed to produce the same effect
as running them one at a time in some order."  As Robert pointed
out, this is not necessarily the commit order or the transaction
start order.

It is entirely possible that if you have serializable transactions
T1 and T2, where T1 executes BEGIN first (and even runs a query
before T2 executes BEGIN) and T1 commits first, that T2 will
"appear" to have run first because it will look at a set of data
which T1 modifies and not see the changes.  If T1 were to *also*
look at a set of data which T2 modifies, then one of the
transactions would be rolled back with a serialization failure, to
prevent a cycle in the apparent order of execution; so the
requirements of the standard (and of most software which is
attempting to handle race conditions) is satisfied.  For many
popular benchmarks (and I suspect most common workloads) this
provides the necessary protections with better performance than is
possible using blocking to provide the required guarantees.[1]

Yes, you're right in that the "any concurrent execution..." phraseimplicitly means that snapshot may not be taken at BEGIN or SETTRANSACTION time, but it's definitely not clear enough for the averageuser. Yet this may apply to the serializable case, but it doesn't to therepeatable read where the docs read " The Repeatable Read isolationlevel only sees data committed before the transaction began; it neversees either uncommitted data or changes committed during transactionexecution by concurrent transactions". The first part is confusing, aswe discussed; the second part is even more confusing as it says "duringtransaction execution", and isn't the transaction -not the snapshot-beginning at BEGIN time?

Surprisingly, the language is way more clear in the SET TRANSACTIONdoc page [2].


At any rate, the language in that section is a little fuzzy on the
concept of the "start of the transaction."  Perhaps it would be
enough to change language like:

| sees a snapshot as of the start of the transaction, not as of the
| start of the current query within the transaction.

to:

| sees a snapshot as of the start of the first query within the
| transaction, not as of the start of the current query within the
| transaction.

Would that have prevented the confusion here?

I think that definitely helps. But it may be better to make it evenmore clear, more explicit. And offering a solution for the user who maylike the snapshot to be taken "at begin time", like suggesting to do a"SELECT 1" query.

     But what about creating a flag to BEGIN and SET TRANSACTION
commands, called "IMMEDIATE FREEZE" (or something similar), which
applies only to REPEATABLE READ and SERIALIZABLE? If this flag is set
(and may be off by default, but of course the default may be
configurable via a guc parameter), freeze happens when it is present
(BEGIN or SET TRANSACTION) time. This would be a backwards-compatible
change, while would provide the option of freezing without the nasty
hack of having to do a "SELECT 1" prior to your real queries, and
everything will of course be well documented.

What is the use case where you are having a problem?  This seems
like an odd solution, so it would be helpful to know what problem
it is attempting to solve.

I don't have a particular use case. I just came across the issueand thought the documentation and behavior wasn't consistent. So thefirst aim is not to have users surprised (in a bad way). But I see aclear use case: users who might want to open a (repeatable read |serializable) transaction to have their view of the database frozen, toperform any later operation on that frozen view. Sure, that comes at apenalty, but I see that potentially interesting too.


    Regards,

    Álvaro



[1] http://www.postgresql.org/docs/9.4/static/transaction-iso.html
[2] http://www.postgresql.org/docs/9.4/static/sql-set-transaction.html

--
Álvaro Hernández Tortosa


-----------
8Kdata

Re: [HACKERS] Repeatable read and serializable transactions see data committed after tx start

Reply via email to