On Tue, Dec 13, 2016 at 1:00 PM, Ian Jackson <ian.jack...@eu.citrix.com> wrote:
> The conversion is as follows: if a scenario is affected by the caveat,
> then there must be at least one transaction T which firstly produces
> "impossible" results I, and in which some later statement S produces a
> serialisation failure.
>
> To exhibit the corresponding unavoidable bug: Execute an identical
> scenario, with exactly the same sequence of steps in the same order,
> up to S. However, instead of S, execute ROLLBACK.
I am having a hard time understanding exactly what the argument on this thread is about, but I want to comment on this point. Saying that a set of transactions is serializable is equivalent to saying that there is some serial order of execution which would have produced results equivalent to the actual results. That is, there must be at least one schedule (T1, ..., Tn) such that running the transactions one after another in that order would have produced the same results that were obtained by running them concurrently. Any transactions that roll back -- whether due to serialization anomalies, manual user intervention, or any other reason -- are not part of the schedule, which considers only successfully committed transactions. The rolled-back transactions effectively didn't happen, and serializability as a concept makes no claim about the behavior of such transactions. That's not a PostgreSQL thing: that's database theory.

However, in practice, the scenario that you mention should generally work fine in PostgreSQL, I think. If T performed any writes, the rollback throws them away, so imagine removing the actual T from the mix and replacing it with a substitute version T' which performs the same reads but no writes and then tries to COMMIT where T tried to ROLLBACK. T' will succeed, because we never roll back a read-only transaction at commit time. If it were to fail, it would have to fail *while performing one of the reads*, not later.

But imagine a hypothetical database system in which anomalies are never detected until commit time. We somehow track the global must-happen-before graph and refuse the commit of any transaction which would create a cycle. Let's also suppose that this system uses snapshots to implement MVCC. In such a system, read-only transactions will sometimes fail at commit time if they've seen a view of the world that is inconsistent with the only remaining possible serial schedules. For example, suppose T1 updates A -> A' and reads B.
Concurrently, T2 updates B -> B'; thus, T1 must precede T2. Next, T2 commits. Now, T3 begins and reads B', so T2 must precede T3. Next, T1 commits. T3, which took its snapshot before the commit of T1, now reads A. Thus T3 must precede T1. That's a cycle, so T3 won't be allowed to commit, and yet it's done a couple of reads and hasn't failed yet... because of an implementation detail of the system. That's probably annoying from a user perspective -- if a transaction is certainly going to fail, we'd like to report the failure as early as possible -- and it's probably crushingly slow, but according to my understanding it's unarguably a correct implementation of serializability (assuming there are no bugs), yet it doesn't deliver the guarantee you're asking for here.

I have not read any database literature on the interaction of serializability with subtransactions. This seems very thorny. Suppose T1 reads A and B and updates A -> A', while concurrently T2 reads A and B and updates B -> B'. This is obviously not serializable: if either transaction had executed before the other in a serial schedule, the second transaction in the schedule would have had to see (A, B') or (A', B) rather than (A, B), but that's not what happened. But what if each of T1 and T2 did the reads in a subtransaction, rolled it back, and then did the write in the main transaction and committed?

The database system has two options. First, it could assume that the toplevel transaction may have relied on the results of the aborted subtransaction. But if it does that, then any serialization failure which afflicts a subtransaction must necessarily also kill the main transaction, which seems pedantic and unhelpful. If you'd wanted the toplevel transaction to be killed, you wouldn't have used a subtransaction, right? Second, it could assume that the toplevel transaction in no way relied on or used the values obtained from the aborted subtransaction.
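(As an aside: the commit-time-only checker in the hypothetical system earlier in this mail can be sketched in a few lines of Python. This is purely illustrative -- it is not PostgreSQL's actual SSI machinery, and the class and transaction names are invented for the example.)

```python
# Sketch of a commit-time-only serializability checker: record
# must-happen-before edges as transactions read and write, and refuse
# any COMMIT that would close a cycle in the graph.

from collections import defaultdict

class CommitTimeChecker:
    def __init__(self):
        # txn -> set of transactions that must come after it
        self.edges = defaultdict(set)

    def add_edge(self, before, after):
        self.edges[before].add(after)

    def creates_cycle(self, txn):
        # DFS from txn: if txn is reachable from itself, committing
        # it would close a cycle, so the commit must be refused.
        stack, seen = [txn], set()
        while stack:
            node = stack.pop()
            for nxt in self.edges[node]:
                if nxt == txn:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

# The scenario above: T1 updates A and reads B; T2 updates B -> B';
# T3 reads B' and then, from its older snapshot, the pre-T1 A.
g = CommitTimeChecker()
g.add_edge("T1", "T2")  # T1 read B before T2 rewrote it
g.add_edge("T2", "T3")  # T3 saw T2's B'
g.add_edge("T3", "T1")  # T3 saw the pre-T1 value of A

print(g.creates_cycle("T3"))  # True: T3's commit is refused at commit time
```

Note that T3 did nothing but reads, yet it is still the one refused -- which is exactly the behavior a commit-time-only implementation is allowed to have.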
However, that second assumption defeats the whole purpose of having subtransactions in the first place. What's the point of being able to start subtransactions if you can't roll them back and then decide to do something else instead? It seems to me that what the database system should do is make that second assumption, and what the user should do is understand that, to the degree that transactions depend on the results of aborted subtransactions, there may be serialization anomalies that go undetected. And the user should put up with that because the cure is worse than the disease. Maybe there's a more formally satisfying answer than that, but I'm not really seeing it...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers