On Fri, Feb 23, 2018 at 6:05 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Feb 22, 2018 at 7:54 AM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
>> The best solution I have come up with so far is to add a reference
>> count to SERIALIZABLEXACT. I toyed with putting the refcount into the
>> DSM instead, but then I ran into problems making that work when you
>> have a query with multiple Gather nodes. Since the refcount is in
>> SERIALIZABLEXACT I also had to add a generation counter so that I
>> could detect the case where you try to attach too late (the leader has
>> already errored out, the refcount has reached 0 and the
>> SERIALIZABLEXACT object has been recycled).
>
> I don't know whether that's safe or not. It certainly sounds like
> it's solving one category of problem, but is that the only issue? If
> some backends haven't noticed that we're safe, they might keep
> acquiring SIREAD locks or doing other manipulations of shared state,
> which maybe could cause confusion. I haven't looked into this deeply
> enough to understand whether there's actually a possibility of trouble
> there, but I can't rule it out off-hand.

After some testing, I think the refcount approach could be made to work, but it seems quite complicated, and some weird edge cases showed up that started to make it look like more trouble than it was worth. One downside of refcounts is that, with parallel_leader_participation = off, you never get to free the SERIALIZABLEXACT until the end of the transaction.

I'm testing another version that is a lot simpler: like v10, it relies on the knowledge that the leader's transaction will always end after the workers have finished, but it handles the RO_SAFE optimisation by keeping the SERIALIZABLEXACT alive while freeing its locks etc. More soon.
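
For anyone following along, here is a minimal single-process sketch of the refcount-plus-generation idea quoted above. It is not the patch and not PostgreSQL code: SketchXact, SketchHandle and the sketch_* functions are hypothetical names made up for illustration, and all locking is omitted (a real version would have to hold a lock around each of these operations).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a recyclable shared-memory slot. */
typedef struct SketchXact
{
    uint32_t    refcount;       /* attached backends (leader + workers) */
    uint64_t    generation;     /* bumped every time the slot is recycled */
} SketchXact;

/* What the leader would pass to its workers. */
typedef struct SketchHandle
{
    SketchXact *xact;
    uint64_t    generation;     /* generation seen when the handle was made */
} SketchHandle;

/* Leader: take the first reference and build a handle for the workers. */
static SketchHandle
sketch_create(SketchXact *xact)
{
    SketchHandle h;

    xact->refcount = 1;
    h.xact = xact;
    h.generation = xact->generation;
    return h;
}

/*
 * Worker: attach only if the slot still belongs to the generation recorded
 * in the handle.  Returns false if we arrived too late, that is, the
 * refcount already reached 0 and the slot was recycled (and possibly
 * handed to an unrelated transaction).
 */
static bool
sketch_attach(SketchHandle h)
{
    if (h.xact->generation != h.generation || h.xact->refcount == 0)
        return false;
    h.xact->refcount++;
    return true;
}

/* Drop one reference; the last one out recycles the slot.  Returns true
 * if this call recycled it. */
static bool
sketch_release(SketchXact *xact)
{
    assert(xact->refcount > 0);
    if (--xact->refcount > 0)
        return false;
    xact->generation++;         /* invalidates every outstanding handle */
    return true;
}
```

The generation check is what lets a late worker tell "my leader's object" apart from "the same memory, reused by someone else", which a bare refcount cannot do once the slot has been handed out again.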