Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-26 Thread Michael Paquier
On Thu, Apr 27, 2017 at 2:25 AM, Tom Lane wrote: > Simon Riggs writes: >> I've added code following Michael and Tom's comments to the previous >> patch. New patch attached. > > Couple of minor suggestions: > > * Rather than deleting the comment for SubTransSetParent entirely, > maybe make it say

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-26 Thread Tom Lane
Simon Riggs writes: > I've added code following Michael and Tom's comments to the previous > patch. New patch attached. Couple of minor suggestions: * Rather than deleting the comment for SubTransSetParent entirely, maybe make it say "It's possible that the parent was already recorded. However,

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-26 Thread Simon Riggs
On 26 April 2017 at 15:28, Tom Lane wrote: > Nikhil Sontakke writes: >> A SELECT query on the newly promoted master on any of the tables involved >> in the 2PC hangs. The hang is due to a loop in >> SubTransGetTopmostTransaction(). Due to incorrect linkages, we get a >> circular reference in pare

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-26 Thread Tom Lane
Nikhil Sontakke writes: > A SELECT query on the newly promoted master on any of the tables involved > in the 2PC hangs. The hang is due to a loop in > SubTransGetTopmostTransaction(). Due to incorrect linkages, we get a > circular reference in parentxid <-> subxid inducing the infinite loop. I'm

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-26 Thread Nikhil Sontakke
> I'm suggesting we take the approach that if there is a problem we can > recreate it as a way of exploring what conditions are required and > therefore work out the impact. Nikhil Sontakke appears to have > re-created something, but not quite what I had expected. I think he > will post here tomorr

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-25 Thread Michael Paquier
On Wed, Apr 26, 2017 at 4:53 AM, Simon Riggs wrote: > On 25 April 2017 at 16:28, Tom Lane wrote: >> Simon Riggs writes: >>> I can't see any reason now why overwriteOK should exist at all. I'm >>> guessing that the whole "overwriteOK" idea was an incorrect response >>> to xids appearing where the

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-25 Thread Andres Freund
On 2017-04-25 21:22:44 +0100, Simon Riggs wrote: > On 24 April 2017 at 19:59, Andres Freund wrote: > > > I don't think that's generally true. > > If you think that, from a risk perspective it is enough for me to > continue to investigate and I have been doing that. I'm not sure it's worth diggi

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-25 Thread Simon Riggs
On 24 April 2017 at 19:59, Andres Freund wrote: > I don't think that's generally true. If you think that, from a risk perspective it is enough for me to continue to investigate and I have been doing that. As I said before I thought I had found a problem. I'm suggesting we take the approach tha

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-25 Thread Simon Riggs
On 25 April 2017 at 16:28, Tom Lane wrote: > Simon Riggs writes: >> I can't see any reason now why overwriteOK should exist at all. I'm >> guessing that the whole "overwriteOK" idea was an incorrect response >> to xids appearing where they shouldn't have done because of the >> mistake you just co

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-25 Thread Tom Lane
Simon Riggs writes: > I can't see any reason now why overwriteOK should exist at all. I'm > guessing that the whole "overwriteOK" idea was an incorrect response > to xids appearing where they shouldn't have done because of the > mistake you just corrected. So I will now remove the parameter from >

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-24 Thread Simon Riggs
On 23 April 2017 at 17:17, Tom Lane wrote: > Simon Riggs writes: >>> Also, when I fix that, it gets further but still crashes at the same >>> Assert in SubTransSetParent. The proximate cause this time seems to be >>> that RecoverPreparedTransactions's calculation of overwriteOK is wrong: >>> it'

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-24 Thread Andres Freund
On 2017-04-24 13:29:11 +0100, Simon Riggs wrote: > On 24 April 2017 at 00:25, Andres Freund wrote: > > if the subxid->xid mapping doesn't actually exist - as it's the case > > with this bug afaics - we'll not get the correct toplevel > > transaction. > > The nature of the corruption is that in so

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-24 Thread Nikhil Sontakke
> > Also, when I fix that, it gets further but still crashes at the same > Assert in SubTransSetParent. The proximate cause this time seems to > be > that RecoverPreparedTransactions's calculation of overwriteOK is > wrong: > it's computing that as "false", but in reality the s

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-24 Thread Simon Riggs
On 24 April 2017 at 00:25, Andres Freund wrote: >> >> It's not clear to me how much potential this has to create user data >> >> corruption, but it doesn't look good at first glance. Discuss. >> > >> > Hm. I think it can cause wrong tqual.c results in some edge cases. >> > During HS, lastOverflo

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-23 Thread Andres Freund
Hi, On 2017-04-23 11:05:44 +0100, Simon Riggs wrote: > > Yikes. This is clearly way undertested. It's also pretty scary that > > the code has recently been whacked out quite heavily (both 9.6 and > > master), without hitting anything around this - doesn't seem to bode > > well for how in-depth t

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-23 Thread Simon Riggs
On 23 April 2017 at 18:41, Simon Riggs wrote: > On 23 April 2017 at 17:17, Tom Lane wrote: >> Simon Riggs writes: Also, when I fix that, it gets further but still crashes at the same Assert in SubTransSetParent. The proximate cause this time seems to be that RecoverPreparedTransa

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-23 Thread Simon Riggs
On 23 April 2017 at 17:17, Tom Lane wrote: > Simon Riggs writes: >>> Also, when I fix that, it gets further but still crashes at the same >>> Assert in SubTransSetParent. The proximate cause this time seems to be >>> that RecoverPreparedTransactions's calculation of overwriteOK is wrong: >>> it'

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-23 Thread Tom Lane
Simon Riggs writes: > On 23 April 2017 at 00:55, Tom Lane wrote: >> It's not clear to me how much potential this has to create user data >> corruption, but it doesn't look good at first glance. Discuss. > SubTransSetParent() only matters for the case where we are half-way > through a commit wit

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-23 Thread Tom Lane
Simon Riggs writes: >> Also, when I fix that, it gets further but still crashes at the same >> Assert in SubTransSetParent. The proximate cause this time seems to be >> that RecoverPreparedTransactions's calculation of overwriteOK is wrong: >> it's computing that as "false", but in reality the su

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-23 Thread Simon Riggs
On 23 April 2017 at 01:19, Andres Freund wrote: > On 2017-04-22 19:55:18 -0400, Tom Lane wrote: >> Now that we've got consistent failure reports about the 009_twophase.pl >> recovery test, I set out to find out why it's failing. It looks to me >> like the reason is that this (twophase.c:2145): >>

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-23 Thread Simon Riggs
On 23 April 2017 at 00:55, Tom Lane wrote: > Now that we've got consistent failure reports about the 009_twophase.pl > recovery test, I set out to find out why it's failing. It looks to me > like the reason is that this (twophase.c:2145): > > SubTransSetParent(xid, subxid, overwriteOK

Re: [HACKERS] StandbyRecoverPreparedTransactions recovers subtrans links incorrectly

2017-04-22 Thread Andres Freund
On 2017-04-22 19:55:18 -0400, Tom Lane wrote: > Now that we've got consistent failure reports about the 009_twophase.pl > recovery test, I set out to find out why it's failing. It looks to me > like the reason is that this (twophase.c:2145): > > SubTransSetParent(xid, subxid, overwrite