Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-09 Thread Peter Geoghegan
On Thu, Nov 9, 2017 at 4:53 PM, Andres Freund wrote: > Primarily because it's not an anti-corruption tool. I'd be surprised if > there weren't ways to corrupt the page using these corruptions that > aren't detected by it. It's very hard to assess the risk of missing something

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-09 Thread Andres Freund
On 2017-11-09 16:45:07 -0800, Peter Geoghegan wrote: > On Thu, Nov 9, 2017 at 4:17 PM, Andres Freund wrote: > >> Actually, on second thought, I take that back -- I don't think that > >> REINDEXing will even finish once a HOT chain is broken by the bug. > >>

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-09 Thread Peter Geoghegan
On Thu, Nov 9, 2017 at 4:17 PM, Andres Freund wrote: >> Actually, on second thought, I take that back -- I don't think that >> REINDEXing will even finish once a HOT chain is broken by the bug. >> IndexBuildHeapScan() actually does quite a good job of making sure >> that HOT

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-09 Thread Peter Geoghegan
On Thu, Nov 9, 2017 at 4:17 PM, Andres Freund wrote: >> I don't follow you here. Why would REINDEXing make the rows that >> should be dead disappear again, even for a short period of time? > > It's not the REINDEX that makes them reappear. Of course. I was just trying to make

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-09 Thread Andres Freund
On 2017-11-09 16:02:17 -0800, Peter Geoghegan wrote: > > What I'm currently wondering about is how much we need to harden > > postgres against such existing corruption. If e.g. the hot chains are > > broken somebody might have reindexed thinking the problem is fixed - but > > if they then later

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-09 Thread Peter Geoghegan
On Thu, Nov 9, 2017 at 2:24 PM, Andres Freund wrote: > Attached is a version of the already existing regression test that both > reproduces the broken hot chain (and thus failing index lookups) and > then also the tuple reviving. I don't see any need for letting this run >

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-09 Thread Andres Freund
On 2017-11-04 06:15:00 -0700, Andres Freund wrote: > The reason for that is that I hadn't yet quite figured out how the bug I > described in the commit message (and the previously committed testcase) > would cause that. I was planning to diagnose / experiment more with this > and write an email if

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-04 Thread Andres Freund
On 2017-11-04 06:15:00 -0700, Andres Freund wrote: > The current testcase, and I think the descriptions in the relevant > threads, all actually fail to point out the precise way the bug is > triggered. As e.g. evidenced that the freeze-the-dead case appears to > not cause any failures in

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-04 Thread Andres Freund
On 2017-11-03 12:36:59 -0700, Peter Geoghegan wrote: > Andres Freund wrote: > > Here's that patch. I've stared at this some, and Robert did too. Robert > > mentioned that the commit message might need some polish and I'm not > > 100% sure about the error message texts yet. >

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-03 Thread Andres Freund
On November 4, 2017 1:22:04 AM GMT+05:30, Alvaro Herrera wrote: >Peter Geoghegan wrote: >> Andres Freund wrote: > >> > Staring at the vacuumlazy hunk I think I might have found a related >bug: >> > heap_update_tuple() just copies the old xmax to

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-03 Thread Peter Geoghegan
Alvaro Herrera wrote: He means that the tuple that heap_update moves to page 1 (which will no longer be processed by vacuum) will contain a multixact that's older than relminmxid -- because it is copied unchanged by heap_update instead of properly checking against age

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-03 Thread Alvaro Herrera
Peter Geoghegan wrote: > Andres Freund wrote: > > Staring at the vacuumlazy hunk I think I might have found a related bug: > > heap_update_tuple() just copies the old xmax to the new tuple's xmax if > > a multixact and still running. It does so without verifying liveliness >

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-03 Thread Peter Geoghegan
Andres Freund wrote: Here's that patch. I've stared at this some, and Robert did too. Robert mentioned that the commit message might need some polish and I'm not 100% sure about the error message texts yet. The commit message should probably say that the bug involves the

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-03 Thread Andres Freund
On 2017-11-02 06:05:51 -0700, Andres Freund wrote: > Hi, > > On 2017-11-02 13:49:47 +0100, Alvaro Herrera wrote: > > Andres Freund wrote: > > > I think the problem is on the pruning, rather than the freezing side. We > > > can't freeze a tuple if it has an alive predecessor - rather than > > >

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Robert Haas
On Thu, Nov 2, 2017 at 10:26 PM, Peter Geoghegan wrote: > On Thu, Nov 2, 2017 at 9:44 AM, Robert Haas wrote: >> The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where >> I think things get a lot more dangerous. The problem (as Andres >>

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Peter Geoghegan
On Thu, Nov 2, 2017 at 9:44 AM, Robert Haas wrote: > The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where > I think things get a lot more dangerous. The problem (as Andres > pointed out to me this afternoon) is that it seems possible to end up > with a

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Peter Geoghegan
On Thu, Nov 2, 2017 at 4:20 AM, Andres Freund wrote: > I think the problem is on the pruning, rather than the freezing side. We > can't freeze a tuple if it has an alive predecessor - rather than > weakining this, we should be fixing the pruning to not have the alive >

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Robert Haas
On Thu, Nov 2, 2017 at 8:25 PM, Alvaro Herrera wrote: > Pushed the reverts. > > I noticed while doing so that REL_10_STABLE contains the bogus commits. > Does that change our opinion regarding what to do for people upgrading > to a version containing the broken commits?

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Alvaro Herrera
Stephen Frost wrote: > * Tom Lane (t...@sss.pgh.pa.us) wrote: > > Andres Freund writes: > > > Do we care about people upgrading to unreleased versions? We could do > > > nothing, document it in the release notes, or ??? > > > > Do nothing. > > Agreed. Not much we can do

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Andres Freund writes: > > Do we care about people upgrading to unreleased versions? We could do > > nothing, document it in the release notes, or ??? > > Do nothing. Agreed. Not much we can do there. Thanks! Stephen signature.asc

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Tom Lane
Andres Freund writes: > Do we care about people upgrading to unreleased versions? We could do > nothing, document it in the release notes, or ??? Do nothing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Andres Freund
Hi, On 2017-11-02 13:49:47 +0100, Alvaro Herrera wrote: > Andres Freund wrote: > > I think the problem is on the pruning, rather than the freezing side. We > > can't freeze a tuple if it has an alive predecessor - rather than > > weakining this, we should be fixing the pruning to not have the

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Alvaro Herrera
Andres Freund wrote: > I spent some time discussing this with Robert today (with both of us > alternating between feeling the other and ourselves as stupid), and the > conclusion I think is that the problem is on the pruning, rather than > the freezing side. Thanks both for spending some more

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Robert Haas
On Thu, Nov 2, 2017 at 4:50 PM, Andres Freund wrote: > I think a5736bf754c82d8b86674e199e232096c679201d might be dangerous in > the face of previously corrupted tuple chains and pg_upgraded clusters - > it can lead to tuples being considered related, even though they they're >

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Andres Freund
Hi, On 2017-09-28 14:47:53 +, Alvaro Herrera wrote: > Fix freezing of a dead HOT-updated tuple > > Vacuum calls page-level HOT prune to remove dead HOT tuples before doing > liveness checks (HeapTupleSatisfiesVacuum) on the remaining tuples. But > concurrent transaction commit/abort may turn

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-10 Thread Michael Paquier
On Wed, Oct 11, 2017 at 11:31 AM, Wood, Dan wrote: > I found one glitch with our merge of the original dup row fix. With that > corrected AND Alvaro’s Friday fix things are solid. > No dup’s. No index corruption. > > Thanks so much. Nice to hear that! You guys seem to be

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-10 Thread Wood, Dan
I found one glitch with our merge of the original dup row fix. With that corrected AND Alvaro’s Friday fix things are solid. No dup’s. No index corruption. Thanks so much. On 10/10/17, 7:25 PM, "Michael Paquier" wrote: On Tue, Oct 10, 2017 at 11:14 PM, Alvaro

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-10 Thread Michael Paquier
On Tue, Oct 10, 2017 at 11:14 PM, Alvaro Herrera wrote: > I was seeing just the reindex problem. I don't see any more dups. > > But I've tried to reproduce it afresh now, and let it run for a long > time and nothing happened. Maybe I made a mistake last week and > ran

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-10 Thread Alvaro Herrera
Wood, Dan wrote: > I’m unclear on what is being repro’d in 9.6. Are you getting the > duplicate rows problem or just the reindex problem? Are you testing > with asserts enabled(I’m not)? I was seeing just the reindex problem. I don't see any more dups. But I've tried to reproduce it afresh

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-09 Thread Michael Paquier
On Mon, Oct 9, 2017 at 2:29 AM, Peter Geoghegan wrote: > On Sat, Oct 7, 2017 at 4:25 PM, Alvaro Herrera > wrote: >> Hmm, I think I added a random sleep (max. 100ms) right after the >> HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-09 Thread Wood, Dan
I’m unclear on what is being repro’d in 9.6. Are you getting the duplicate rows problem or just the reindex problem? Are you testing with asserts enabled(I’m not)? If you are getting the dup rows consider the code in the block in heapam.c that starts with the comment “replace multi by update

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-08 Thread Peter Geoghegan
On Sat, Oct 7, 2017 at 4:25 PM, Alvaro Herrera wrote: > Hmm, I think I added a random sleep (max. 100ms) right after the > HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that > makes the race easier to hit. I still cannot reproduce. Perhaps you can

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-07 Thread Alvaro Herrera
Peter Geoghegan wrote: > On Sat, Oct 7, 2017 at 1:31 AM, Alvaro Herrera > wrote: > >> As you must have seen, Alvaro said he has a variant of Dan's original > >> script that demonstrates that a problem remains, at least on 9.6+, > >> even with today's fix. I think it's

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-07 Thread Peter Geoghegan
On Sat, Oct 7, 2017 at 1:31 AM, Alvaro Herrera wrote: >> As you must have seen, Alvaro said he has a variant of Dan's original >> script that demonstrates that a problem remains, at least on 9.6+, >> even with today's fix. I think it's the stress-test that plays with >>

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-07 Thread Alvaro Herrera
Peter Geoghegan wrote: > On Fri, Oct 6, 2017 at 2:09 PM, Wong, Yi Wen wrote: > > Yesterday, I've been spending time with pg_visibility on the pages when I > > reproduce the issue in 9.6. > > None of the all-frozen or all-visible bits are necessarily set in > > problematic

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Peter Geoghegan
On Fri, Oct 6, 2017 at 2:09 PM, Wong, Yi Wen wrote: > Yesterday, I've been spending time with pg_visibility on the pages when I > reproduce the issue in 9.6. > None of the all-frozen or all-visible bits are necessarily set in problematic > pages. Since this happened

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Peter Geoghegan
On Fri, Oct 6, 2017 at 11:34 AM, Peter Geoghegan wrote: >> I don't know if it's really the freeze map at fault or something else. > > Ideally, it would be possible to effectively disable the new freeze > map stuff in a minimal way, for testing purposes. Perhaps the authors > of that

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Peter Geoghegan
On Fri, Oct 6, 2017 at 10:49 AM, Alvaro Herrera wrote: > I can tell that, in 9.6, REINDEX still reports the error we saw in > earlier releases, after some of the runs of my reproducer scripts. I'm > unable to reproduce it anymore in 9.3 to 9.5. I can't see the one Dan >

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Peter Geoghegan
On Fri, Oct 6, 2017 at 7:59 AM, Alvaro Herrera wrote: > By the way, I still wonder if there's any way for a new tuple to get > inserted in the place where a HOT redirect would be pointing to, and > have it be marked as Frozen, where the old redirect contains a >

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Alvaro Herrera
Peter Geoghegan wrote: > On Fri, Oct 6, 2017 at 6:18 AM, Alvaro Herrera > wrote: > > Wood, Dan wrote: > >> Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today. > >> > >> I would prefer to focus on either latest 9X or 11dev. > > > > I tested my patch on 9.4 and 9.5

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Peter Geoghegan
On Fri, Oct 6, 2017 at 6:18 AM, Alvaro Herrera wrote: > Wood, Dan wrote: >> Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today. >> >> I would prefer to focus on either latest 9X or 11dev. > > I tested my patch on 9.4 and 9.5 today and it seems to close the problem

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Alvaro Herrera
By the way, I still wonder if there's any way for a new tuple to get inserted in the place where a HOT redirect would be pointing to, and have it be marked as Frozen, where the old redirect contains a non-invalid Xmax. I tried to think of a way for that to happen, but couldn't think of anything.

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Stephen Frost
Robert, * Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote: > Michael Paquier wrote: > > On Fri, Oct 6, 2017 at 10:18 PM, Alvaro Herrera > > wrote: > > > Wood, Dan wrote: > > >> Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today. > > >> > > >> I would prefer to

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Michael Paquier
On Fri, Oct 6, 2017 at 10:57 PM, Alvaro Herrera wrote: > Ah, thanks for the tip. I hope the authors of that can do the gruntwork > of researching this problem there, then. I have some stuff using 9.6 extensively, so like Dan I think I'll chime in anyway. Not before

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Alvaro Herrera
Michael Paquier wrote: > On Fri, Oct 6, 2017 at 10:18 PM, Alvaro Herrera > wrote: > > Wood, Dan wrote: > >> Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today. > >> > >> I would prefer to focus on either latest 9X or 11dev. > > > > I tested my patch on 9.4 and

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Michael Paquier
On Fri, Oct 6, 2017 at 10:18 PM, Alvaro Herrera wrote: > Wood, Dan wrote: >> Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today. >> >> I would prefer to focus on either latest 9X or 11dev. > > I tested my patch on 9.4 and 9.5 today and it seems to close the

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Michael Paquier
On Fri, Oct 6, 2017 at 7:57 PM, Alvaro Herrera wrote: > Michael Paquier wrote: >> On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera >> wrote: >> +/* >> + * Given a tuple, verify whether the given Xmax matches the tuple's Xmin, >> + * taking into

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Alvaro Herrera
Wood, Dan wrote: > Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today. > > I would prefer to focus on either latest 9X or 11dev. I tested my patch on 9.4 and 9.5 today and it seems to close the problem (with the patch, I waited 10x as many iterations as it took for the problem to occur

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Alvaro Herrera
Michael Paquier wrote: > On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera > wrote: > + /* > +* If the xmax of the old tuple is identical to the xmin of the new one, > +* it's a match. > +*/ > + if (xmax == xmin) > + return true; > I would use

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Michael Paquier
On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera wrote: > I think this is the patch for 9.3. I ran the test a few hundred times > (with some additional changes such as randomly having an update inside a > savepoint that's randomly aborted, randomly aborting the

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-05 Thread Wood, Dan
Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today. I would prefer to focus on either latest 9X or 11dev. Does Alvaro’s patch presume any of the other patch to set COMMITTED in the freeze code? On 10/4/17, 7:17 PM, "Michael Paquier" wrote: On Thu, Oct

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-05 Thread Peter Geoghegan
On Thu, Oct 5, 2017 at 4:35 AM, Alvaro Herrera wrote: > At any rate, I was thinking in a new routine to encapsulate the logic, > > /* > * Check the tuple XMIN against prior XMAX, if any > */ > if

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-05 Thread Alvaro Herrera
I think this is the patch for 9.3. I ran the test a few hundred times (with some additional changes such as randomly having an update inside a savepoint that's randomly aborted, randomly aborting the transaction, randomly skipping the for key share lock, randomly sleeping at a few points; and

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-05 Thread Alvaro Herrera
Peter Geoghegan wrote: > As you know, on version 9.4+, as of commit 37484ad2a, we decided that > we are "largely ignoring the value to which it [xmin] is set". The > expectation became that raw xmin is available after freezing, but > mostly for forensic purposes. I think Alvaro should now

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-05 Thread Alvaro Herrera
Wood, Dan wrote: > Whatever you do make sure to also test 250 clients running lock.sql. Even > with the communities fix plus YiWen’s fix I can still get duplicate rows. > What works for “in-block” hot chains may not work when spanning blocks. Good idea. You can achieve a similar effect by

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Michael Paquier
On Thu, Oct 5, 2017 at 10:39 AM, Wood, Dan wrote: > Whatever you do make sure to also test 250 clients running lock.sql. Even > with the communities fix plus YiWen’s fix I can still get duplicate rows. > What works for “in-block” hot chains may not work when spanning

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Wood, Dan
Whatever you do make sure to also test 250 clients running lock.sql. Even with the communities fix plus YiWen’s fix I can still get duplicate rows. What works for “in-block” hot chains may not work when spanning blocks. Once nearly all 250 clients have done their updates and everybody is

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Michael Paquier
On Wed, Oct 4, 2017 at 10:46 PM, Alvaro Herrera wrote: > Wong, Yi Wen wrote: >> My interpretation of README.HOT is the check is just to ensure the chain is >> continuous; in which case the condition should be: >> >> > if (TransactionIdIsValid(priorXmax)

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Peter Geoghegan
On Wed, Oct 4, 2017 at 6:46 AM, Alvaro Herrera wrote: > Wong, Yi Wen wrote: >> My interpretation of README.HOT is the check is just to ensure the chain is >> continuous; in which case the condition should be: >> >> > if (TransactionIdIsValid(priorXmax) &&

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Peter Geoghegan
On Wed, Oct 4, 2017 at 11:00 AM, Wood, Dan wrote: > The early “break;” here is likely the xmin frozen reason as I found in the > other loop. It looks that way. Since we're already very defensive when it comes to this xmin/xmax matching business, and we're defensive while

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Peter Geoghegan
On Tue, Oct 3, 2017 at 8:43 PM, Wong, Yi Wen wrote: > My interpretation of README.HOT is the check is just to ensure the chain is > continuous; in which case the condition should be: > >> if (TransactionIdIsValid(priorXmax) && >>

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Alvaro Herrera
Wong, Yi Wen wrote: > My interpretation of README.HOT is the check is just to ensure the chain is > continuous; in which case the condition should be: > > > if (TransactionIdIsValid(priorXmax) && > > !TransactionIdEquals(priorXmax, > >

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Alvaro Herrera
Alvaro Herrera wrote: > Peter Geoghegan wrote: > > > I thought that we no longer store FrozenTransactionId (xid 2) as our > > "raw" xmin while freezing, and yet that's what we see here. > > I'm doing this in 9.3. I can't tell if the new tuple freezing stuff > broke things more thoroughly, but

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Alvaro Herrera
Wood, Dan wrote: > One minor side note… Is it weird for xmin/xmax to go backwards in a hot row > chain? > > lp | t_ctid | lp_off | t_infomask | t_infomask2 | t_xmin | t_xmax > ++++-++ > 1 | (0,1) | 8152 | 2818 |

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Alvaro Herrera
Wood, Dan wrote: > There is a tangled web of issues here. With the community fix we get a > corrupted page(invalid redirect ptr from indexed item). The cause of that is: > pruneheap.c: > > /* >* Check the tuple XMIN against prior XMAX, if any >

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Alvaro Herrera
Peter Geoghegan wrote: > I thought that we no longer store FrozenTransactionId (xid 2) as our > "raw" xmin while freezing, and yet that's what we see here. I'm doing this in 9.3. I can't tell if the new tuple freezing stuff broke things more thoroughly, but it is already broken in earlier

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-03 Thread Wood, Dan
One minor side note… Is it weird for xmin/xmax to go backwards in a hot row chain? lp | t_ctid | lp_off | t_infomask | t_infomask2 | t_xmin | t_xmax ++++-++ 1 | (0,1) | 8152 | 2818 | 3 | 36957 | 0 2 |

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-03 Thread Peter Geoghegan
On Tue, Oct 3, 2017 at 6:09 PM, Wood, Dan wrote: > I’ve just started looking at this again after a few weeks break. > if (TransactionIdIsValid(priorXmax) && > !TransactionIdEquals(priorXmax, > HeapTupleHeaderGetXmin(htup))) >

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-03 Thread Peter Geoghegan
On Tue, Oct 3, 2017 at 5:15 PM, Michael Paquier wrote: >> FYI, the repro case page contents looks like this with the patch applied: >> postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid, >> to_hex(t_infomask) as infomask, >> to_hex(t_infomask2) as infomask2 >> from

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-03 Thread Wood, Dan
I’ve just started looking at this again after a few weeks break. There is a tangled web of issues here. With the community fix we get a corrupted page(invalid redirect ptr from indexed item). The cause of that is: pruneheap.c: /* * Check the tuple XMIN

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-03 Thread Michael Paquier
On Wed, Oct 4, 2017 at 8:10 AM, Peter Geoghegan wrote: > I now think that it actually is a VACUUM problem, specifically a > problem with VACUUM pruning. You see the HOT xmin-to-xmax check > pattern that you mentioned within heap_prune_chain(), which looks like > where the incorrect

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-03 Thread Peter Geoghegan
On Tue, Oct 3, 2017 at 9:48 AM, Alvaro Herrera wrote: > But that still doesn't fix the problem; > as far as I can see, vacuum removes the root of the chain, not yet sure > why, and then things are just as corrupted as before. Are you sure it's not opportunistic pruning?

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-03 Thread Peter Geoghegan
On Tue, Oct 3, 2017 at 9:48 AM, Alvaro Herrera wrote: > which shows a HOT-update chain, where the t_xmax are multixacts. Then a > vacuum freeze comes, and because the multixacts are below the freeze > horizon for multixacts, we get this: > > select lp, lp_flags, t_xmin,

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-03 Thread Alvaro Herrera
So here's my attempt at an explanation for what is going on. At one point, we have this: select lp, lp_flags, t_xmin, t_xmax, t_ctid, to_hex(t_infomask) as infomask, to_hex(t_infomask2) as infomask2 from heap_page_items(get_raw_page('t', 0)); lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-01 Thread Alvaro Herrera
Michael Paquier wrote: > On Sun, Oct 1, 2017 at 10:25 AM, Peter Geoghegan wrote: > > On Thu, Sep 28, 2017 at 1:20 PM, Alvaro Herrera > > wrote: > >> Maybe what this means is that we need to do both Dan's initially > >> proposed patch (or something related

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-01 Thread Michael Paquier
On Sun, Oct 1, 2017 at 10:25 AM, Peter Geoghegan wrote: > On Thu, Sep 28, 2017 at 1:20 PM, Alvaro Herrera > wrote: >> Maybe what this means is that we need to do both Dan's initially >> proposed patch (or something related to it) apart from the fixes

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-30 Thread Peter Geoghegan
On Thu, Sep 28, 2017 at 1:20 PM, Alvaro Herrera wrote: > Maybe what this means is that we need to do both Dan's initially > proposed patch (or something related to it) apart from the fixes already > pushed. IOW we need to put back some of the "tupkeep" business ... I

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-29 Thread Michael Paquier
On Fri, Sep 29, 2017 at 6:39 AM, Alvaro Herrera wrote: > Peter Geoghegan wrote: > >> We certainly do still see wrong answers to queries here: >> >> postgres=# select ctid, xmin, xmax, * from t; >> ctid | xmin | xmax | id | name | x >>

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-28 Thread Peter Geoghegan
On Thu, Sep 28, 2017 at 3:20 PM, Robert Haas wrote: > On Thu, Sep 28, 2017 at 5:47 PM, Peter Geoghegan wrote: >> In the end, commit 6bfa88a fixed that old recovery bug by making sure >> the recovery routine heap_xlog_lock() did the right thing. In both >>

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-28 Thread Robert Haas
On Thu, Sep 28, 2017 at 5:47 PM, Peter Geoghegan wrote: > In the end, commit 6bfa88a fixed that old recovery bug by making sure > the recovery routine heap_xlog_lock() did the right thing. In both > cases (Feb 2014 and today), the index wasn't really corrupt -- it just > pointed to

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-28 Thread Peter Geoghegan
On Thu, Sep 28, 2017 at 2:47 PM, Peter Geoghegan wrote: > In the end, commit 6bfa88a fixed that old recovery bug by making sure > the recovery routine heap_xlog_lock() did the right thing. In both > cases (Feb 2014 and today), the index wasn't really corrupt -- it just > pointed to

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-28 Thread Peter Geoghegan
On Thu, Sep 28, 2017 at 2:39 PM, Alvaro Herrera wrote: >> FWIW, I am reminded a little bit of the MultiXact/recovery bug I >> reported way back in February of 2014 [1], which also had a HOT >> interaction that caused index scans to give wrong answers, despite >> more or

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-28 Thread Alvaro Herrera
Peter Geoghegan wrote: > We certainly do still see wrong answers to queries here: > > postgres=# select ctid, xmin, xmax, * from t; > ctid | xmin | xmax | id | name | x > ---+---+--++--+--- > (0,1) | 21171 |0 | 1 | 111 | 0 > (0,7) | 21177 |0 | 3 | 333 | 5 >

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-28 Thread Peter Geoghegan
On Thu, Sep 28, 2017 at 1:20 PM, Alvaro Herrera wrote: > Alvaro Herrera wrote: > >> Odd that it's not fixed. I guess there's still some more work to do >> here ... > > Maybe what this means is that we need to do both Dan's initially > proposed patch (or something related

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-28 Thread Alvaro Herrera
Alvaro Herrera wrote: > Odd that it's not fixed. I guess there's still some more work to do > here ... Maybe what this means is that we need to do both Dan's initially proposed patch (or something related to it) apart from the fixes already pushed. IOW we need to put back some of the "tupkeep"

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-28 Thread Alvaro Herrera
Peter Geoghegan wrote: > On Thu, Sep 28, 2017 at 7:47 AM, Alvaro Herrera > wrote: > > Fix freezing of a dead HOT-updated tuple > > If I run Dan Wood's test case again, the obvious symptom (spurious > duplicates) goes away. However, the enhanced amcheck, and thus CREATE