Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-09 Thread Peter Geoghegan
On Thu, Nov 9, 2017 at 4:53 PM, Andres Freund  wrote:
> Primarily because it's not an anti-corruption tool. I'd be surprised if
> there weren't ways of corrupting the page with this bug that it doesn't
> detect.

It's very hard to assess with total confidence the risk of missing
something that's actually detectable, but I think that the check is
very thorough.

> But even if it were, I don't think there's enough information to detect
> the damage in the general case. You can very well end up with pages where
> subsequent HOT pruning has removed a good bit of the direct evidence of
> this bug.

Sure, but maybe those are cases that can't get any worse anyway. So the
question of avoiding making it worse doesn't arise.

> But I'm not really sure why the error detection capabilities matter
> much for the principal point I raised, which is how much work we need to
> do to not further worsen the corruption.

You're right. Just trying to put the risk in context, and to understand the
extent of the concern that you have.

-- 
Peter Geoghegan


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-09 Thread Andres Freund
On 2017-11-09 16:45:07 -0800, Peter Geoghegan wrote:
> On Thu, Nov 9, 2017 at 4:17 PM, Andres Freund  wrote:
> >> Actually, on second thought, I take that back -- I don't think that
> >> REINDEXing will even finish once a HOT chain is broken by the bug.
> >> IndexBuildHeapScan() actually does quite a good job of making sure
> >> that HOT chains are sane, which is how the enhanced amcheck notices
> >> the bug here in practice.
> >
> > I think that's too optimistic.
> 
> Why? Because the "find the TID of the root" logic in
> IndexBuildHeapScan()/heap_get_root_tuples() won't reliably find the
> actual root (it might be some other HOT chain root following TID
> recycling by VACUUM)?

Primarily because it's not an anti-corruption tool. I'd be surprised if
there weren't ways of corrupting the page with this bug that it doesn't
detect.  But even if it were, I don't think there's enough information
to detect the damage in the general case.  You can very well end up with
pages where subsequent HOT pruning has removed a good bit of the direct
evidence of this bug.

But I'm not really sure why the error detection capabilities matter
much for the principal point I raised, which is how much work we need to
do to not further worsen the corruption.

Greetings,

Andres Freund




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-09 Thread Peter Geoghegan
On Thu, Nov 9, 2017 at 4:17 PM, Andres Freund  wrote:
>> Actually, on second thought, I take that back -- I don't think that
>> REINDEXing will even finish once a HOT chain is broken by the bug.
>> IndexBuildHeapScan() actually does quite a good job of making sure
>> that HOT chains are sane, which is how the enhanced amcheck notices
>> the bug here in practice.
>
> I think that's too optimistic.

Why? Because the "find the TID of the root" logic in
IndexBuildHeapScan()/heap_get_root_tuples() won't reliably find the
actual root (it might be some other HOT chain root following TID
recycling by VACUUM)?

Assuming that's what you meant: I would have thought that the
xmin/xmax matching within heap_get_root_tuples() makes the sanity
checking fairly reliable in practice.
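
The matching I have in mind is applied at every hop when following a
chain -- roughly this (a from-memory sketch of the pruneheap.c logic,
not an exact quote):

    /* refuse to follow a chain whose xmin doesn't match the prior xmax */
    if (TransactionIdIsValid(priorXmax) &&
        !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
        break;              /* chain looks broken; stop here */
    priorXmax = HeapTupleHeaderGetUpdateXid(htup);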

-- 
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-09 Thread Peter Geoghegan
On Thu, Nov 9, 2017 at 4:17 PM, Andres Freund  wrote:
>> I don't follow you here. Why would REINDEXing make the rows that
>> should be dead disappear again, even for a short period of time?
>
> It's not the REINDEX that makes them reappear.

Of course. I was just trying to make sense of what you said.

> It's the second
> vacuum. The reindex part was about $user trying to fix the problem...
> As you need two vacuums with appropriate cutoffs to hit the "rows
> revive" problem, that'll often in practice not happen immediately.

This explanation clears things up, though.

-- 
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-09 Thread Andres Freund
On 2017-11-09 16:02:17 -0800, Peter Geoghegan wrote:
> > What I'm currently wondering about is how much we need to harden
> > postgres against such existing corruption. If e.g. the hot chains are
> > broken somebody might have reindexed thinking the problem is fixed - but
> > if they then later vacuum everything goes to shit again, with dead rows
> > reappearing.
> 
> I don't follow you here. Why would REINDEXing make the rows that
> should be dead disappear again, even for a short period of time?

It's not the REINDEX that makes them reappear. It's the second
vacuum. The reindex part was about $user trying to fix the problem...
As you need two vacuums with appropriate cutoffs to hit the "rows
revive" problem, that'll often in practice not happen immediately.


> Actually, on second thought, I take that back -- I don't think that
> REINDEXing will even finish once a HOT chain is broken by the bug.
> IndexBuildHeapScan() actually does quite a good job of making sure
> that HOT chains are sane, which is how the enhanced amcheck notices
> the bug here in practice.

I think that's too optimistic.

Greetings,

Andres Freund




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-09 Thread Peter Geoghegan
On Thu, Nov 9, 2017 at 2:24 PM, Andres Freund  wrote:
> Attached is a version of the already existing regression test that both
> reproduces the broken hot chain (and thus failing index lookups) and
> then also the tuple reviving.  I don't see any need for letting this run
> with arbitrary permutations.

I thought that the use of every possible permutation was excessive,
myself. It left us with an isolation test that didn't precisely
describe the behavior that is tested. What you came up with seems far,
far better, especially because of the comments you included. The mail
message-id references seem to add a lot, too.

> What I'm currently wondering about is how much we need to harden
> postgres against such existing corruption. If e.g. the hot chains are
> broken somebody might have reindexed thinking the problem is fixed - but
> if they then later vacuum everything goes to shit again, with dead rows
> reappearing.

I don't follow you here. Why would REINDEXing make the rows that
should be dead disappear again, even for a short period of time? It
might do so for index scans, I suppose, but not for sequential scans.
Are you concerned about a risk of somebody not noticing that
sequential scans are still broken?

Actually, on second thought, I take that back -- I don't think that
REINDEXing will even finish once a HOT chain is broken by the bug.
IndexBuildHeapScan() actually does quite a good job of making sure
that HOT chains are sane, which is how the enhanced amcheck notices
the bug here in practice. (Before this bug was discovered, I would
have expected amcheck to catch problems like it slightly later, during
the Bloom filter probe for that HOT chain...but, in fact, it never
gets there with corruption from this bug in practice, AFAIK.)
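
The check that fails is, as far as I can recall, essentially this one in
IndexBuildHeapScan() (a sketch, not an exact quote):

    /*
     * Heap-only tuples must have a root entry in root_offsets[], as
     * filled in by heap_get_root_tuples(); a broken HOT chain leaves a
     * gap here, so a REINDEX errors out rather than completing.
     */
    offnum = ItemPointerGetOffsetNumber(&heapTuple->t_self);
    if (!OffsetNumberIsValid(root_offsets[offnum - 1]))
        elog(ERROR, "failed to find parent tuple for heap-only tuple at (%u,%u)",
             ItemPointerGetBlockNumber(&heapTuple->t_self), offnum);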

-- 
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-09 Thread Andres Freund
On 2017-11-04 06:15:00 -0700, Andres Freund wrote:
> The reason for that is that I hadn't yet quite figured out how the bug I
> described in the commit message (and the previously committed testcase)
> would cause that. I was planning to diagnose / experiment more with this
> and write an email if I couldn't come up with an explanation.   The
> committed test does *not* actually trigger that.
> 
> The reason I couldn't quite figure out how the problem triggers is that
> [ long explanation ]

Attached is a version of the already existing regression test that both
reproduces the broken hot chain (and thus failing index lookups) and
then also the tuple reviving.  I don't see any need for letting this run
with arbitrary permutations.

Thanks to whoever allowed isolationtester permutations to go over
multiple lines and allow comments. I was wondering about adding that as
a feature just to discover it's already there ;)


What I'm currently wondering about is how much we need to harden
postgres against such existing corruption. If e.g. the hot chains are
broken somebody might have reindexed thinking the problem is fixed - but
if they then later vacuum everything goes to shit again, with dead rows
reappearing.  There's no way we can fix hot chains after the fact, but
preventing dead rows from reappearing seems important.  A minimal version
of that is fairly easy - we slap a bunch of
if (!TransactionIdDidCommit()) elog(ERROR) checks at various code paths.
But that'll often trigger clog access errors when the problem occurred -
if we want to do better we need to pass down relfrozenxid/relminmxid to a
few functions.  I'm inclined to do so, but it'll make the patch larger...
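
I.e. at the relevant code paths something like this minimal sketch (the
message text is a placeholder):

    /* paranoia: a committed updater must never be frozen away */
    if (TransactionIdDidCommit(xid))
        elog(ERROR, "cannot freeze committed xmax %u", xid);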

Comments?

Greetings,

Andres Freund
# Test for interactions of tuple freezing with dead, as well as recently-dead
# tuples using multixacts via FOR KEY SHARE.
setup
{
  DROP TABLE IF EXISTS tab_freeze;
  CREATE TABLE tab_freeze (
id int PRIMARY KEY,
name char(3),
x int);
  INSERT INTO tab_freeze VALUES (1, '111', 0);
  INSERT INTO tab_freeze VALUES (3, '333', 0);
}

teardown
{
  DROP TABLE tab_freeze;
}

session "s1"
step "s1_begin" { BEGIN; }
step "s1_update"{ UPDATE tab_freeze SET x = x + 1 WHERE id = 3; }
step "s1_commit"{ COMMIT; }
step "s1_vacuum"{ VACUUM FREEZE tab_freeze; }
step "s1_selectone" {
BEGIN;
SET LOCAL enable_seqscan = false;
SET LOCAL enable_bitmapscan = false;
SELECT * FROM tab_freeze WHERE id = 3;
COMMIT;
}
step "s1_selectall" { SELECT * FROM tab_freeze ORDER BY name, id; }

session "s2"
step "s2_begin" { BEGIN; }
step "s2_key_share" { SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE; 
}
step "s2_commit"{ COMMIT; }
step "s2_vacuum"{ VACUUM FREEZE tab_freeze; }

session "s3"
step "s3_begin" { BEGIN; }
step "s3_key_share" { SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE; 
}
step "s3_commit"{ COMMIT; }
step "s3_vacuum"{ VACUUM FREEZE tab_freeze; }

# This permutation verifies that a previous bug
# https://postgr.es/m/e5711e62-8fdf-4dca-a888-c200bf6b5...@amazon.com
# https://postgr.es/m/20171102112019.33wb7g5wp4zpj...@alap3.anarazel.de
# is not reintroduced. We used to make wrong pruning / freezing
# decisions for multixacts, which could lead to a) broken hot chains b)
# dead rows being revived.
permutation "s1_begin" "s2_begin" "s3_begin" # start transactions
   "s1_update" "s2_key_share" "s3_key_share" # have xmax be a multi with an updater, updater being oldest xid
   "s1_update" # create additional row version that has multis
   "s1_commit" "s2_commit" # commit both updater and share locker
   "s2_vacuum" # due to bug in freezing logic, we used to *not* prune updated row, and then froze it
   "s1_selectone" # if hot chain is broken, the row can't be found via index scan
   "s3_commit" # commit remaining open xact
   "s2_vacuum" # pruning / freezing in broken hot chains would unset xmax, reviving rows
   "s1_selectall" # show borkedness



Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-04 Thread Andres Freund
On 2017-11-04 06:15:00 -0700, Andres Freund wrote:
> The current testcase, and I think the descriptions in the relevant
> threads, all actually fail to point out the precise way the bug is
> triggered.  As e.g. evidenced by the fact that the freeze-the-dead case
> appears not to cause any failures in !assertion builds even if the bug
> is present.

Trying to write up tests reproducing more of the issues in the area, I
think I might have found a third issue - although I'm not sure how
practically relevant it is:

When FreezeMultiXactId() decides it needs to create a new multi because
the old one is below the cutoff, that attempt can be defeated by the
multixact id caching. If the new multi has exactly the same members, the
multixact id cache will just return the existing multi. The cache will
routinely be primed by the earlier lookup of the old multi's members.
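
For reference, the shortcut in question looks roughly like this (a
simplified, from-memory sketch of multixact.c's create path):

    MultiXactId
    MultiXactIdCreateFromMembers(int nmembers, MultiXactMember *members)
    {
        MultiXactId multi;

        /*
         * If the cache already has a multi with exactly these members,
         * return it -- including, problematically, the very multi that
         * FreezeMultiXactId() wanted to replace because it lies below
         * the cutoff.
         */
        multi = mXactCacheGetBySet(nmembers, members);
        if (MultiXactIdIsValid(multi))
            return multi;

        /* ... otherwise assign and record a fresh multixact id ... */
    }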

I'm not yet sure how easily this can be hit in practice, because
commonly the multixact horizon should prevent a multi whose members are
all still live from being below the horizon. I found a situation where
that's not the case with the current bug, but I'm not sure if that can
happen otherwise.

Greetings,

Andres Freund




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-04 Thread Andres Freund
On 2017-11-03 12:36:59 -0700, Peter Geoghegan wrote:
> Andres Freund  wrote:
> > Here's that patch.  I've stared at this some, and Robert did too. Robert
> > mentioned that the commit message might need some polish and I'm not
> > 100% sure about the error message texts yet.
> 
> The commit message should probably say that the bug involves the
> resurrection of previously dead tuples, which is different to there
> being duplicates because a constraint is not enforced because HOT chains
> are broken (that's a separate, arguably less serious problem).

The reason for that is that I hadn't yet quite figured out how the bug I
described in the commit message (and the previously committed testcase)
would cause that. I was planning to diagnose / experiment more with this
and write an email if I couldn't come up with an explanation.   The
committed test does *not* actually trigger that.

The reason I couldn't quite figure out how the problem triggers is that
the xmax removing branch in FreezeMultiXactId() can only be reached if
the multi is from before the cutoff - which it can't have been for a
single vacuum execution to trigger the bug, because that'd require the
multi to be running to falsely return RECENTLY_DEAD rather than DEAD (by
definition a multi can't be below the cutoff if running).

For the problem to occur I think vacuum has to be executed *twice*: The
first time through HTSV mistakenly returns RECENTLY_DEAD preventing the
tuple from being pruned. That triggers FreezeMultiXactId() to create a
*new* multi with dead members. At this point the database already is in
a bad state. Then in a second vacuum HTSV returns DEAD, but
                /*
                 * Ordinarily, DEAD tuples would have been removed by
                 * heap_page_prune(), but it's possible that the tuple
                 * state changed since heap_page_prune() looked.  In
                 * particular an INSERT_IN_PROGRESS tuple could have
                 * changed to DEAD if the inserter aborted.  So this
                 * cannot be considered an error condition.
                 *
                ..
                if (HeapTupleIsHotUpdated(&tuple) ||
                    HeapTupleIsHeapOnly(&tuple))
                {
                    nkeep += 1;

prevents the tuple from being removed. If now the multi xmax is below
the xmin horizon it triggers
            /*
             * If the xid is older than the cutoff, it has to have aborted,
             * otherwise the tuple would have gotten pruned away.
             */
            if (TransactionIdPrecedes(xid, cutoff_xid))
            {
                if (TransactionIdDidCommit(xid))
                    elog(ERROR, "can't freeze committed xmax");
                *flags |= FRM_INVALIDATE_XMAX;

in FreezeMultiXactId(). Without the new elog, this then causes xmax to be
removed, reviving the tuple.


The current testcase, and I think the descriptions in the relevant
threads, all actually fail to point out the precise way the bug is
triggered.  As e.g. evidenced by the fact that the freeze-the-dead case
appears not to cause any failures in !assertion builds even if the bug
is present.


The good news is that the error checks I added in the patch upthread
prevent all of this from happening, even though I'd not yet understood
the mechanics fully - it's imnsho pretty clear that we need to be more
paranoid in production builds around this.   A bunch of users that
triggered largely "invisible" corruption (the first vacuum described
above) will possibly run into one of these elog()s, but that seems far
preferable to making the corruption a lot worse.


I think unfortunately the check + elog() in the

                if (HeapTupleIsHotUpdated(&tuple) ||
                    HeapTupleIsHeapOnly(&tuple))
                {
                    nkeep += 1;

                    /*
                     * If this were to happen for a tuple that actually
                     * needed to be frozen, we'd be in trouble, because
                     * it'd leave a tuple below the relation's xmin
                     * horizon alive.
                     */
                    if (heap_tuple_needs_freeze(tuple.t_data, FreezeLimit,
                                                MultiXactCutoff, buf))
                    {

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-03 Thread Andres Freund


On November 4, 2017 1:22:04 AM GMT+05:30, Alvaro Herrera wrote:
>Peter Geoghegan wrote:
>> Andres Freund  wrote:
>
>> > Staring at the vacuumlazy hunk I think I might have found a related bug:
>> > heap_update_tuple() just copies the old xmax to the new tuple's xmax if
>> > a multixact and still running.  It does so without verifying liveness
>> > of members.  Isn't that buggy? Consider what happens if we have three
>> > blocks: 1 has free space, two is being vacuumed and is locked, three is
>> > full and has a tuple that's key share locked by a live tuple and is
>> > updated by a dead xmax from before the xmin horizon. In that case afaict
>> > the multi will be copied from the third page to the first one.  Which is
>> > quite bad, because vacuum already processed it, and we'll set
>> > relfrozenxid accordingly.  I hope I'm missing something here?
>> 
>> Can you be more specific about what you mean here? I think that I
>> understand where you're going with this, but I'm not sure.
>
>He means that the tuple that heap_update moves to page 1 (which will no
>longer be processed by vacuum) will contain a multixact that's older
>than relminmxid -- because it is copied unchanged by heap_update instead
>of properly checking against age limit.

Right. That, or a member xid below relminxid. I think both scenarios are 
possible.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-03 Thread Peter Geoghegan

Alvaro Herrera  wrote:

He means that the tuple that heap_update moves to page 1 (which will no
longer be processed by vacuum) will contain a multixact that's older
than relminmxid -- because it is copied unchanged by heap_update instead
of properly checking against age limit.


I see. The problem is more or less with this heap_update() code:

    /*
     * And also prepare an Xmax value for the new copy of the tuple.  If there
     * was no xmax previously, or there was one but all lockers are now gone,
     * then use InvalidXid; otherwise, get the xmax from the old tuple.  (In
     * rare cases that might also be InvalidXid and yet not have the
     * HEAP_XMAX_INVALID bit set; that's fine.)
     */
    if ((oldtup.t_data->t_infomask & HEAP_XMAX_INVALID) ||
        HEAP_LOCKED_UPGRADED(oldtup.t_data->t_infomask) ||
        (checked_lockers && !locker_remains))
        xmax_new_tuple = InvalidTransactionId;
    else
        xmax_new_tuple = HeapTupleHeaderGetRawXmax(oldtup.t_data);

My naive guess is that we have to create a new MultiXactId here in at
least some cases, just like FreezeMultiXactId() sometimes does.
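
Perhaps something shaped like this (purely hypothetical sketch; nlive
and live_members stand in for a member array filtered for liveness, and
heap_update() doesn't currently have a multixact cutoff to test
against):

    else if ((oldtup.t_data->t_infomask & HEAP_XMAX_IS_MULTI) &&
             MultiXactIdPrecedes(HeapTupleHeaderGetRawXmax(oldtup.t_data),
                                 cutoff_multi))
        /* hypothetical: rebuild xmax from the members that still matter */
        xmax_new_tuple = MultiXactIdCreateFromMembers(nlive, live_members);
    else
        xmax_new_tuple = HeapTupleHeaderGetRawXmax(oldtup.t_data);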

--
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-03 Thread Alvaro Herrera
Peter Geoghegan wrote:
> Andres Freund  wrote:

> > Staring at the vacuumlazy hunk I think I might have found a related bug:
> > heap_update_tuple() just copies the old xmax to the new tuple's xmax if
> > a multixact and still running.  It does so without verifying liveness
> > of members.  Isn't that buggy? Consider what happens if we have three
> > blocks: 1 has free space, two is being vacuumed and is locked, three is
> > full and has a tuple that's key share locked by a live tuple and is
> > updated by a dead xmax from before the xmin horizon. In that case afaict
> > the multi will be copied from the third page to the first one.  Which is
> > quite bad, because vacuum already processed it, and we'll set
> > relfrozenxid accordingly.  I hope I'm missing something here?
> 
> Can you be more specific about what you mean here? I think that I
> understand where you're going with this, but I'm not sure.

He means that the tuple that heap_update moves to page 1 (which will no
longer be processed by vacuum) will contain a multixact that's older
than relminmxid -- because it is copied unchanged by heap_update instead
of properly checking against age limit.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-03 Thread Peter Geoghegan

Andres Freund  wrote:

Here's that patch.  I've stared at this some, and Robert did too. Robert
mentioned that the commit message might need some polish and I'm not
100% sure about the error message texts yet.


The commit message should probably say that the bug involves the
resurrection of previously dead tuples, which is different to there
being duplicates because a constraint is not enforced because HOT chains
are broken (that's a separate, arguably less serious problem).


Staring at the vacuumlazy hunk I think I might have found a related bug:
heap_update_tuple() just copies the old xmax to the new tuple's xmax if
a multixact and still running.  It does so without verifying liveness
of members.  Isn't that buggy? Consider what happens if we have three
blocks: 1 has free space, two is being vacuumed and is locked, three is
full and has a tuple that's key share locked by a live tuple and is
updated by a dead xmax from before the xmin horizon. In that case afaict
the multi will be copied from the third page to the first one.  Which is
quite bad, because vacuum already processed it, and we'll set
relfrozenxid accordingly.  I hope I'm missing something here?


Can you be more specific about what you mean here? I think that I
understand where you're going with this, but I'm not sure.

--
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-03 Thread Andres Freund
On 2017-11-02 06:05:51 -0700, Andres Freund wrote:
> Hi,
> 
> On 2017-11-02 13:49:47 +0100, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > I think the problem is on the pruning, rather than the freezing side. We
> > > can't freeze a tuple if it has an alive predecessor - rather than
> > > weakening this, we should be fixing the pruning to not have the alive
> > > predecessor.
> > 
> > I gave a look at HTSV back then, but I didn't find what the right tweak
> > was, but then I only tried changing the return value to DEAD and
> > DELETE_IN_PROGRESS; the thought of selecting DEAD or RECENTLY_DEAD based
> > on OldestXmin didn't occur to me ... I was thinking that the fact that
> > there were live lockers meant that the tuple could not be removed,
> > obviously failing to notice that the subsequent versions of the tuple
> > would be good enough.
> 
> I'll try to write up a commit based on that idea. I think there's some
> comment work needed too, Robert and I were both confused by a few
> things.
> I'm unfortunately travelling atm - it's evening here, and I'll be flying
> back to the US all Saturday. I'm fairly sure I'll be able to come up
> with a decent patch tomorrow, but I'll need review and testing by
> others.

Here's that patch.  I've stared at this some, and Robert did too. Robert
mentioned that the commit message might need some polish and I'm not
100% sure about the error message texts yet.

I'm not yet convinced that the new elog in vacuumlazy can never trigger
- but I also don't think we want to actually freeze the tuple in that
case.

Staring at the vacuumlazy hunk I think I might have found a related bug:
heap_update_tuple() just copies the old xmax to the new tuple's xmax if
a multixact and still running.  It does so without verifying liveness
of members.  Isn't that buggy? Consider what happens if we have three
blocks: 1 has free space, two is being vacuumed and is locked, three is
full and has a tuple that's key share locked by a live tuple and is
updated by a dead xmax from before the xmin horizon. In that case afaict
the multi will be copied from the third page to the first one.  Which is
quite bad, because vacuum already processed it, and we'll set
relfrozenxid accordingly.  I hope I'm missing something here?

Greetings,

Andres Freund
From 774376dc8f7ed76657660e6beb0f5191d1bdaae4 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 3 Nov 2017 07:52:29 -0700
Subject: [PATCH] Fix pruning of locked and updated tuples.

Previously it was possible that a tuple was not pruned during vacuum,
even though its update xmax (i.e. the updating xid in a multixact
with both key share lockers and an updater) was below the cutoff
horizon.

As the freezing code assumed, rightly so, that that's not supposed to
happen, xmax would be preserved (as a member of a new multixact or
xmax directly). That causes two problems: For one the tuple is below
the xmin horizon, which can cause problems if the clog is truncated or
once there's an xid wraparound. The bigger problem is that that will
break HOT chains, leading to index lookups failing, which in turn can
lead to constraints being violated.

Fix the problem by recognizing that tuples can be DEAD instead of
RECENTLY_DEAD, even if the multixactid has alive members, if the
update_xid is below the xmin horizon. That's safe because newer
versions of the tuple will contain the locking xids.

As the wrong freezing of xmax can cause data corruption, extend sanity
checks and promote them to elogs rather than assertions.

Per-Discussion: Robert Haas and Freund
Reported-By: Daniel Wood
Discussion:
https://postgr.es/m/e5711e62-8fdf-4dca-a888-c200bf6b5...@amazon.com
https://postgr.es/m/20171102112019.33wb7g5wp4zpj...@alap3.anarazel.de
---
 src/backend/access/heap/heapam.c      | 34 --
 src/backend/commands/vacuumlazy.c     | 13 +
 src/backend/utils/time/tqual.c        | 53 +++
 src/test/isolation/isolation_schedule |  1 +
 4 files changed, 62 insertions(+), 39 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 765750b8743..9b963e0cd0d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6412,7 +6412,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 			 */
 			if (TransactionIdPrecedes(xid, cutoff_xid))
 			{
-				Assert(!TransactionIdDidCommit(xid));
+				if (TransactionIdDidCommit(xid))
+					elog(ERROR, "can't freeze committed xmax");
 				*flags |= FRM_INVALIDATE_XMAX;
 				xid = InvalidTransactionId; /* not strictly necessary */
 			}
@@ -6484,6 +6485,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 		{
 			TransactionId xid = members[i].xid;
 
+			Assert(TransactionIdIsValid(xid));
+
 			/*
 			 * It's an update; should we keep it?  If the transaction is known
 			 * aborted or crashed then it's okay to ignore it, otherwise not.
@@ -6512,18 +6515,24 @@ 

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Robert Haas
On Thu, Nov 2, 2017 at 10:26 PM, Peter Geoghegan  wrote:
> On Thu, Nov 2, 2017 at 9:44 AM, Robert Haas  wrote:
>> The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where
>> I think things get a lot more dangerous.  The problem (as Andres
>> pointed out to me this afternoon) is that it seems possible to end up
>> with a situation where there should be two HOT chains on a page, and
>> because of the weakened xmin/xmax checking rules, we end up thinking
>> that the second one is a continuation of the first one, which will be
>> all kinds of bad news.  That would be particularly likely to happen in
>> releases from before we invented HEAP_XMIN_FROZEN, when there's no
>> xmin/xmax matching at all, but could happen on later releases if we
>> are extraordinarily unlucky (i.e. xmin of the first tuple in the
>> second chain happens to be the same as the pre-freeze xmax in the old
>> chain, probably because the same XID was used to update the page in
>> two consecutive epochs).  Fortunately, that commit is (I think) not
>> released anywhere.
>
> FWIW, if you look at the second commit
> (22576734b805fb0952f9be841ca8f643694ee868) carefully, you'll realize
> that it doesn't even treat those two cases differently. It was buggy
> even on its own terms. The FrozenTransactionId test used an xmin from
> HeapTupleHeaderGetXmin(), not HeapTupleHeaderGetRawXmin().

Oh, wow.  You seem to be correct.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Peter Geoghegan
On Thu, Nov 2, 2017 at 9:44 AM, Robert Haas  wrote:
> The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where
> I think things get a lot more dangerous.  The problem (as Andres
> pointed out to me this afternoon) is that it seems possible to end up
> with a situation where there should be two HOT chains on a page, and
> because of the weakened xmin/xmax checking rules, we end up thinking
> that the second one is a continuation of the first one, which will be
> all kinds of bad news.  That would be particularly likely to happen in
> releases from before we invented HEAP_XMIN_FROZEN, when there's no
> xmin/xmax matching at all, but could happen on later releases if we
> are extraordinarily unlucky (i.e. xmin of the first tuple in the
> second chain happens to be the same as the pre-freeze xmax in the old
> chain, probably because the same XID was used to update the page in
> two consecutive epochs).  Fortunately, that commit is (I think) not
> released anywhere.

FWIW, if you look at the second commit
(22576734b805fb0952f9be841ca8f643694ee868) carefully, you'll realize
that it doesn't even treat those two cases differently. It was buggy
even on its own terms. The FrozenTransactionId test used an xmin from
HeapTupleHeaderGetXmin(), not HeapTupleHeaderGetRawXmin().
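
To spell out the distinction (sketch; htup stands for the tuple's
HeapTupleHeader):

    /*
     * Once HEAP_XMIN_FROZEN is set, HeapTupleHeaderGetXmin() reports
     * FrozenTransactionId, while HeapTupleHeaderGetRawXmin() still
     * returns the preserved original xid -- the value a
     * FrozenTransactionId test would actually need to look at.
     */
    TransactionId xmin = HeapTupleHeaderGetXmin(htup);
    TransactionId rawxmin = HeapTupleHeaderGetRawXmin(htup);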

-- 
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Peter Geoghegan
On Thu, Nov 2, 2017 at 4:20 AM, Andres Freund  wrote:
> I think the problem is on the pruning, rather than the freezing side. We
> can't freeze a tuple if it has an alive predecessor - rather than
> weakining this, we should be fixing the pruning to not have the alive
> predecessor.

Excellent catch.

> If the update xmin is actually below the cutoff we can remove the tuple
> even if there's live lockers - the lockers will also be present in the
> newer version of the tuple.  I verified that for me that fixes the
> problem. Obviously that'd require some comment work and more careful
> diagnosis.

I didn't even know that that was safe.

> I think a5736bf754c82d8b86674e199e232096c679201d might be dangerous in
> the face of previously corrupted tuple chains and pg_upgraded clusters -
> it can lead to tuples being considered related, even though they're
> from entirely independent hot chains. Especially when upgrading from
> 9.3 (post your fix) to current releases.

Frankly, I'm relieved that you got to this. I was highly suspicious of
a5736bf754c82d8b86674e199e232096c679201d, even beyond my specific,
actionable concern about how it failed to handle the
9.3/FrozenTransactionId xmin case as special. As I went into in the
"heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead
bug" thread, these commits left us with a situation where there didn't
seem to be a reliable way of knowing whether or not it is safe to
interrogate clog for a given heap tuple using a tool like amcheck.
And, it wasn't obvious that you couldn't have a codepath that failed
to account for pre-cutoff non-frozen tuples -- codepaths that call
TransactionIdDidCommit() despite it actually being unsafe.

If I'm not mistaken, your proposed fix restores sanity there.

-- 
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Robert Haas
On Thu, Nov 2, 2017 at 8:25 PM, Alvaro Herrera  wrote:
> Pushed the reverts.
>
> I noticed while doing so that REL_10_STABLE contains the bogus commits.
> Does that change our opinion regarding what to do for people upgrading
> to a version containing the broken commits?  I don't think so, because
>
>   1) we hope that not many people will trust their data to 10.0
>  immediately after release
>   2) the bug is very low probability
>   3) it doesn't look like we can do a lot about it anyway.

Just to be clear, it looks like "Fix freezing of a dead HOT-updated
tuple" (46c35116ae1acc8826705ef2a7b5d9110f9d6e84) went in before 10.0
was stamped, but "Fix traversal of half-frozen update chains"
(22576734b805fb0952f9be841ca8f643694ee868) went in afterwards and is
therefore unreleased at present.

Users of 10.0 who hit the code introduced by
46c35116ae1acc8826705ef2a7b5d9110f9d6e84 will have XIDs stored in the
xmax fields of tuples that predate relfrozenxid.  Those tuples will be
hinted-committed.  That's not good, but it might not really have much
in the way of consequences.  *IF* the next VACUUM doesn't get confused
by the old XID, then it will prune the tuple then and I think we'll be
OK.  And I think it won't, because it should just call
HeapTupleSatisfiesVacuum() and that should see that
HEAP_XMAX_COMMITTED is set and not actually try to consult the old
CLOG.  If that hint bit can ever get lost - or fail to propagate to a
standby - then we have more trouble, but the fact that it's set by a
logged operation makes me hope that can't happen. Furthermore, that
follow-on VACUUM should indeed arrive in due time, because we will not
have marked the page all-visible -- HeapTupleSatisfiesVacuum() will
NOT have returned HEAPTUPLE_LIVE when called from lazy_scan_heap(),
and therefore we will have set all_visible = false.
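
That is, the case I'm relying on looks roughly like this (a sketch of
the tqual.c shape, not an exact quote):

    if (tuple->t_infomask & HEAP_XMAX_COMMITTED)
    {
        /* trust the hint bit: no clog lookup of the ancient xmax */
    }
    else if (TransactionIdDidCommit(HeapTupleHeaderGetRawXmax(tuple)))
    {
        /* only this path would consult clog for a pre-relfrozenxid xid */
    }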

The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where
I think things get a lot more dangerous.  The problem (as Andres
pointed out to me this afternoon) is that it seems possible to end up
with a situation where there should be two HOT chains on a page, and
because of the weakened xmin/xmax checking rules, we end up thinking
that the second one is a continuation of the first one, which will be
all kinds of bad news.  That would be particularly likely to happen in
releases from before we invented HEAP_XMIN_FROZEN, when there's no
xmin/xmax matching at all, but could happen on later releases if we
are extraordinarily unlucky (i.e. xmin of the first tuple in the
second chain happens to be the same as the pre-freeze xmax in the old
chain, probably because the same XID was used to update the page in
two consecutive epochs).  Fortunately, that commit is (I think) not
released anywhere.

Personally, I think it would be best to push the release out a week.
I think we understand this well enough now that we can fix it
relatively easily, but haste makes bugs, and (I know you're all tired
of hearing me say this) patches that implicate the on-disk format are
scary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Alvaro Herrera
Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Andres Freund  writes:
> > > Do we care about people upgrading to unreleased versions? We could do
> > > nothing, document it in the release notes, or ???
> > 
> > Do nothing.
> 
> Agreed.  Not much we can do there.

Pushed the reverts.

I noticed while doing so that REL_10_STABLE contains the bogus commits.  
Does that change our opinion regarding what to do for people upgrading
to a version containing the broken commits?  I don't think so, because

  1) we hope that not many people will trust their data to 10.0
 immediately after release
  2) the bug is very low probability
  3) it doesn't look like we can do a lot about it anyway.

I'll experiment with Andres' proposed fix now.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Andres Freund  writes:
> > Do we care about people upgrading to unreleased versions? We could do
> > nothing, document it in the release notes, or ???
> 
> Do nothing.

Agreed.  Not much we can do there.

Thanks!

Stephen




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Tom Lane
Andres Freund  writes:
> Do we care about people upgrading to unreleased versions? We could do
> nothing, document it in the release notes, or ???

Do nothing.

regards, tom lane




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Andres Freund
Hi,

On 2017-11-02 13:49:47 +0100, Alvaro Herrera wrote:
> Andres Freund wrote:
> > I think the problem is on the pruning, rather than the freezing side. We
> > can't freeze a tuple if it has an alive predecessor - rather than
> > weakening this, we should be fixing the pruning to not have the alive
> > predecessor.
> 
> I gave HTSV a look back then, but I didn't find the right tweak; I only
> tried changing the return value to DEAD and DELETE_IN_PROGRESS. The
> thought of selecting DEAD or RECENTLY_DEAD based on OldestXmin didn't
> occur to me ... I was thinking that the fact that there were live
> lockers meant that the tuple could not be removed, obviously failing to
> notice that the subsequent versions of the tuple would be good enough.

I'll try to write up a commit based on that idea. I think there's some
comment work needed too, Robert and I were both confused by a few
things.
I'm unfortunately travelling atm - it's evening here, and I'll be flying
back to the US all Saturday. I'm fairly sure I'll be able to come up
with a decent patch tomorrow, but I'll need review and testing by
others.


> > If the update xmin is actually below the cutoff we can remove the tuple
> > even if there's live lockers - the lockers will also be present in the
> > newer version of the tuple.  I verified that for me that fixes the
> > problem. Obviously that'd require some comment work and more careful
> > diagnosis.
> 
> Sounds good.
> 
> I'll revert those commits then, keeping the test, and then you can
> commit your change.  OK?

Generally that sounds good - but you can't keep the testcase in without
the new fix - the buildfarm would immediately turn red. I guess the best
thing would be to temporarily remove it from the schedule?
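
E.g. something like this in src/test/isolation/isolation_schedule
(assuming the test file keeps its current name):

    # temporarily disabled until the replacement fix lands:
    # test: freeze-the-dead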


Do we care about people upgrading to unreleased versions? We could do
nothing, document it in the release notes, or ???


Greetings,

Andres Freund




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Alvaro Herrera
Andres Freund wrote:

> I spent some time discussing this with Robert today (with both of us
> alternating between feeling the other and ourselves as stupid), and the
> conclusion I think is that the problem is on the pruning, rather than
> the freezing side.

Thanks both for spending some more time on this.

> I think the problem is on the pruning, rather than the freezing side. We
> can't freeze a tuple if it has an alive predecessor - rather than
> weakening this, we should be fixing the pruning to not have the alive
> predecessor.

I gave HTSV a look back then, but I didn't find the right tweak; I only
tried changing the return value to DEAD and DELETE_IN_PROGRESS. The
thought of selecting DEAD or RECENTLY_DEAD based on OldestXmin didn't
occur to me ... I was thinking that the fact that there were live
lockers meant that the tuple could not be removed, obviously failing to
notice that the subsequent versions of the tuple would be good enough.

> If the update xmin is actually below the cutoff we can remove the tuple
> even if there's live lockers - the lockers will also be present in the
> newer version of the tuple.  I verified that for me that fixes the
> problem. Obviously that'd require some comment work and more careful
> diagnosis.

Sounds good.

I'll revert those commits then, keeping the test, and then you can
commit your change.  OK?

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Robert Haas
On Thu, Nov 2, 2017 at 4:50 PM, Andres Freund  wrote:
> I think a5736bf754c82d8b86674e199e232096c679201d might be dangerous in
> the face of previously corrupted tuple chains and pg_upgraded clusters -
> it can lead to tuples being considered related, even though they're
> from entirely independent hot chains. Especially when upgrading from
> 9.3 (post your fix) to current releases.

I think this is a key point.  If the new behavior were merely not
entirely correct, we could perhaps refine it later.  But it's not only
not correct - it actually has the potential to create new problems
that didn't exist before those commits.  And if we release without
reverting those commits then we can't change our mind later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-11-02 Thread Andres Freund
Hi,

On 2017-09-28 14:47:53 +, Alvaro Herrera wrote:
> Fix freezing of a dead HOT-updated tuple
>
> Vacuum calls page-level HOT prune to remove dead HOT tuples before doing
> liveness checks (HeapTupleSatisfiesVacuum) on the remaining tuples.  But
> concurrent transaction commit/abort may turn DEAD some of the HOT tuples
> that survived the prune, before HeapTupleSatisfiesVacuum tests them.
> This happens to activate the code that decides to freeze the tuple ...
> which resuscitates it, duplicating data.
>
> (This is especially bad if there's any unique constraints, because those
> are now internally violated due to the duplicate entries, though you
> won't know until you try to REINDEX or dump/restore the table.)
>
> One possible fix would be to simply skip doing anything to the tuple,
> and hope that the next HOT prune would remove it.  But there is a
> problem: if the tuple is older than freeze horizon, this would leave an
> unfrozen XID behind, and if no HOT prune happens to clean it up before
> the containing pg_clog segment is truncated away, it'd later cause an
> error when the XID is looked up.
>
> Fix the problem by having the tuple freezing routines cope with the
> situation: don't freeze the tuple (and keep it dead).  In the cases that
> the XID is older than the freeze age, set the HEAP_XMAX_COMMITTED flag
> so that there is no need to look up the XID in pg_clog later on.

I think this is the wrong fix - the assumption that ctid chains can be
validated based on the prev-xmax = cur-xmin is fairly ingrained into the
system, and we shouldn't just be breaking it. The need to later
lobotomize the checks, in a5736bf754, is some evidence of that.

I spent some time discussing this with Robert today (with both of us
alternating between feeling the other and ourselves as stupid), and the
conclusion I think is that the problem is on the pruning, rather than
the freezing side.

FWIW, I don't think the explanation in the commit message of how the
problem triggers is actually correct - the testcase you added doesn't
have any transactions concurrently committing / aborting when
crashing. Rather the problem is that the liveness checks for freezing
are different from the ones for pruning. HTSV considers xmin
RECENTLY_DEAD when there's a multi xmax with at least one alive locker,
whereas pruning thinks it has to do something because there's the member
xid below the cutoff. No concurrent activity is needed to trigger that.

I think the problem is on the pruning, rather than the freezing side. We
can't freeze a tuple if it has an alive predecessor - rather than
weakening this, we should be fixing the pruning to not have the alive
predecessor.

The relevant logic in HTSV is:

    if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
    {
        TransactionId xmax;

        if (MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
        {
            /* already checked above */
            Assert(!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));

            xmax = HeapTupleGetUpdateXid(tuple);

            /* not LOCKED_ONLY, so it has to have an xmax */
            Assert(TransactionIdIsValid(xmax));

            if (TransactionIdIsInProgress(xmax))
                return HEAPTUPLE_DELETE_IN_PROGRESS;
            else if (TransactionIdDidCommit(xmax))
                /* there are still lockers around -- can't return DEAD here */
                return HEAPTUPLE_RECENTLY_DEAD;
            /* updating transaction aborted */
            return HEAPTUPLE_LIVE;

with the problematic branch being the TransactionIdDidCommit()
case. Rather than unconditionally returning HEAPTUPLE_RECENTLY_DEAD, we
should test the update xid against OldestXmin and return DEAD /
RECENTLY_DEAD according to that.

If the update xmin is actually below the cutoff we can remove the tuple
even if there's live lockers - the lockers will also be present in the
newer version of the tuple.  I verified that for me that fixes the
problem. Obviously that'd require some comment work and more careful
diagnosis.
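
Concretely, the TransactionIdDidCommit() branch would end up something
like this (sketch; the comment needs work):

    else if (TransactionIdDidCommit(xmax))
    {
        /*
         * The multixact might still be running due to lockers, but if
         * the updater is below the xid horizon the tuple is DEAD: any
         * remaining lockers are also present in the newer version of
         * the tuple.
         */
        if (!TransactionIdPrecedes(xmax, OldestXmin))
            return HEAPTUPLE_RECENTLY_DEAD;
        return HEAPTUPLE_DEAD;
    }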


I think a5736bf754c82d8b86674e199e232096c679201d might be dangerous in
the face of previously corrupted tuple chains and pg_upgraded clusters -
it can lead to tuples being considered related, even though they're
from entirely independent hot chains. Especially when upgrading from
9.3 (post your fix) to current releases.


In short, I think the two commits should be reverted, and replaced with
a fix along what I'm outlining above.

There'll be some trouble for people that upgraded to an unreleased
version, but I don't really see what we could do about that.

I could be entirely wrong - I've been travelling for the last two weeks
and my brain is somewhat more fried than usual.

Regards,

Andres



Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-10 Thread Michael Paquier
On Wed, Oct 11, 2017 at 11:31 AM, Wood, Dan  wrote:
> I found one glitch with our merge of the original dup row fix.  With that 
> corrected AND Alvaro’s Friday fix things are solid.
> No dups.  No index corruption.
>
> Thanks so much.

Nice to hear that! You guys seem to be doing extensive testing and
actually reporting back about it, which is really nice to see.
-- 
Michael




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-10 Thread Wood, Dan
I found one glitch with our merge of the original dup row fix.  With that 
corrected AND Alvaro’s Friday fix things are solid.
No dups.  No index corruption.

Thanks so much. 

On 10/10/17, 7:25 PM, "Michael Paquier"  wrote:

On Tue, Oct 10, 2017 at 11:14 PM, Alvaro Herrera wrote:
> I was seeing just the reindex problem.  I don't see any more dups.
>
> But I've tried to reproduce it afresh now, and let it run for a long
> time and nothing happened.  Maybe I made a mistake last week and
> ran an unfixed version.  I don't see any more problems now.

Okay, so that's one more person joining this trend, making three with
Peter and me.

>> If you are getting the dup rows consider the code in the block in
>> heapam.c that starts with the comment “replace multi by update xid”.
>>
>> When I repro this I find that MultiXactIdGetUpdateXid() returns 0.
>> There is an updater in the multixact array however the status is
>> MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate.  I
>> assume this is a preliminary status before the following row in the
>> hot chain has its multixact set to NoKeyUpdate.
>
> Yes, the "For" version is the locker version rather than the actual
> update.  That lock is acquired by EvalPlanQual locking the row just
> before doing the update.  I think GetUpdateXid has no reason to return
> such an Xid, since it's not an update.
>
>> Since a 0 is returned this does precede cutoff_xid and
>> TransactionIdDidCommit(0) will return false.  This ends up aborting
>> the multixact on the row even though the real xid is committed.  This
>> sets XMAX to 0 and that row becomes visible as one of the dups.
>> Interestingly the real xid of the updater is 122944 and the cutoff_xid
>> is 122945.
>
> I haven't seen this effect. Please keep us updated if you're able to
> verify corruption this way.

Me neither. It would be nice to not live long with such a sword of Damocles.
-- 
Michael






Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-10 Thread Michael Paquier
On Tue, Oct 10, 2017 at 11:14 PM, Alvaro Herrera wrote:
> I was seeing just the reindex problem.  I don't see any more dups.
>
> But I've tried to reproduce it afresh now, and let it run for a long
> time and nothing happened.  Maybe I made a mistake last week and
> ran an unfixed version.  I don't see any more problems now.

Okay, so that's one more person joining this trend, making three with
Peter and me.

>> If you are getting the dup rows consider the code in the block in
>> heapam.c that starts with the comment “replace multi by update xid”.
>>
>> When I repro this I find that MultiXactIdGetUpdateXid() returns 0.
>> There is an updater in the multixact array however the status is
>> MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate.  I
>> assume this is a preliminary status before the following row in the
>> hot chain has its multixact set to NoKeyUpdate.
>
> Yes, the "For" version is the locker version rather than the actual
> update.  That lock is acquired by EvalPlanQual locking the row just
> before doing the update.  I think GetUpdateXid has no reason to return
> such an Xid, since it's not an update.
>
>> Since a 0 is returned this does precede cutoff_xid and
>> TransactionIdDidCommit(0) will return false.  This ends up aborting
>> the multixact on the row even though the real xid is committed.  This
>> sets XMAX to 0 and that row becomes visible as one of the dups.
>> Interestingly the real xid of the updater is 122944 and the cutoff_xid
>> is 122945.
>
> I haven't seen this effect. Please keep us updated if you're able to
> verify corruption this way.

Me neither. It would be nice to not live long with such a sword of Damocles.
-- 
Michael




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-10 Thread Alvaro Herrera
Wood, Dan wrote:
> I’m unclear on what is being repro’d in 9.6.  Are you getting the
> duplicate rows problem or just the reindex problem?  Are you testing
> with asserts enabled (I’m not)?

I was seeing just the reindex problem.  I don't see any more dups.

But I've tried to reproduce it afresh now, and let it run for a long
time and nothing happened.  Maybe I made a mistake last week and
ran an unfixed version.  I don't see any more problems now.

> If you are getting the dup rows consider the code in the block in
> heapam.c that starts with the comment “replace multi by update xid”.
>
> When I repro this I find that MultiXactIdGetUpdateXid() returns 0.
> There is an updater in the multixact array however the status is
> MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate.  I
> assume this is a preliminary status before the following row in the
> hot chain has its multixact set to NoKeyUpdate.

Yes, the "For" version is the locker version rather than the actual
update.  That lock is acquired by EvalPlanQual locking the row just
before doing the update.  I think GetUpdateXid has no reason to return
such an Xid, since it's not an update.

> Since a 0 is returned this does precede cutoff_xid and
> TransactionIdDidCommit(0) will return false.  This ends up aborting
> the multixact on the row even though the real xid is committed.  This
> sets XMAX to 0 and that row becomes visible as one of the dups.
> Interestingly the real xid of the updater is 122944 and the cutoff_xid
> is 122945.

I haven't seen this effect.  Please keep us updated if you're able to
verify corruption this way.

-- 
Álvaro Herrera        https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-09 Thread Michael Paquier
On Mon, Oct 9, 2017 at 2:29 AM, Peter Geoghegan  wrote:
> On Sat, Oct 7, 2017 at 4:25 PM, Alvaro Herrera wrote:
>> Hmm, I think I added a random sleep (max. 100ms) right after the
>> HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that
>> makes the race easier to hit.
>
> I still cannot reproduce. Perhaps you can be more specific?

I have been trying to reproduce things for a total of 4 hours, testing
various variations of the proposed test cases (killed Postgres,
changed fillfactor, manual sleep calls), but I am proving unable to
see a regression as well. I would not think that the OS matters here,
all my attempts were on macos with assertions and debugging enabled.
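
For the record, the sleep tweak amounts to something like this right
after the HeapTupleSatisfiesVacuum() call in lazy_scan_heap(), going by
Alvaro's description:

    /* widen the race window between the liveness check and what follows */
    pg_usleep(random() % 100000);   /* up to 100ms */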

At least the code is now more stable, which is definitely a good thing.
-- 
Michael




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-09 Thread Wood, Dan
I’m unclear on what is being repro’d in 9.6.  Are you getting the duplicate 
rows problem or just the reindex problem?  Are you testing with asserts 
enabled (I’m not)?

If you are getting the dup rows consider the code in the block in heapam.c that 
starts with the comment “replace multi by update xid”.
When I repro this I find that MultiXactIdGetUpdateXid() returns 0.  There is an 
updater in the multixact array however the status is 
MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate.  I assume 
this is a preliminary status before the following row in the hot chain has its 
multixact set to NoKeyUpdate.

Since a 0 is returned this does precede cutoff_xid and 
TransactionIdDidCommit(0) will return false.  This ends up aborting the 
multixact on the row even though the real xid is committed.  This sets XMAX to 
0 and that row becomes visible as one of the dups.  Interestingly the real xid 
of the updater is 122944 and the cutoff_xid is 122945.
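
In code form, the failure mode looks roughly like this (a hedged sketch of
the relevant FreezeMultiXactId() logic in heapam.c, not the verbatim source;
variable names are abbreviated):

/*
 * MultiXactIdGetUpdateXid() only considers members whose status is a
 * genuine update; a member still at MultiXactStatusForNoKeyUpdate counts
 * as a locker, so InvalidTransactionId (0) comes back here.
 */
xid = MultiXactIdGetUpdateXid(xmax, t_infomask);

if (TransactionIdPrecedes(xid, cutoff_xid))
{
	/*
	 * xid = 0 precedes any cutoff, and TransactionIdDidCommit(0) returns
	 * false, so the committed updater is mistaken for an aborted one:
	 * xmax gets invalidated and the old row version becomes visible
	 * again as a duplicate.
	 */
	if (TransactionIdDidCommit(xid))
		*flags = FRM_MARK_COMMITTED | FRM_RETURN_IS_XID;
	else
		*flags |= FRM_INVALIDATE_XMAX;
}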

I’m still debugging, but I started late, so I’m passing this incomplete info 
along now.

On 10/7/17, 4:25 PM, "Alvaro Herrera"  wrote:

Peter Geoghegan wrote:
> On Sat, Oct 7, 2017 at 1:31 AM, Alvaro Herrera wrote:
> >> As you must have seen, Alvaro said he has a variant of Dan's original
> >> script that demonstrates that a problem remains, at least on 9.6+,
> >> even with today's fix. I think it's the stress-test that plays with
> >> fillfactor, many clients, etc [1].
> >
> > I just execute setup.sql once and then run this shell command,
> >
> > while :; do
> > psql -e -P pager=off -f ./repro.sql
> > for i in `seq 1 5`; do
> > psql -P pager=off -e --no-psqlrc -f ./lock.sql &
> > done
> > wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
> > psql -P pager=off -e --no-psqlrc -f ./report.sql
> > echo "done"
> > done
> 
> I cannot reproduce the problem on my personal machine using this
> script/stress-test. I tried to do so on the master branch git tip.
> This reinforces the theory that there is some timing sensitivity,
> because the remaining race condition is very narrow.

Hmm, I think I added a random sleep (max. 100ms) right after the
HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that
makes the race easier to hit.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-08 Thread Peter Geoghegan
On Sat, Oct 7, 2017 at 4:25 PM, Alvaro Herrera  wrote:
> Hmm, I think I added a random sleep (max. 100ms) right after the
> HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that
> makes the race easier to hit.

I still cannot reproduce. Perhaps you can be more specific?

-- 
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-07 Thread Alvaro Herrera
Peter Geoghegan wrote:
> On Sat, Oct 7, 2017 at 1:31 AM, Alvaro Herrera wrote:
> >> As you must have seen, Alvaro said he has a variant of Dan's original
> >> script that demonstrates that a problem remains, at least on 9.6+,
> >> even with today's fix. I think it's the stress-test that plays with
> >> fillfactor, many clients, etc [1].
> >
> > I just execute setup.sql once and then run this shell command,
> >
> > while :; do
> > psql -e -P pager=off -f ./repro.sql
> > for i in `seq 1 5`; do
> > psql -P pager=off -e --no-psqlrc -f ./lock.sql &
> > done
> > wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
> > psql -P pager=off -e --no-psqlrc -f ./report.sql
> > echo "done"
> > done
> 
> I cannot reproduce the problem on my personal machine using this
> script/stress-test. I tried to do so on the master branch git tip.
> This reinforces the theory that there is some timing sensitivity,
> because the remaining race condition is very narrow.

Hmm, I think I added a random sleep (max. 100ms) right after the
HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that
makes the race easier to hit.
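
For reference, the instrumentation amounts to something like this (a hedged
sketch against lazy_scan_heap() in vacuumlazy.c; a debugging aid only, not a
committed change):

/* in the per-tuple loop of lazy_scan_heap() */
HTSV_Result res = HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf);

/* random sleep of up to 100ms, right after the visibility check */
pg_usleep(random() % 100000L);

switch (res)
{
	/* ... cases unchanged ... */
}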

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-07 Thread Peter Geoghegan
On Sat, Oct 7, 2017 at 1:31 AM, Alvaro Herrera  wrote:
>> As you must have seen, Alvaro said he has a variant of Dan's original
>> script that demonstrates that a problem remains, at least on 9.6+,
>> even with today's fix. I think it's the stress-test that plays with
>> fillfactor, many clients, etc [1].
>
> I just execute setup.sql once and then run this shell command,
>
> while :; do
> psql -e -P pager=off -f ./repro.sql
> for i in `seq 1 5`; do
> psql -P pager=off -e --no-psqlrc -f ./lock.sql &
> done
> wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
> psql -P pager=off -e --no-psqlrc -f ./report.sql
> echo "done"
> done

I cannot reproduce the problem on my personal machine using this
script/stress-test. I tried to do so on the master branch git tip.
This reinforces the theory that there is some timing sensitivity,
because the remaining race condition is very narrow.


-- 
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-07 Thread Alvaro Herrera
Peter Geoghegan wrote:
> On Fri, Oct 6, 2017 at 2:09 PM, Wong, Yi Wen  wrote:
> > Yesterday, I've been spending time with pg_visibility on the pages when I 
> > reproduce the issue in 9.6.
> > None of the all-frozen or all-visible bits are necessarily set in 
> > problematic pages.
> 
> Since this happened yesterday, I assume it was with an unfixed version?
> 
> As you must have seen, Alvaro said he has a variant of Dan's original
> script that demonstrates that a problem remains, at least on 9.6+,
> even with today's fix. I think it's the stress-test that plays with
> fillfactor, many clients, etc [1].

I just execute setup.sql once and then run this shell command,

while :; do
    psql -e -P pager=off -f ./repro.sql
    for i in `seq 1 5`; do
        psql -P pager=off -e --no-psqlrc -f ./lock.sql &
    done
    wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
    psql -P pager=off -e --no-psqlrc -f ./report.sql
    echo "done"
done

Note that you need to use pg10's psql because of the \if lines in
lock.sql.  For some runs I change the values to compare random() to, and
originally the commented out section in lock.sql was not commented out,
but I'm fairly sure the failures I saw were with this version.  Also, I
sometimes change the 5 in the `seq` command to higher values (180, 250).

I didn't find the filler column to have any effect, so I took that out.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
begin;

values ('for key share') /*, ('') */ order by random() limit 1 \gset

select pg_sleep(random() * 0.03);
select id from t where id=3 :column1 ;

select random() > .0 as update \gset
\if :update
select pg_sleep(random() * 0.03);
update t set x=x+1 where id=3;
\endif

/*
select random() > .0 as update2 \gset
\if :update2
savepoint foo;
update t set x=x+1 where id=3;
\endif
select random() > .5 as rollback_to \gset
\if :rollback_to
rollback to foo;
\endif
*/

select random() > .0 as commit \gset

\if :commit
   commit;
\else
   rollback;
\endif

select pg_sleep(random() * 0.03);
vacuum freeze t;

reindex index t_pkey;


with
  pages (blkno) as (
    select generate_series::int
    from generate_series(0,
      pg_relation_size('t') / current_setting('block_size')::int - 1)),
  rawpages (blkno, pg) as (
    select blkno, get_raw_page
    from pages, get_raw_page('t', blkno)),
  heapitems as (
    select blkno, heap_page_items.*
    from rawpages, heap_page_items(pg))
select
  blkno, lp, lp_flags, lp_off, t_xmin, t_xmax, t_ctid,
  infomask(t_infomask, 1) as infomask,
  infomask(t_infomask2, 2) as infomask2
from heapitems
where lp_off <> 0;
drop table t;
create table t (id int primary key, name char(3), x integer)
-- with (fillfactor = 10)
;

insert into t values (1, '111', 0);
insert into t values (3, '333', 0);
create type infomask_bit_desc as (mask varbit, symbol text);
create or replace function infomask(msk int, which int) returns text
language plpgsql as $$
declare
    r infomask_bit_desc;
    str text = '';
    append_bar bool = false;
begin
    for r in select * from infomask_bits(which) loop
        if (msk::bit(16) & r.mask)::int <> 0 then
            if append_bar then
                str = str || '|';
            end if;
            append_bar = true;
            str = str || r.symbol;
        end if;
    end loop;
    return str;
end;
$$;

create or replace function infomask_bits(which int)
returns setof infomask_bit_desc
language plpgsql as $$
begin
    if which = 1 then
        return query values
            (x'8000'::varbit, 'MOVED_IN'),
            (x'4000', 'MOVED_OFF'),
            (x'2000', 'UPDATED'),
            (x'1000', 'XMAX_IS_MULTI'),
            (x'0800', 'XMAX_INVALID'),
            (x'0400', 'XMAX_COMMITTED'),
            (x'0200', 'XMIN_INVALID'),
            (x'0100', 'XMIN_COMMITTED'),
            (x'0080', 'XMAX_LOCK_ONLY'),
            (x'0040', 'EXCL_LOCK'),
            (x'0020', 'COMBOCID'),
            (x'0010', 'XMAX_KEYSHR_LOCK'),
            (x'0008', 'HASOID'),
            (x'0004', 'HASEXTERNAL'),
            (x'0002', 'HASVARWIDTH'),
            (x'0001', 'HASNULL');
    elsif which = 2 then
        return query values
            (x'2000'::varbit, 'UPDATE_KEY_REVOKED'),
            (x'4000', 'HOT_UPDATED'),
            (x'8000', 'HEAP_ONLY_TUPLE');
    end if;
end;
$$;

create extension if not exists pageinspect;



Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Peter Geoghegan
On Fri, Oct 6, 2017 at 2:09 PM, Wong, Yi Wen  wrote:
> Yesterday, I've been spending time with pg_visibility on the pages when I 
> reproduce the issue in 9.6.
> None of the all-frozen or all-visible bits are necessarily set in problematic 
> pages.

Since this happened yesterday, I assume it was with an unfixed version?

As you must have seen, Alvaro said he has a variant of Dan's original
script that demonstrates that a problem remains, at least on 9.6+,
even with today's fix. I think it's the stress-test that plays with
fillfactor, many clients, etc [1].

I've tried to independently reproduce the problem on the master
branch's current tip, with today's new fix, but cannot break things
despite trying many variations. I cannot reproduce the problem that
Alvaro still sees.

I'll have to wait until Alvaro posts his repro to the list before
commenting further, which I assume he'll post as soon as he can. There
doesn't seem to be much point in not waiting for that.

[1] https://postgr.es/m/20171005162402.jahqflf3mekileqm@alvherre.pgsql
-- 
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Peter Geoghegan
On Fri, Oct 6, 2017 at 11:34 AM, Peter Geoghegan  wrote:
>> I don't know if it's really the freeze map at fault or something else.
>
> Ideally, it would be possible to effectively disable the new freeze
> map stuff in a minimal way, for testing purposes. Perhaps the authors
> of that patch, CC'd, can suggest a way to do that.

Actually, the simplest thing might be to just use pg_visibility's
pg_check_frozen() to check that the visibility/freeze map accurately
summarizes the all-frozen status of tuples in the heap. If that
doesn't indicate that there is corruption, we can be fairly confident
that the problem is elsewhere. The metadata in the visibility/freeze
map should be accurate when a bit is set to indicate that an entire
heap page is all-frozen (or, separately, all-visible). We can hardly
expect it to have better information than the authoritative source of
truth, the heap itself.

The more I think about it, the more I tend to doubt that the remaining
problems are with the freeze map. If the freeze map was wrong, and
incorrectly said that a page was all-frozen, then surely the outward
symptoms would take a long time to show up, as they always do when we
accidentally fail to freeze a tuple before a relfrozenxid cutoff. ISTM
that that's the only meaningful way that the freeze map can be wrong
-- it only promises to be accurate when it says that no further
freezing is needed for a page/bit.
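
Stated as a tiny helper, the contract is one-directional (a hedged sketch
against the 9.6 visibilitymap API; this exact function does not exist in the
tree):

#include "access/visibilitymap.h"

/*
 * A set VISIBILITYMAP_ALL_FROZEN bit promises that no tuple on the heap
 * page needs further freezing; a clear bit promises nothing at all.
 */
static bool
page_may_need_freezing(Relation rel, BlockNumber blkno, Buffer *vmbuf)
{
	return (visibilitymap_get_status(rel, blkno, vmbuf) &
			VISIBILITYMAP_ALL_FROZEN) == 0;
}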

-- 
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Peter Geoghegan
On Fri, Oct 6, 2017 at 10:49 AM, Alvaro Herrera  wrote:
> I can tell that, in 9.6, REINDEX still reports the error we saw in
> earlier releases, after some of the runs of my reproducer scripts.  I'm
> unable to reproduce it anymore in 9.3 to 9.5.  I can't see the one Dan
> originally reported anywhere, either.

You mean the enhanced stress-test that varied fillfactor, added filler
columns, and so on [1]? Can you post that to the list, please? I think
that several of us would like to have a reproducible test case.

> I don't know if it's really the freeze map at fault or something else.

Ideally, it would be possible to effectively disable the new freeze
map stuff in a minimal way, for testing purposes. Perhaps the authors
of that patch, CC'd, can suggest a way to do that.

If I had to guess, I'd say that it's just as likely that the issue is
only reproducible on 9.6 because of the enhancements added in that
release that improved buffer pinning (the use of atomic ops to pin
buffers, moving buffer content locks into buffer descriptors, etc). It
was already a bit tricky to get the problem that remained after
20b6552 but before today's a5736bf to reproduce with Dan's script. It
often took me 4 or 5 attempts. (I wonder what it looks like with your
enhanced version of that script -- the one that I just asked about.)

It seems possible that we've merely reduced the window for the race to
the point that it's practically (though not theoretically) impossible
to reproduce the problem on versions < 9.6, though not on 9.6+.
Applying Occam's razor, the problem doesn't seem particularly likely
to be in the freeze map stuff, which isn't actually all that closely
related.

[1] https://postgr.es/m/20171005162402.jahqflf3mekileqm@alvherre.pgsql
-- 
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Peter Geoghegan
On Fri, Oct 6, 2017 at 7:59 AM, Alvaro Herrera  wrote:
> By the way, I still wonder if there's any way for a new tuple to get
> inserted in the place where a HOT redirect would be pointing to, and
> have it be marked as Frozen, where the old redirect contains a
> non-invalid Xmax.  I tried to think of a way for that to happen, but
> couldn't think of anything.
>
> What I imagine is a sequence like this:
>
> 1. insert a tuple
> 2. HOT-update a tuple
> 3. prune the page, making lp 1 be a redirect (lp 2 is the new tuple)
> 4. start transaction
> 5. HOT-update the tuple again, creating HOT in lp 3
> 6. abort transaction (leaving aborted update in lp 3)
> 7. somehow remove tuple from lp 3, make slot available
> 8. new transaction comes along, inserts tuple in lp 3
> 9. somebody freezes tuple in lp3 (???)
>
> Then we follow the HOT chain, see that Xmin=2 in lp3 and conclude that
> the tuple is part of the chain because of an xid "match".
>
> Basically from step 7 onwards I don't think this is possible, but maybe
> I'm just blind.

For the record, I also think that this is impossible, in part because
pruning requires a cleanup buffer lock (and because HOT chains cannot
span pages). I wouldn't say that I am 100% confident about this,
though.

BTW, is this comment block that appears above
heap_prepare_freeze_tuple() now obsolete, following 20b65522 (and
maybe much earlier commits)?

 * NB: It is not enough to set hint bits to indicate something is
 * committed/invalid -- they might not be set on a standby, or after crash
 * recovery.  We really need to remove old xids.
 */

We WAL-log setting hint bits during freezing now, iff tuple xmin is
before the Xid cutoff and tuple is a heap-only tuple.

-- 
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Alvaro Herrera
Peter Geoghegan wrote:
> On Fri, Oct 6, 2017 at 6:18 AM, Alvaro Herrera wrote:
> > Wood, Dan wrote:
> >> Yes, I’ve been testing 9.6.  I’ll try Alvaro’s patch today.
> >>
> >> I would prefer to focus on either latest 9X or 11dev.
> >
> > I tested my patch on 9.4 and 9.5 today and it seems to close the problem
> > (with the patch, I waited 10x as many iterations as it took for the
> > problem to occur ~10 times without the patch), but I can reproduce a
> > problem in 9.6 with my patch installed.  There must be something new in
> > 9.6 that is causing the problem to reappear.
> 
> What problem persists? The original one (or, at least, the original
> symptom of pruning HOT chains incorrectly)? If that's what you mean, I
> wouldn't be so quick to assume that it's the freeze map.

I can tell that, in 9.6, REINDEX still reports the error we saw in
earlier releases, after some of the runs of my reproducer scripts.  I'm
unable to reproduce it anymore in 9.3 to 9.5.  I can't see the one Dan
originally reported anywhere, either.

I don't know if it's really the freeze map at fault or something else.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Peter Geoghegan
On Fri, Oct 6, 2017 at 6:18 AM, Alvaro Herrera  wrote:
> Wood, Dan wrote:
>> Yes, I’ve been testing 9.6.  I’ll try Alvaro’s patch today.
>>
>> I would prefer to focus on either latest 9X or 11dev.
>
> I tested my patch on 9.4 and 9.5 today and it seems to close the problem
> (with the patch, I waited 10x as many iterations as it took for the
> problem to occur ~10 times without the patch), but I can reproduce a
> problem in 9.6 with my patch installed.  There must be something new in
> 9.6 that is causing the problem to reappear.

What problem persists? The original one (or, at least, the original
symptom of pruning HOT chains incorrectly)? If that's what you mean, I
wouldn't be so quick to assume that it's the freeze map.

-- 
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Alvaro Herrera
By the way, I still wonder if there's any way for a new tuple to get
inserted in the place where a HOT redirect would be pointing to, and
have it be marked as Frozen, where the old redirect contains a
non-invalid Xmax.  I tried to think of a way for that to happen, but
couldn't think of anything.

What I imagine is a sequence like this:

1. insert a tuple
2. HOT-update a tuple
3. prune the page, making lp 1 be a redirect (lp 2 is the new tuple)
4. start transaction
5. HOT-update the tuple again, creating HOT in lp 3
6. abort transaction (leaving aborted update in lp 3)
7. somehow remove tuple from lp 3, make slot available
8. new transaction comes along, inserts tuple in lp 3
9. somebody freezes tuple in lp3 (???)

Then we follow the HOT chain, see that Xmin = 2 (FrozenTransactionId) in lp3,
and conclude that the tuple is part of the chain because of an xid "match".

Basically from step 7 onwards I don't think this is possible, but maybe
I'm just blind.


Maybe we can forestall the problem by checking whether the Xmax
TransactionIdIsCurrentTransaction || TransactionIdDidCommit (or some
variant thereof).  This would be very slow but safer; and in 9.4 and up
we'd only need to do it if the xmin value is actually FrozenXid which
should be rare (only in pages upgraded from 9.3).
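
A hedged sketch of that idea (illustrative only, not a committed change; the
helper that actually went in is HeapTupleUpdateXmaxMatchesXmin, quoted
elsewhere in this thread):

static bool
chain_link_looks_sound(TransactionId priorXmax, HeapTupleHeader htup)
{
	TransactionId	xmin = HeapTupleHeaderGetXmin(htup);

	if (TransactionIdEquals(priorXmax, xmin))
		return true;

	/*
	 * Frozen xmin: the original value is gone, so fall back to checking
	 * the fate of the prior Xmax itself.  Slow, but on 9.4 and up this
	 * is only reached for pages upgraded from 9.3.
	 */
	if (TransactionIdEquals(xmin, FrozenTransactionId))
		return TransactionIdIsCurrentTransactionId(priorXmax) ||
			TransactionIdDidCommit(priorXmax);

	return false;
}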

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Stephen Frost
Robert,

* Alvaro Herrera (alvhe...@alvh.no-ip.org) wrote:
> Michael Paquier wrote:
> > On Fri, Oct 6, 2017 at 10:18 PM, Alvaro Herrera wrote:
> > > Wood, Dan wrote:
> > >> Yes, I’ve been testing 9.6.  I’ll try Alvaro’s patch today.
> > >>
> > >> I would prefer to focus on either latest 9X or 11dev.
> > >
> > > I tested my patch on 9.4 and 9.5 today and it seems to close the problem
> > > (with the patch, I waited 10x as many iterations as it took for the
> > > problem to occur ~10 times without the patch), but I can reproduce a
> > > problem in 9.6 with my patch installed.  There must be something new in
> > > 9.6 that is causing the problem to reappear.
> > 
> > The freeze visibility map has been introduced in 9.6... There could be
> > interactions on this side.
> 
> Ah, thanks for the tip.  I hope the authors of that can do the gruntwork
> of researching this problem there, then.  I'll go commit what I have
> now.

I don't doubt you're watching this thread too, but just to be 110% sure
that we don't end up with the November releases still having this issue,
I'm adding you to the CC on this thread as the one who did the freeze
visibility map work.  Depending on hope here is a bit too squishy for me
when we're talking about corruption issues.

Thanks!

Stephen




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Michael Paquier
On Fri, Oct 6, 2017 at 10:57 PM, Alvaro Herrera  wrote:
> Ah, thanks for the tip.  I hope the authors of that can do the gruntwork
> of researching this problem there, then.

I have some stuff using 9.6 extensively, so like Dan I think I'll
chime in anyway. Not before Tuesday though, long weekend in Japan
ahead.

> I'll go commit what I have now.

As far as I saw this set definitely improves the situation.
-- 
Michael




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Alvaro Herrera
Michael Paquier wrote:
> On Fri, Oct 6, 2017 at 10:18 PM, Alvaro Herrera wrote:
> > Wood, Dan wrote:
> >> Yes, I’ve been testing 9.6.  I’ll try Alvaro’s patch today.
> >>
> >> I would prefer to focus on either latest 9X or 11dev.
> >
> > I tested my patch on 9.4 and 9.5 today and it seems to close the problem
> > (with the patch, I waited 10x as many iterations as it took for the
> > problem to occur ~10 times without the patch), but I can reproduce a
> > problem in 9.6 with my patch installed.  There must be something new in
> > 9.6 that is causing the problem to reappear.
> 
> The freeze visibility map has been introduced in 9.6... There could be
> interactions on this side.

Ah, thanks for the tip.  I hope the authors of that can do the gruntwork
of researching this problem there, then.  I'll go commit what I have
now.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Michael Paquier
On Fri, Oct 6, 2017 at 10:18 PM, Alvaro Herrera  wrote:
> Wood, Dan wrote:
>> Yes, I’ve been testing 9.6.  I’ll try Alvaro’s patch today.
>>
>> I would prefer to focus on either latest 9X or 11dev.
>
> I tested my patch on 9.4 and 9.5 today and it seems to close the problem
> (with the patch, I waited 10x as many iterations as it took for the
> problem to occur ~10 times without the patch), but I can reproduce a
> problem in 9.6 with my patch installed.  There must be something new in
> 9.6 that is causing the problem to reappear.

The freeze visibility map has been introduced in 9.6... There could be
interactions on this side.
-- 
Michael




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Michael Paquier
On Fri, Oct 6, 2017 at 7:57 PM, Alvaro Herrera  wrote:
> Michael Paquier wrote:
>> On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera wrote:
>> +/*
>> + * Given a tuple, verify whether the given Xmax matches the tuple's Xmin,
>> + * taking into account that the Xmin might have been frozen.
>> + */
>> [...]
>> +	/*
>> +	 * We actually don't know if there's a match, but if the previous tuple
>> +	 * was frozen, we cannot really rely on a perfect match.
>> +	 */
>
> I don't know what you had in mind here,

Impossible to know if I don't actually send the contents :)

> but I tweaked the 9.3 version so that it now looks like this:

I wanted to mention that the comments could be reworked. And forgot to
suggest some.

> /*
>  * HeapTupleUpdateXmaxMatchesXmin - verify update chain xmax/xmin lineage
>  *
>  * Given the new version of a tuple after some update, verify whether the
>  * given Xmax (corresponding to the previous version) matches the tuple's
>  * Xmin, taking into account that the Xmin might have been frozen after the
>  * update.
>  */
> bool
> HeapTupleUpdateXmaxMatchesXmin(TransactionId xmax, HeapTupleHeader htup)
> {
> TransactionId   xmin = HeapTupleHeaderGetXmin(htup);
>
> 	/*
> 	 * If the xmax of the old tuple is identical to the xmin of the new one,
> 	 * it's a match.
> 	 */
> 	if (TransactionIdEquals(xmax, xmin))
> 		return true;
>
> 	/*
> 	 * When a tuple is frozen, the original Xmin is lost, but we know it's a
> 	 * committed transaction.  So unless the Xmax is InvalidXid, we don't
> 	 * know for certain that there is a match, but there may be one; and we
> 	 * must return true so that a HOT chain that is half-frozen can be walked
> 	 * correctly.
> 	 */
> 	if (TransactionIdEquals(xmin, FrozenTransactionId) &&
> 		TransactionIdIsValid(xmax))
> 		return true;
>
> 	return false;
> }

Those are clearly improvements.
-- 
Michael




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Alvaro Herrera
Wood, Dan wrote:
> Yes, I’ve been testing 9.6.  I’ll try Alvaro’s patch today.
> 
> I would prefer to focus on either latest 9X or 11dev.  

I tested my patch on 9.4 and 9.5 today and it seems to close the problem
(with the patch, I waited 10x as many iterations as it took for the
problem to occur ~10 times without the patch), but I can reproduce a
problem in 9.6 with my patch installed.  There must be something new in
9.6 that is causing the problem to reappear.

> Does Alvaro’s patch presume any of the other patches to set COMMITTED in
> the freeze code?

I don't know what you mean.  Here is the 9.6 version of my patch.  Note
that HEAP_XMIN_FROZEN (which uses the XMIN_COMMITTED bit as I recall)
was introduced in 9.4.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 46ca12d56402d33a78ea0e13367d1e0e25a474dd Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 6 Oct 2017 14:11:34 +0200
Subject: [PATCH] Fix bugs

---
 src/backend/access/heap/heapam.c| 52 +
 src/backend/access/heap/pruneheap.c |  4 +--
 src/backend/executor/execMain.c |  6 ++---
 src/include/access/heapam.h |  3 +++
 4 files changed, 54 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b41f2a2fdd..10916b140e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2060,8 +2060,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 * broken.
 */
 		if (TransactionIdIsValid(prev_xmax) &&
-			!TransactionIdEquals(prev_xmax,
-								 HeapTupleHeaderGetXmin(heapTuple->t_data)))
+			!HeapTupleUpdateXmaxMatchesXmin(prev_xmax, heapTuple->t_data))
 			break;
 
/*
@@ -2244,7 +2243,7 @@ heap_get_latest_tid(Relation relation,
 		 * tuple.  Check for XMIN match.
 		 */
 		if (TransactionIdIsValid(priorXmax) &&
-			!TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(tp.t_data)))
+			!HeapTupleUpdateXmaxMatchesXmin(priorXmax, tp.t_data))
 		{
 			UnlockReleaseBuffer(buffer);
 			break;
@@ -2276,6 +2275,50 @@ heap_get_latest_tid(Relation relation,
 	}							/* end of loop */
 }
 
+/*
+ * HeapTupleUpdateXmaxMatchesXmin - verify update chain xmax/xmin lineage
+ *
+ * Given the new version of a tuple after some update, verify whether the
+ * given Xmax (corresponding to the previous version) matches the tuple's
+ * Xmin, taking into account that the Xmin might have been frozen after the
+ * update.
+ */
+bool
+HeapTupleUpdateXmaxMatchesXmin(TransactionId xmax, HeapTupleHeader htup)
+{
+	TransactionId	xmin = HeapTupleHeaderGetXmin(htup);
+
+	/*
+	 * If the xmax of the old tuple is identical to the xmin of the new one,
+	 * it's a match.
+	 */
+	if (TransactionIdEquals(xmax, xmin))
+		return true;
+
+	/*
+	 * If the Xmin that was in effect prior to a freeze matches the Xmax,
+	 * it's good too.
+	 */
+	if (HeapTupleHeaderXminFrozen(htup) &&
+		TransactionIdEquals(HeapTupleHeaderGetRawXmin(htup), xmax))
+		return true;
+
+	/*
+	 * When a tuple is frozen, the original Xmin is lost, but we know it's a
+	 * committed transaction.  So unless the Xmax is InvalidXid, we don't know
+	 * for certain that there is a match, but there may be one; and we must
+	 * return true so that a HOT chain that is half-frozen can be walked
+	 * correctly.
+	 *
+	 * We no longer freeze tuples this way, but we must keep this in order to
+	 * interpret pre-pg_upgrade pages correctly.
+	 */
+	if (TransactionIdEquals(xmin, FrozenTransactionId) &&
+		TransactionIdIsValid(xmax))
+		return true;
+
+	return false;
+}
 
 /*
  * UpdateXmaxHintBits - update tuple hint bits after xmax transaction ends
@@ -5693,8 +5736,7 @@ l4:
 		 * end of the chain, we're done, so return success.
 		 */
 		if (TransactionIdIsValid(priorXmax) &&
-			!TransactionIdEquals(HeapTupleHeaderGetXmin(mytup.t_data),
-								 priorXmax))
+			!HeapTupleUpdateXmaxMatchesXmin(priorXmax, mytup.t_data))
 		{
 			result = HeapTupleMayBeUpdated;
 			goto out_locked;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 52231ac417..7753ee7b12 100644

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Alvaro Herrera
Michael Paquier wrote:
> On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera wrote:

> +	/*
> +	 * If the xmax of the old tuple is identical to the xmin of the new one,
> +	 * it's a match.
> +	 */
> +	if (xmax == xmin)
> +		return true;
> I would use TransactionIdEquals() here, to remember once you switch
> that to a macro.

I've had second thoughts about the macro thing -- for now I'm keeping it
a function, actually.

> +/*
> + * Given a tuple, verify whether the given Xmax matches the tuple's Xmin,
> + * taking into account that the Xmin might have been frozen.
> + */
> [...]
> +	/*
> +	 * We actually don't know if there's a match, but if the previous tuple
> +	 * was frozen, we cannot really rely on a perfect match.
> +	 */

I don't know what you had in mind here, but I tweaked the 9.3 version so
that it now looks like this:

/*
 * HeapTupleUpdateXmaxMatchesXmin - verify update chain xmax/xmin lineage
 *
 * Given the new version of a tuple after some update, verify whether the
 * given Xmax (corresponding to the previous version) matches the tuple's
 * Xmin, taking into account that the Xmin might have been frozen after the
 * update.
 */
bool
HeapTupleUpdateXmaxMatchesXmin(TransactionId xmax, HeapTupleHeader htup)
{
TransactionId   xmin = HeapTupleHeaderGetXmin(htup);

/*
 * If the xmax of the old tuple is identical to the xmin of the new one,
 * it's a match.
 */
if (TransactionIdEquals(xmax, xmin))
return true;

/*
 * When a tuple is frozen, the original Xmin is lost, but we know it's a
 * committed transaction.  So unless the Xmax is InvalidXid, we don't
 * know for certain that there is a match, but there may be one; and we
 * must return true so that a HOT chain that is half-frozen can be 
walked
 * correctly.
 */
if (TransactionIdEquals(xmin, FrozenTransactionId) &&
TransactionIdIsValid(xmax))
return true;

return false;
}
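
For context, the call sites in the patch all follow the same chain-walking
pattern (an illustrative excerpt condensed from the diff hunks):

/* when following an update chain from the prior version to the next */
if (TransactionIdIsValid(priorXmax) &&
	!HeapTupleUpdateXmaxMatchesXmin(priorXmax, mytup.t_data))
{
	/* the chain is broken; stop following it */
	break;
}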


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-06 Thread Michael Paquier
On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera  wrote:
> I think this is the patch for 9.3.  I ran the test a few hundred times
> (with some additional changes such as randomly having an update inside a
> savepoint that's randomly aborted, randomly aborting the transaction,
> randomly skipping the for key share lock, randomly sleeping at a few
> points; and also adding filler columns, reducing fillfactor and using
> 5, 50, 180, 250, 500 sessions after verifying that it causes the tuples
> to stay in the same page or migrate to later pages).  The final REINDEX
> has never complained again about failing to find the root tuple.  I hope
> it's good now.

I have looked and played with your patch as well for a couple of
hours, and did not notice any failures again. The structure of the
pages looked sane as well; I could also see with pageinspect a correct
HOT chain using the first set of tests provided on this thread.

> The attached patch needs a few small tweaks, such as improving
> commentary in the new function, maybe turn it into a macro (otherwise I
> think it could be bad for performance; I'd like a static func but not
> sure those are readily available in 9.3), change the XID comparison to
> use the appropriate macro rather than ==, and such.

Yeah that would be better.

> Regarding changes of xmin/xmax comparison, I also checked manually the
> spots I thought should be modified and later double-checked against the
> list that Michael posted.

Thanks. I can see that your patch is filling the holes.

> It's a match, except for rewriteheap.c which
> I cannot make heads or tails of.  (I think it's rather unfortunate
> that it sticks a tuple's Xmax into a field that's called Xmin, but let's
> put that aside).  Maybe there's a problem here, maybe there isn't.

rewrite_heap_tuple is used only by CLUSTER, and the oldest new tuples
are frozen before being rewritten. So this looks safe to me in the
end.

> I'm now going to forward-port this to 9.4.

+	/*
+	 * If the xmax of the old tuple is identical to the xmin of the new one,
+	 * it's a match.
+	 */
+	if (xmax == xmin)
+		return true;
I would use TransactionIdEquals() here, to remember once you switch
that to a macro.

+/*
+ * Given a tuple, verify whether the given Xmax matches the tuple's Xmin,
+ * taking into account that the Xmin might have been frozen.
+ */
[...]
+	/*
+	 * We actually don't know if there's a match, but if the previous tuple
+	 * was frozen, we cannot really rely on a perfect match.
+	 */
-- 
Michael




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-05 Thread Wood, Dan
Yes, I’ve been testing 9.6.  I’ll try Alvaro’s patch today.

I would prefer to focus on either latest 9X or 11dev.  Does Alvaro’s patch 
presume any of the other patches to set COMMITTED in the freeze code?


On 10/4/17, 7:17 PM, "Michael Paquier"  wrote:

On Thu, Oct 5, 2017 at 10:39 AM, Wood, Dan  wrote:
> Whatever you do make sure to also test 250 clients running lock.sql.  Even 
> with the community’s fix plus YiWen’s fix I can still get duplicate rows.  
> What works for “in-block” hot chains may not work when spanning blocks.

Interesting. Which version did you test? Only 9.6?

> Once nearly all 250 clients have done their updates and everybody is 
> waiting to vacuum (which, one by one, will take a while), I usually just 
> “pkill -9 psql”.  After that I have many duplicate “id=3” rows.  On top of 
> that I think we might have a lock leak.  After the pkill I tried to rerun 
> setup.sql to drop/create the table and it hangs.  I see an autovacuum 
> process starting and exiting every couple of seconds.  Only by killing and 
> restarting PG can I drop the table.

Yeah, that's more or less what I have been doing. My tests involve
using your initial script with way more sessions triggering lock.sql,
minus the kill-9 portion (good idea actually). I can of course see the
sessions queuing for VACUUM, still I cannot see duplicated rows, even
if I headshot Postgres in the middle of the VACUUM waiting queue. Note
that I have just tested Alvaro's patch on 9.3.
-- 
Michael






Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-05 Thread Peter Geoghegan
On Thu, Oct 5, 2017 at 4:35 AM, Alvaro Herrera  wrote:
> At any rate, I was thinking in a new routine to encapsulate the logic,
>
> /*
>  * Check the tuple XMIN against prior XMAX, if any
>  */
> if (!HeapTupleUpdateXmaxMatchesXmin(priorXmax, 
> HeapTupleHeaderGetXmin(htup)))
> break;
>
> where ..XmaxMatchesXmin would know about checking for possibly frozen
> tuples.

Makes sense.

>> I can see why it
>> would be safe (or at least no more dangerous) to rely on
>> HeapTupleHeaderGetRawXmin() in the way mentioned here, at least on
>> installations that initdb'd on a version after commit 37484ad2a
>> (version 9.4+). However, I'm not sure why what you propose here would
>> be safe when even raw xmin happens to be FrozenTransactionId. Are you
>> sure that that's truly race-free? If it's really true that we only
>> need to check for FrozenTransactionId on 9.3, why not just do that on
>> all versions, and never bother with HeapTupleHeaderGetRawXmin()?
>> ("Sheer paranoia" is a valid answer; I just want us to be clear on the
>> reasoning.)
>
> I think the RawXmin is the better mechanism.  I'm not absolutely certain
> that the window is completely closed in 9.3; as I understand things, it
> is possible for transaction A to prune an aborted heap-only tuple, then
> transaction B to insert a frozen tuple in the same location, then
> transaction C follows a link to the HOT tuple that was pruned.  I think we'd
> end up considering that the new frozen tuple is part of the chain, which
> is wrong ...  In 9.4 we can compare the real xmin.

Good point: the race doesn't exist on 9.4+ with pg_upgrade from 9.3,
when raw xmin happens to be FrozenTransactionId, because there can be
no new tuples that have that property. This possible race is strictly
a 9.3 issue. (You have to deal with a raw xmin of FrozenTransactionId
on 9.4+, but there are no race conditions, because affected tuples are
all pre-pg_upgrade tuples -- they were frozen months ago, not
microseconds ago.)

> I hope I am proved wrong about this, because if not, I think we're
> looking at an essentially unsolvable problem in 9.3.

Well, as noted within README.HOT, the xmin/xmax matching did not
appear in earlier versions of the original HOT patch, because it was
thought that holding a pin of the buffer was a sufficient interlock
against concurrent line pointer recycling (pruning must have a buffer
cleanup lock). Clearly it is more robust to match xmin to xmax, but is
there actually any evidence that it was really necessary at all (for
single-page HOT chain traversals) when HOT went in in 2007, or that it
has since become necessary? heap_page_prune()'s caller has to have a
buffer cleanup lock.

I'm certainly not advocating removing the xmin/xmax matching within
heap_prune_chain() on 9.3. However, it may be acceptable to rely on
holding a buffer cleanup lock within heap_prune_chain() on 9.3, just
for the rare/theoretical cases where the FrozenTransactionId raw xmin
ambiguity means that xmin/xmax matching alone might not be enough. As
you say, it seems unsolvable through looking at state on the page
directly on 9.3, so there may be no other way. And, that's still
strictly better than what we have today.

> As far as I understand, this problem only emerges if one part of a HOT
> chain reaches the min freeze age while another part of the same chain is
> still visible to some running transaction.  It is particularly
> noticeable in our current test case because we use a min freeze age of
> 0, with many concurrent sessions modifying the same page.  What this says to me
> is that VACUUM FREEZE is mildly dangerous when there's lots of highly
> concurrent HOT UPDATE activity.

I'm not sure what you mean here. It is dangerous right now, because
there is a bug, but we can squash the bug.

-- 
Peter Geoghegan




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-05 Thread Alvaro Herrera
I think this is the patch for 9.3.  I ran the test a few hundred times
(with some additional changes such as randomly having an update inside a
savepoint that's randomly aborted, randomly aborting the transaction,
randomly skipping the for key share lock, randomly sleeping at a few
points; and also adding filler columns, reducing fillfactor and using
5, 50, 180, 250, 500 sessions after verifying that it causes the tuples
to stay in the same page or migrate to later pages).  The final REINDEX
has never complained again about failing to find the root tuple.  I hope
it's good now.

The attached patch needs a few small tweaks, such as improving
commentary in the new function, maybe turn it into a macro (otherwise I
think it could be bad for performance; I'd like a static func but not
sure those are readily available in 9.3), change the XID comparison to
use the appropriate macro rather than ==, and such.
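
That is, a draft spot like

    if (xmax == xmin)
        return true;

should end up as

    if (TransactionIdEquals(xmax, xmin))
        return true;

so that all TransactionId comparisons go through the one official macro.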

Regarding changes of xmin/xmax comparison, I also checked manually the
spots I thought should be modified and later double-checked against the
list that Michael posted.  It's a match, except for rewriteheap.c which
I cannot make heads or tails of.  (I think it's rather unfortunate
that it sticks a tuple's Xmax into a field that's called Xmin, but let's
put that aside).  Maybe there's a problem here, maybe there isn't.

I'm now going to forward-port this to 9.4.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e00dc6c1ca..87bce0e7ea 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1718,8 +1718,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
 * broken.
 */
 		if (TransactionIdIsValid(prev_xmax) &&
-			!TransactionIdEquals(prev_xmax,
-								 HeapTupleHeaderGetXmin(heapTuple->t_data)))
+			!HeapTupleUpdateXmaxMatchesXmin(prev_xmax, heapTuple->t_data))
 			break;
 
/*
@@ -1888,7 +1887,7 @@ heap_get_latest_tid(Relation relation,
 		 * tuple.  Check for XMIN match.
 		 */
 		if (TransactionIdIsValid(priorXmax) &&
-			!TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(tp.t_data)))
+			!HeapTupleUpdateXmaxMatchesXmin(priorXmax, tp.t_data))
 		{
 			UnlockReleaseBuffer(buffer);
 			break;
@@ -1920,6 +1919,36 @@ heap_get_latest_tid(Relation relation,
 	}							/* end of loop */
 }
 
+/*
+ * Given a tuple, verify whether the given Xmax matches the tuple's Xmin,
+ * taking into account that the Xmin might have been frozen.
+ */
+bool
+HeapTupleUpdateXmaxMatchesXmin(TransactionId xmax, HeapTupleHeader htup)
+{
+   TransactionId   xmin = HeapTupleHeaderGetXmin(htup);
+
+   /* xmax must not be invalid or frozen */
+   Assert(TransactionIdIsNormal(xmax));
+
+	/*
+	 * If the xmax of the old tuple is identical to the xmin of the new one,
+	 * it's a match.
+	 */
+	if (xmax == xmin)
+		return true;
+
+	/* FIXME Here we need to check the HEAP_XMIN_FROZEN in 9.4 and up */
+
+	/*
+	 * We actually don't know if there's a match, but if the previous tuple
+	 * was frozen, we cannot really rely on a perfect match.
+	 */
+	if (xmin == FrozenTransactionId)
+		return true;
+
+   return false;
+}
 
 /*
  * UpdateXmaxHintBits - update tuple hint bits after xmax transaction ends
@@ -5045,8 +5074,7 @@ l4:
 		 * the end of the chain, we're done, so return success.
 		 */
 		if (TransactionIdIsValid(priorXmax) &&
-			!TransactionIdEquals(HeapTupleHeaderGetXmin(mytup.t_data),
-								 priorXmax))
+			!HeapTupleUpdateXmaxMatchesXmin(priorXmax, mytup.t_data))
 		{
 			UnlockReleaseBuffer(buf);
 			return HeapTupleMayBeUpdated;
@@ -5500,7 +5528,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 			if (TransactionIdPrecedes(xid, cutoff_xid))
 			{
 				if (TransactionIdDidCommit(xid))
+				{
+					xid = FrozenTransactionId;
 					*flags = FRM_MARK_COMMITTED | FRM_RETURN_IS_XID;
+				}
 				else
 				{
 					*flags |= FRM_INVALIDATE_XMAX;
diff --git 

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-05 Thread Alvaro Herrera
Peter Geoghegan wrote:

> As you know, on version 9.4+, as of commit 37484ad2a, we decided that
> we are "largely ignoring the value to which it [xmin] is set". The
> expectation became that raw xmin is available after freezing, but
> mostly for forensic purposes. I think Alvaro should now memorialize
> the idea that its value is actually critical in some place
> (htup_details.h?).

I'd love to be certain that we preserve the original value in all
cases.

> On Wed, Oct 4, 2017 at 6:46 AM, Alvaro Herrera wrote:

> > I independently arrived at the same conclusion.  Since I was trying with
> > 9.3, the patch differs -- in the old version we must explicitly test
> > for the FrozenTransactionId value, instead of using GetRawXmin.
> 
> Obviously you're going to have to be prepared for a raw xmin of
> FrozenTransactionId, even on 9.4+, due to pg_upgrade.

Hmm.  It would be good to be able to get rid of that case, actually,
because I now think it's less safe (see below).

At any rate, I was thinking in a new routine to encapsulate the logic,

/*
 * Check the tuple XMIN against prior XMAX, if any
 */
if (!HeapTupleUpdateXmaxMatchesXmin(priorXmax, 
HeapTupleHeaderGetXmin(htup)))
break;

where ..XmaxMatchesXmin would know about checking for possibly frozen
tuples.

> I can see why it
> would be safe (or at least no more dangerous) to rely on
> HeapTupleHeaderGetRawXmin() in the way mentioned here, at least on
> installations that initdb'd on a version after commit 37484ad2a
> (version 9.4+). However, I'm not sure why what you propose here would
> be safe when even raw xmin happens to be FrozenTransactionId. Are you
> sure that that's truly race-free? If it's really true that we only
> need to check for FrozenTransactionId on 9.3, why not just do that on
> all versions, and never bother with HeapTupleHeaderGetRawXmin()?
> ("Sheer paranoia" is a valid answer; I just want us to be clear on the
> reasoning.)

I think the RawXmin is the better mechanism.  I'm not absolutely certain
that the window is completely closed in 9.3; as I understand things, it
is possible for transaction A to prune an aborted heap-only tuple, then
transaction B to insert a frozen tuple in the same location, then
transaction C follows a link to the HOT tuple that was pruned.  I think we'd
end up considering that the new frozen tuple is part of the chain, which
is wrong ...  In 9.4 we can compare the real xmin.

I hope I am proved wrong about this, because if not, I think we're
looking at an essentially unsolvable problem in 9.3.

> Obviously any race would have a ridiculously tiny window, but it's not
> obvious why this protocol would be completely race-free (in the event
> of a FrozenTransactionId raw xmin).

As far as I understand, this problem only emerges if one part of a HOT
chain reaches the min freeze age while another part of the same chain is
still visible to some running transaction.  It is particularly
noticeable in our current test case because we use a min freeze age of
0, with many concurrent sessions modifying the same page.  What this says to me
is that VACUUM FREEZE is mildly dangerous when there's lots of highly
concurrent HOT UPDATE activity.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-05 Thread Alvaro Herrera
Wood, Dan wrote:
> Whatever you do make sure to also test 250 clients running lock.sql.  Even 
> with the community’s fix plus YiWen’s fix I can still get duplicate rows.  
> What works for “in-block” hot chains may not work when spanning blocks.

Good idea.  You can achieve a similar effect by adding a filler column,
and reducing fillfactor.

> Once nearly all 250 clients have done their updates and everybody is
> waiting to vacuum (which, one by one, will take a while), I usually just
> “pkill -9 psql”.  After that I have many duplicate “id=3” rows.

Odd ...

> On top of that I think we might have a lock leak.  After the pkill I
> tried to rerun setup.sql to drop/create the table and it hangs.  I see
> an autovacuum process starting and exiting every couple of seconds.
> Only by killing and restarting PG can I drop the table.

Please do try to figure this one out.  It'd be a separate problem,
worthy of its own thread.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Michael Paquier
On Thu, Oct 5, 2017 at 10:39 AM, Wood, Dan  wrote:
> Whatever you do make sure to also test 250 clients running lock.sql.  Even 
> with the community’s fix plus YiWen’s fix I can still get duplicate rows.  
> What works for “in-block” hot chains may not work when spanning blocks.

Interesting. Which version did you test? Only 9.6?

> Once nearly all 250 clients have done their updates and everybody is waiting 
> to vacuum (which, one by one, will take a while), I usually just “pkill -9 
> psql”.  After that I have many duplicate “id=3” rows.  On top of that I think we 
> might have a lock leak.  After the pkill I tried to rerun setup.sql to 
> drop/create the table and it hangs.  I see an autovacuum process starting and 
> exiting every couple of seconds.  Only by killing and restarting PG can I 
> drop the table.

Yeah, that's more or less what I have been doing. My tests involve
using your initial script with way more sessions triggering lock.sql,
minus the kill-9 portion (good idea actually). I can of course see the
sessions queuing for VACUUM, still I cannot see duplicated rows, even
if I headshot Postgres in the middle of the VACUUM waiting queue. Note
that I have just tested Alvaro's patch on 9.3.
-- 
Michael




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Wood, Dan
Whatever you do make sure to also test 250 clients running lock.sql.  Even with 
the community’s fix plus YiWen’s fix I can still get duplicate rows.  What 
works for “in-block” hot chains may not work when spanning blocks.

Once nearly all 250 clients have done their updates and everybody is waiting to 
vacuum (which, one by one, will take a while), I usually just “pkill -9 psql”.  
After that I have many duplicate “id=3” rows.  On top of that I think we 
might have a lock leak.  After the pkill I tried to rerun setup.sql to 
drop/create the table and it hangs.  I see an autovacuum process starting and 
exiting every couple of seconds.  Only by killing and restarting PG can I drop 
the table.

On 10/4/17, 6:31 PM, "Michael Paquier"  wrote:

On Wed, Oct 4, 2017 at 10:46 PM, Alvaro Herrera wrote:
> Wong, Yi Wen wrote:
>> My interpretation of README.HOT is the check is just to ensure the chain
>> is continuous; in which case the condition should be:
>>
>> > if (TransactionIdIsValid(priorXmax) &&
>> >     !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>> >     break;
>>
>> So the difference is GetRawXmin vs GetXmin, because otherwise we get the
>> FreezeId instead of the Xmin when the transaction happened
>
> I independently arrived at the same conclusion.  Since I was trying with
> 9.3, the patch differs -- in the old version we must explicitly test
> for the FrozenTransactionId value, instead of using GetRawXmin.
> Attached is the patch I'm using, and my own oneliner test (pretty much
> the same I posted earlier) seems to survive dozens of iterations without
> showing any problem in REINDEX.

Confirmed, the problem goes away with this patch on 9.3.

> This patch is incomplete, since I think there are other places that need
> to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?).
> Of course, for 9.4 and onwards we need to patch like you described.

I have just done a lookup of the source code, and here is an
exhaustive list of things in need of surgery:
- heap_hot_search_buffer
- heap_get_latest_tid
- heap_lock_updated_tuple_rec
- heap_prune_chain
- heap_get_root_tuples
- rewrite_heap_tuple
- EvalPlanQualFetch (twice)

> This bit in EvalPlanQualFetch caught my attention ... why is it saying
> xmin never changes?  It does change with freezing.
>
> 	/*
> 	 * If xmin isn't what we're expecting, the slot must have been
> 	 * recycled and reused for an unrelated tuple.  This implies that
> 	 * the latest version of the row was deleted, so we need do
> 	 * nothing.  (Should be safe to examine xmin without getting
> 	 * buffer's content lock, since xmin never changes in an existing
> 	 * tuple.)
> 	 */
> 	if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
> 							 priorXmax))

Agreed. That's not good.
-- 
Michael






Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Michael Paquier
On Wed, Oct 4, 2017 at 10:46 PM, Alvaro Herrera  wrote:
> Wong, Yi Wen wrote:
>> My interpretation of README.HOT is the check is just to ensure the chain is 
>> continuous; in which case the condition should be:
>>
>> > if (TransactionIdIsValid(priorXmax) &&
>> > !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>> > break;
>>
>> So the difference is GetRawXmin vs GetXmin, because otherwise we get 
>> FrozenTransactionId instead of the Xmin from when the transaction happened
>
> I independently arrived at the same conclusion.  Since I was trying with
> 9.3, the patch differs -- in the old version we must explicitly test
> for the FrozenTransactionId value, instead of using GetRawXmin.
> Attached is the patch I'm using, and my own oneliner test (pretty much
> the same I posted earlier) seems to survive dozens of iterations without
> showing any problem in REINDEX.

Confirmed, the problem goes away with this patch on 9.3.

> This patch is incomplete, since I think there are other places that need
> to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?).
> Of course, for 9.4 and onwards we need to patch like you described.

I have just done a lookup of the source code, and here is an
exhaustive list of things in need of surgery:
- heap_hot_search_buffer
- heap_get_latest_tid
- heap_lock_updated_tuple_rec
- heap_prune_chain
- heap_get_root_tuples
- rewrite_heap_tuple
- EvalPlanQualFetch (twice)
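
A minimal sketch of the shared test those call sites could use (the helper
name is hypothetical, and the FrozenTransactionId branch only matters for
pages frozen before 9.4, possibly arriving via pg_upgrade):

#include "access/htup_details.h"
#include "access/transam.h"

/*
 * Sketch only: does the raw xmin of this chain member match the updater
 * XID recorded in the previous member?  Raw xmin survives 9.4+ freezing
 * (commit 37484ad2a); xid 2 covers pre-9.4 freezing.
 */
static bool
ChainXminMatchesPriorXmax(TransactionId priorXmax, HeapTupleHeader htup)
{
	TransactionId	rawXmin = HeapTupleHeaderGetRawXmin(htup);

	if (TransactionIdEquals(rawXmin, priorXmax))
		return true;

	/* pre-9.4 freezing overwrote xmin with FrozenTransactionId itself */
	return TransactionIdEquals(rawXmin, FrozenTransactionId);
}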

> This bit in EvalPlanQualFetch caught my attention ... why is it saying
> xmin never changes?  It does change with freezing.
>
> /*
>  * If xmin isn't what we're expecting, the slot must have been
>  * recycled and reused for an unrelated tuple.  This implies that
>  * the latest version of the row was deleted, so we need do
>  * nothing.  (Should be safe to examine xmin without getting
>  * buffer's content lock, since xmin never changes in an existing
>  * tuple.)
>  */
> if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
>  priorXmax))

Agreed. That's not good.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Peter Geoghegan
On Wed, Oct 4, 2017 at 6:46 AM, Alvaro Herrera  wrote:
> Wong, Yi Wen wrote:
>> My interpretation of README.HOT is the check is just to ensure the chain is 
>> continuous; in which case the condition should be:
>>
>> > if (TransactionIdIsValid(priorXmax) &&
>> > !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>> > break;
>>
>> So the difference is GetRawXmin vs GetXmin, because otherwise we get 
>> FrozenTransactionId instead of the Xmin from when the transaction happened

As you know, on version 9.4+, as of commit 37484ad2a, we decided that
we are "largely ignoring the value to which it [xmin] is set". The
expectation became that raw xmin is available after freezing, but
mostly for forensic purposes. I think Alvaro should now memorialize
the idea that its value is actually critical in some place
(htup_details.h?).

> I independently arrived at the same conclusion.  Since I was trying with
> 9.3, the patch differs -- in the old version we must explicitly test
> for the FrozenTransactionId value, instead of using GetRawXmin.

Obviously you're going to have to be prepared for a raw xmin of
FrozenTransactionId, even on 9.4+, due to pg_upgrade. I can see why it
would be safe (or at least no more dangerous) to rely on
HeapTupleHeaderGetRawXmin() in the way mentioned here, at least on
installations that initdb'd on a version after commit 37484ad2a
(version 9.4+). However, I'm not sure why what you propose here would
be safe when even raw xmin happens to be FrozenTransactionId. Are you
sure that that's truly race-free? If it's really true that we only
need to check for FrozenTransactionId on 9.3, why not just do that on
all versions, and never bother with HeapTupleHeaderGetRawXmin()?
("Sheer paranoia" is a valid answer; I just want us to be clear on the
reasoning.)

Obviously any race would have a ridiculously tiny window, but it's not
obvious why this protocol would be completely race-free (in the event
of a FrozenTransactionId raw xmin).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Peter Geoghegan
On Wed, Oct 4, 2017 at 11:00 AM, Wood, Dan  wrote:
> The early “break;” here is likely the xmin frozen reason as I found in the 
> other loop.

It looks that way.

Since we're already very defensive when it comes to this xmin/xmax
matching business, and we're defensive while following an update chain
more generally, I wonder if we should be checking
HeapTupleHeaderIsSpeculative() on versions >= 9.5 (versions with ON
CONFLICT DO UPDATE, where t_ctid block number might actually be a
speculative insertion token). Or, at least acknowledging that case in
comments. I remember expressing concern that something like this was
possible at the time that that went in.

We certainly don't want to have heap_abort_speculative() "super
delete" the wrong tuple in the event of item pointer recycling. There
are at least defensive sanity checks that would turn that into an
error within heap_abort_speculative(), so it wouldn't be a disaster if
it was attempted. I don't think that it's possible in practice, and
maybe it's sufficient that routines like heap_get_latest_tid() check
for a sane item offset, which will discriminate against
SpecTokenOffsetNumber, which must be well out of range for ItemIds on
the page. Could be worth a comment.

(I guess that heap_prune_chain() wouldn't need to be changed if we
decide to add such comments, because the speculative tuple ItemId is
going to be skipped over due to being ItemIdIsUsed() before we even
get there.)
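
To make the defensive check concrete, here is a hedged sketch for a
chain-walking loop that is about to follow t_ctid (hypothetical helper,
only meaningful on 9.5+, where speculative tokens exist):

#include "access/htup_details.h"

/*
 * Sketch only: it is not safe to follow htup->t_ctid to the next chain
 * member when it holds a speculative-insertion token (ON CONFLICT DO
 * UPDATE); the token sits in the block-number field and is not a TID.
 */
static bool
CanFollowCtid(HeapTupleHeader htup)
{
	if (HeapTupleHeaderIsSpeculative(htup))
		return false;			/* t_ctid holds a token, not a real TID */

	return true;
}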
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Peter Geoghegan
On Tue, Oct 3, 2017 at 8:43 PM, Wong, Yi Wen  wrote:
> My interpretation of README.HOT is the check is just to ensure the chain is 
> continuous; in which case the condition should be:
>
>> if (TransactionIdIsValid(priorXmax) &&
>> !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>> break;
>
> So the difference is GetRawXmin vs GetXmin, because otherwise we get 
> FrozenTransactionId instead of the Xmin from when the transaction happened

I was thinking along similar lines.

> The interesting consequence of changing that is the prune seems to get the 
> entire chain altogether with Dan's repro... I've run it a couple of times and 
> have consistently gotten the following page
>
>  lp | t_ctid | lp_off | lp_flags | t_infomask | t_infomask2
> ----+--------+--------+----------+------------+-------------
>   1 | (0,1)  |   8152 |        1 |       2818 |           3
>   2 |        |      7 |        2 |            |
>   3 |        |      0 |        0 |            |
>   4 |        |      0 |        0 |            |
>   5 |        |      0 |        0 |            |
>   6 |        |      0 |        0 |            |
>   7 | (0,7)  |   8112 |        1 |      11010 |       32771
> (7 rows)

That's also what I see. This is a good thing, I think; that's how we
ought to prune.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Alvaro Herrera
Wong, Yi Wen wrote:
> My interpretation of README.HOT is the check is just to ensure the chain is 
> continuous; in which case the condition should be:
> 
> > if (TransactionIdIsValid(priorXmax) &&
> > !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
> > break;
> 
> So the difference is GetRawXmin vs GetXmin, because otherwise we get 
> FrozenTransactionId instead of the Xmin from when the transaction happened

I independently arrived at the same conclusion.  Since I was trying with
9.3, the patch differs -- in the old version we must explicitly test
for the FrozenTransactionId value, instead of using GetRawXmin.
Attached is the patch I'm using, and my own oneliner test (pretty much
the same I posted earlier) seems to survive dozens of iterations without
showing any problem in REINDEX.

This patch is incomplete, since I think there are other places that need
to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?).
Of course, for 9.4 and onwards we need to patch like you described.


This bit in EvalPlanQualFetch caught my attention ... why is it saying
xmin never changes?  It does change with freezing.

				/*
				 * If xmin isn't what we're expecting, the slot must have been
				 * recycled and reused for an unrelated tuple.  This implies that
				 * the latest version of the row was deleted, so we need do
				 * nothing.  (Should be safe to examine xmin without getting
				 * buffer's content lock, since xmin never changes in an existing
				 * tuple.)
				 */
				if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
										 priorXmax))


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e00dc6c1ca..e68746fc3b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -5500,7 +5500,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 		if (TransactionIdPrecedes(xid, cutoff_xid))
 		{
 			if (TransactionIdDidCommit(xid))
+			{
+				xid = FrozenTransactionId;
 				*flags = FRM_MARK_COMMITTED | FRM_RETURN_IS_XID;
+			}
 			else
 			{
 				*flags |= FRM_INVALIDATE_XMAX;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9a8db74cb9..561acd144a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -435,7 +435,8 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 		 * Check the tuple XMIN against prior XMAX, if any
 		 */
 		if (TransactionIdIsValid(priorXmax) &&
-			!TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
+			!TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax) &&
+			HeapTupleHeaderGetXmin(htup) != FrozenTransactionId)
 			break;
 
/*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Peter Geoghegan wrote:
> 
> > I thought that we no longer store FrozenTransactionId (xid 2) as our
> > "raw" xmin while freezing, and yet that's what we see here.
> 
> I'm doing this in 9.3.  I can't tell if the new tuple freezing stuff
> broke things more thoroughly, but it is already broken in earlier
> releases.

In fact, I think in 9.3 we should include this patch, to set the Xmin to
FrozenXid.  9.4 and onwards have commit 37484ad2a "Change the way we
mark tuples as frozen" which uses a combination of infomask bits, but in
9.3 I think leaving the unfrozen value in the xmax field is a bad idea
even if we set the HEAP_XMAX_COMMITTED bit.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e00dc6c1ca..e68746fc3b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -5500,7 +5500,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 		if (TransactionIdPrecedes(xid, cutoff_xid))
 		{
 			if (TransactionIdDidCommit(xid))
+			{
+				xid = FrozenTransactionId;
 				*flags = FRM_MARK_COMMITTED | FRM_RETURN_IS_XID;
+			}
 			else
 			{
 				*flags |= FRM_INVALIDATE_XMAX;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Alvaro Herrera
Wood, Dan wrote:
> One minor side note…   Is it weird for xmin/xmax to go backwards in a hot row 
> chain?
> 
>  lp | t_ctid | lp_off | t_infomask | t_infomask2 | t_xmin | t_xmax 
> ----+--------+--------+------------+-------------+--------+--------
>   1 | (0,1)  |   8152 |       2818 |           3 |  36957 |      0
>   2 |        |      5 |            |             |        |
>   3 |        |      0 |            |             |        |
>   4 |        |      0 |            |             |        |
>   5 | (0,6)  |   8112 |       9986 |       49155 |  36962 |  36963
>   6 | (0,7)  |   8072 |       9986 |       49155 |  36963 |  36961
>   7 | (0,7)  |   8032 |      11010 |       32771 |  36961 |      0
> (7 rows)

No, it just means transaction A got its XID before transaction B, but B
executed the update first and A updated the tuple second.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Alvaro Herrera
Wood, Dan wrote:

> There is a tangled web of issues here.  With the community fix we get a 
> corrupted page (invalid redirect ptr from indexed item).  The cause of that is:
> pruneheap.c:
> 
>   /*
>    * Check the tuple XMIN against prior XMAX, if any
>    */
>   if (TransactionIdIsValid(priorXmax) &&
>   !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
>   break;
> 
>   chainitems[nchain++] = offnum;
> 
> The priorXmax is a multixact key share lock,

Uhh, what?  That certainly shouldn't happen -- the priorXmax comes from

priorXmax = HeapTupleHeaderGetUpdateXid(htup);

so only the XID of the update itself should be reported, not a multixact
and certainly not just a tuple lock XID.
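
For illustration, a paraphrase of what HeapTupleHeaderGetUpdateXid boils
down to (a sketch, not the literal macro from htup_details.h); a pure
key-share multixact is lock-only, so no locker XID can come out of it:

#include "access/htup_details.h"

/* Sketch of the macro's contract: for a multixact xmax that is not
 * lock-only, dig the updating member XID out of the multixact;
 * otherwise hand back the raw xmax. */
static TransactionId
sketch_get_update_xid(HeapTupleHeader tup)
{
	if (!(tup->t_infomask & HEAP_XMAX_INVALID) &&
		(tup->t_infomask & HEAP_XMAX_IS_MULTI) &&
		!(tup->t_infomask & HEAP_XMAX_LOCK_ONLY))
		return HeapTupleGetUpdateXid(tup);	/* updater member of the multi */

	return HeapTupleHeaderGetRawXmax(tup);
}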

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Alvaro Herrera
Peter Geoghegan wrote:

> I thought that we no longer store FrozenTransactionId (xid 2) as our
> "raw" xmin while freezing, and yet that's what we see here.

I'm doing this in 9.3.  I can't tell if the new tuple freezing stuff
broke things more thoroughly, but it is already broken in earlier
releases.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-03 Thread Wood, Dan
One minor side note…   Is it weird for xmin/xmax to go backwards in a hot row 
chain?

 lp | t_ctid | lp_off | t_infomask | t_infomask2 | t_xmin | t_xmax 
----+--------+--------+------------+-------------+--------+--------
  1 | (0,1)  |   8152 |       2818 |           3 |  36957 |      0
  2 |        |      5 |            |             |        |
  3 |        |      0 |            |             |        |
  4 |        |      0 |            |             |        |
  5 | (0,6)  |   8112 |       9986 |       49155 |  36962 |  36963
  6 | (0,7)  |   8072 |       9986 |       49155 |  36963 |  36961
  7 | (0,7)  |   8032 |      11010 |       32771 |  36961 |      0
(7 rows)


On 10/3/17, 6:20 PM, "Peter Geoghegan"  wrote:

On Tue, Oct 3, 2017 at 6:09 PM, Wood, Dan  wrote:
> I’ve just started looking at this again after a few weeks break.

> if (TransactionIdIsValid(priorXmax) &&
> !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
> break;

> We need to understand why these TXID equal checks exist.  Can we 
> differentiate the cases they are protecting against with the two exceptions 
> I’ve found?

I haven't read your remarks here in full, since I'm about to stop
working for the day, but I will point out that
src/backend/access/heap/README.HOT says a fair amount about this,
under "Abort Cases".


-- 
Peter Geoghegan




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-03 Thread Peter Geoghegan
On Tue, Oct 3, 2017 at 6:09 PM, Wood, Dan  wrote:
> I’ve just started looking at this again after a few weeks break.

> if (TransactionIdIsValid(priorXmax) &&
> !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
> break;

> We need to understand why these TXID equal checks exist.  Can we 
> differentiate the cases they are protecting against with the two exceptions 
> I’ve found?

I haven't read your remarks here in full, since I'm about to stop
working for the day, but I will point out that
src/backend/access/heap/README.HOT says a fair amount about this,
under "Abort Cases".


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-03 Thread Peter Geoghegan
On Tue, Oct 3, 2017 at 5:15 PM, Michael Paquier
 wrote:
>> FYI, the repro case page contents looks like this with the patch applied:
>> postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
>> to_hex(t_infomask) as infomask,
>> to_hex(t_infomask2) as infomask2
>> from heap_page_items(get_raw_page('t', 0));
>>  lp | lp_flags | t_xmin  | t_xmax | t_ctid | infomask | infomask2
>> ----+----------+---------+--------+--------+----------+-----------
>>   1 |        1 | 1845995 |      0 | (0,1)  | b02      |         3
>>   2 |        2 |         |        |        |          |
>>   3 |        0 |         |        |        |          |
>>   4 |        0 |         |        |        |          |
>>   5 |        0 |         |        |        |          |
>>   6 |        0 |         |        |        |          |
>>   7 |        1 | 1846001 |      0 | (0,7)  | 2b02     |      8003
>> (7 rows)
>
> Is lp_off for tid (0,2) pointing to (0,7)? A hot chain preserved is
> what would look correct to me.

Yes, it is:

postgres=# select * from bt_page_items('foo', 1);
 itemoffset | ctid  | itemlen | nulls | vars |          data
------------+-------+---------+-------+------+--------------------------
          1 | (0,1) |      16 | f     | f    | 01 00 00 00 00 00 00 00
          2 | (0,2) |      16 | f     | f    | 03 00 00 00 00 00 00 00
(2 rows)

I can tell from looking at my hex editor that the 4 bytes of ItemId
that we see for position '(0,2)' in the ItemId array are "07 00 01
00", meaning that '(0,2)' this is a LP_REDIRECT item, repointing us to
'(0,7)'. Everything here looks sane to me, at least at first blush.
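
The line pointer macros in storage/itemid.h make the same decoding
mechanical; a small sketch, assuming the page has already been read into a
local variable "page":

#include "postgres.h"
#include "storage/bufpage.h"

/* Sketch: decode item 2 the way the hex bytes were read above.  For an
 * LP_REDIRECT item, lp_off is reused to store the OffsetNumber of the
 * chain member it points at. */
ItemId		itemid = PageGetItemId(page, 2);

if (ItemIdIsRedirected(itemid))
	elog(NOTICE, "(0,2) redirects to offset %u",
		 (unsigned) ItemIdGetRedirect(itemid));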

> - * Check the tuple XMIN against prior XMAX, if any
> - */
> -if (TransactionIdIsValid(priorXmax) &&
> -!TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
> -break;
> If you remove this check, you could also remove completely priorXmax.
>
> Actually, I may be missing something, but why is priorXmax updated
> even for dead tuples? For example just doing that is also taking care
> of the problem:

I'll study what you suggest here some more tomorrow.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-03 Thread Wood, Dan
I’ve just started looking at this again after a few weeks break.

There is a tangled web of issues here.  With the community fix we get a 
corrupted page (invalid redirect ptr from indexed item).  The cause of that is:
pruneheap.c:

  /*
   * Check the tuple XMIN against prior XMAX, if any
   */
  if (TransactionIdIsValid(priorXmax) &&
  !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
  break;

chainitems[nchain++] = offnum;

The priorXmax is a multixact key share lock, thus not equal to xmin.  This 
terminates the scan.  Unlike the scan termination with “if (!recent_dead) 
break;” below the switch (...SatisfiesVacuum…), this “break” does not put the 
offnum into the chain even though it is in the chain.  If the first not-deleted 
item isn’t put in the chain then we’ll never call heap_prune_record_redirect().

I do not know what the above “if” test is protecting against.  Normally the 
xmin is equal to the priorXmax; other than guarding against corruption, a key 
share lock can cause this mismatch.  So I tried a fix which does the “if” check 
only after adding the offnum to chainitems (sketched below).  This will break 
whatever real situation this IF was protecting.  Tom Lane put this in.
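
Roughly, the hack reorders the loop body like this (a sketch of the
experiment, not a proposed fix):

		/* record the item *before* the continuity test, so the first
		 * live tuple still becomes the redirect target (hack only) */
		chainitems[nchain++] = offnum;

		/*
		 * Check the tuple XMIN against prior XMAX, if any
		 */
		if (TransactionIdIsValid(priorXmax) &&
			!TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
			break;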

OK:  With that hack of a fix the redirect now works correctly.  However, we 
still get the reindex problem with not finding the parent.  That problem is 
caused by:
Pruneheap.c:heap_get_root_tuples()

if (TransactionIdIsValid(priorXmax) &&
!TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
break;

In this case, instead of these not being equal because of a multixact key share 
lock, it is because XMIN is frozen and FrozenTransactionId doesn’t equal the 
priorXmax.  Thus, we don’t build the entire chain from root to most current row 
version and this causes the reindex failure.

If we disable this ‘if’ as a hack then we no longer get a problem on the 
reindex.  However, YiWen reported that at the end of an installcheck run, index 
checking reported corruption in the system catalogues.  So we are still looking.

We need to understand why these TXID equal checks exist.  Can we differentiate 
the cases they are protecting against with the two exceptions I’ve found?

FYI, someone should look at the same ”if”  test in heapam.c: 
heap_lock_updated_tuple_rec().  Also, I hope there are no strange issues with 
concurrent index builds.

Finally, the idea behind the original fix was simply NOT to do an unsupported 
freeze on a dead tuple.  It had two drawbacks:
1) CLOG truncation.  This could have been handled by keeping track of the 
oldest unpruned item found and using that to update the table’s/DB’s freeze xid.
2) Not making freeze progress.  The only reason the prune would fail should be 
an open TXN.  Unless that TXN was so old that its XID was as old as the “min 
freeze threshold”, we would make progress.  If we were doing TXNs that old then 
we’d be having problems anyway.


On 10/3/17, 5:15 PM, "Michael Paquier"  wrote:

On Wed, Oct 4, 2017 at 8:10 AM, Peter Geoghegan  wrote:
> I now think that it actually is a VACUUM problem, specifically a
> problem with VACUUM pruning. You see the HOT xmin-to-xmax check
> pattern that you mentioned within heap_prune_chain(), which looks like
> where the incorrect tuple prune (or possibly, at times, redirect?)
> takes place. (I refer to the prune/kill that you mentioned today, that
> frustrated your first attempt at a fix -- "I modified the multixact
> freeze code...".)

My lookup of the problem converges to the same conclusion. Something
is wrong with the vacuum's pruning. I have spent some time trying to
design a patch, all the solutions I tried have proved to make the
problem harder to show up, but it still showed up, sometimes after
dozens of attempts.

> The attached patch "fixes" the problem -- I cannot get amcheck to
> complain about corruption with this applied. And, "make check-world"
> passes. Hopefully it goes without saying that this isn't actually my
> proposed fix. It tells us something that this at least *masks* the
> problem, though; it's a start.

Yep.

> FYI, the repro case page contents looks like this with the patch applied:
> postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
> to_hex(t_infomask) as infomask,
> to_hex(t_infomask2) as infomask2
> from heap_page_items(get_raw_page('t', 0));
>  lp | lp_flags | t_xmin  | t_xmax | t_ctid | infomask | infomask2
> ----+----------+---------+--------+--------+----------+-----------
>   1 |        1 | 1845995 |      0 | (0,1)  | b02      |         3
>   2 |        2 |         |        |        |          |
>   3 |        0 |         |        |        |          |
>   4 |        0 |         |        |        |          |
>   5 |        0 |         |        |        |          |
>   6 |        0 |         |        |        |          |
>   7 |        1 | 1846001 |      0 | (0,7)  | 2b02     |      8003
> (7 rows)

Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-03 Thread Michael Paquier
On Wed, Oct 4, 2017 at 8:10 AM, Peter Geoghegan  wrote:
> I now think that it actually is a VACUUM problem, specifically a
> problem with VACUUM pruning. You see the HOT xmin-to-xmax check
> pattern that you mentioned within heap_prune_chain(), which looks like
> where the incorrect tuple prune (or possibly, at times, redirect?)
> takes place. (I refer to the prune/kill that you mentioned today, that
> frustrated your first attempt at a fix -- "I modified the multixact
> freeze code...".)

My lookup of the problem converges to the same conclusion. Something
is wrong with the vacuum's pruning. I have spent some time trying to
design a patch, all the solutions I tried have proved to make the
problem harder to show up, but it still showed up, sometimes after
dozens of attempts.

> The attached patch "fixes" the problem -- I cannot get amcheck to
> complain about corruption with this applied. And, "make check-world"
> passes. Hopefully it goes without saying that this isn't actually my
> proposed fix. It tells us something that this at least *masks* the
> problem, though; it's a start.

Yep.

> FYI, the repro case page contents looks like this with the patch applied:
> postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
> to_hex(t_infomask) as infomask,
> to_hex(t_infomask2) as infomask2
> from heap_page_items(get_raw_page('t', 0));
>  lp | lp_flags | t_xmin  | t_xmax | t_ctid | infomask | infomask2
> ----+----------+---------+--------+--------+----------+-----------
>   1 |        1 | 1845995 |      0 | (0,1)  | b02      |         3
>   2 |        2 |         |        |        |          |
>   3 |        0 |         |        |        |          |
>   4 |        0 |         |        |        |          |
>   5 |        0 |         |        |        |          |
>   6 |        0 |         |        |        |          |
>   7 |        1 | 1846001 |      0 | (0,7)  | 2b02     |      8003
> (7 rows)

Is lp_off for tid (0,2) pointing to (0,7)? A hot chain preserved is
what would look correct to me.

- * Check the tuple XMIN against prior XMAX, if any
- */
-if (TransactionIdIsValid(priorXmax) &&
-!TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
-break;
If you remove this check, you could also remove completely priorXmax.

Actually, I may be missing something, but why is priorXmax updated
even for dead tuples? For example just doing that is also taking care
of the problem:
@@ -558,7 +558,8 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 			Assert(ItemPointerGetBlockNumber(&htup->t_ctid) ==
 				   BufferGetBlockNumber(buffer));
 			offnum = ItemPointerGetOffsetNumber(&htup->t_ctid);
-			priorXmax = HeapTupleHeaderGetUpdateXid(htup);
+			if (!tupdead)
+				priorXmax = HeapTupleHeaderGetUpdateXid(htup);
 		}
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-03 Thread Peter Geoghegan
On Tue, Oct 3, 2017 at 9:48 AM, Alvaro Herrera  wrote:
> But that still doesn't fix the problem;
> as far as I can see, vacuum removes the root of the chain, not yet sure
> why, and then things are just as corrupted as before.

Are you sure it's not opportunistic pruning? Another thing that I've
noticed with this problem is that the relevant IndexTuple will pretty
quickly vanish, presumably due to LP_DEAD setting (but maybe not
actually due to LP_DEAD setting).

(Studies the problem some more...)

I now think that it actually is a VACUUM problem, specifically a
problem with VACUUM pruning. You see the HOT xmin-to-xmax check
pattern that you mentioned within heap_prune_chain(), which looks like
where the incorrect tuple prune (or possibly, at times, redirect?)
takes place. (I refer to the prune/kill that you mentioned today, that
frustrated your first attempt at a fix -- "I modified the multixact
freeze code...".)

The attached patch "fixes" the problem -- I cannot get amcheck to
complain about corruption with this applied. And, "make check-world"
passes. Hopefully it goes without saying that this isn't actually my
proposed fix. It tells us something that this at least *masks* the
problem, though; it's a start.

FYI, the repro case page contents looks like this with the patch applied:

postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
to_hex(t_infomask) as infomask,
to_hex(t_infomask2) as infomask2
from heap_page_items(get_raw_page('t', 0));
 lp | lp_flags | t_xmin  | t_xmax | t_ctid | infomask | infomask2
----+----------+---------+--------+--------+----------+-----------
  1 |        1 | 1845995 |      0 | (0,1)  | b02      |         3
  2 |        2 |         |        |        |          |
  3 |        0 |         |        |        |          |
  4 |        0 |         |        |        |          |
  5 |        0 |         |        |        |          |
  6 |        0 |         |        |        |          |
  7 |        1 | 1846001 |      0 | (0,7)  | 2b02     |      8003
(7 rows)

-- 
Peter Geoghegan
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 52231ac..90eb39e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -470,13 +470,6 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 		ItemPointerSet(&(tup.t_self), BufferGetBlockNumber(buffer), offnum);
 
 		/*
-		 * Check the tuple XMIN against prior XMAX, if any
-		 */
-		if (TransactionIdIsValid(priorXmax) &&
-			!TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
-			break;
-
-		/*
 		 * OK, this tuple is indeed a member of the chain.
 		 */
 		chainitems[nchain++] = offnum;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-03 Thread Peter Geoghegan
On Tue, Oct 3, 2017 at 9:48 AM, Alvaro Herrera  wrote:
> which shows a HOT-update chain, where the t_xmax are multixacts.  Then a
> vacuum freeze comes, and because the multixacts are below the freeze
> horizon for multixacts, we get this:
>
> select lp, lp_flags, t_xmin, t_xmax, t_ctid, to_hex(t_infomask) as infomask,
> to_hex(t_infomask2) as infomask2
> from heap_page_items(get_raw_page('t', 0));
>  lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
> ----+----------+--------+--------+--------+----------+-----------
>   1 |        1 |      2 |      0 | (0,1)  | 902      |         3
>   2 |        0 |        |        |        |          |
>   3 |        1 |      2 |  14662 | (0,4)  | 2502     |      c003
>   4 |        1 |      2 |  14663 | (0,5)  | 2502     |      c003
>   5 |        1 |      2 |  14664 | (0,6)  | 2502     |      c003
>   6 |        1 |      2 |  14665 | (0,7)  | 2502     |      c003
>   7 |        1 |      2 |      0 | (0,7)  | 2902     |      8003
> (7 filas)
>
> where the xmin values have all been frozen, and the xmax values are now
> regular Xids.

I thought that we no longer store FrozenTransactionId (xid 2) as our
"raw" xmin while freezing, and yet that's what we see here. It looks
like pageinspect is looking at raw xmin (it's calling
HeapTupleHeaderGetRawXmin(), not HeapTupleHeaderGetXmin()). What's the
deal with that? Shouldn't the original xmin be preserved for forensic
analysis, following commit 37484ad2?

I guess what you mean is that this is what you see having modified the
code to actually store FrozenTransactionId as xmin once more, in an
effort to fix this?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-03 Thread Alvaro Herrera
So here's my attempt at an explanation for what is going on.  At one
point, we have this:

select lp, lp_flags, t_xmin, t_xmax, t_ctid, to_hex(t_infomask) as infomask,
to_hex(t_infomask2) as infomask2
from heap_page_items(get_raw_page('t', 0));
 lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2 
----+----------+--------+--------+--------+----------+-----------
  1 |        1 |      2 |      0 | (0,1)  | 902      |         3
  2 |        0 |        |        |        |          | 
  3 |        1 |      2 |  19928 | (0,4)  | 3142     |      c003
  4 |        1 |  14662 |  19929 | (0,5)  | 3142     |      c003
  5 |        1 |  14663 |  19931 | (0,6)  | 3142     |      c003
  6 |        1 |  14664 |  19933 | (0,7)  | 3142     |      c003
  7 |        1 |  14665 |      0 | (0,7)  | 2902     |      8003
(7 filas)

which shows a HOT-update chain, where the t_xmax are multixacts.  Then a
vacuum freeze comes, and because the multixacts are below the freeze
horizon for multixacts, we get this:

select lp, lp_flags, t_xmin, t_xmax, t_ctid, to_hex(t_infomask) as infomask,
to_hex(t_infomask2) as infomask2
from heap_page_items(get_raw_page('t', 0));
 lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2 
----+----------+--------+--------+--------+----------+-----------
  1 |        1 |      2 |      0 | (0,1)  | 902      |         3
  2 |        0 |        |        |        |          | 
  3 |        1 |      2 |  14662 | (0,4)  | 2502     |      c003
  4 |        1 |      2 |  14663 | (0,5)  | 2502     |      c003
  5 |        1 |      2 |  14664 | (0,6)  | 2502     |      c003
  6 |        1 |      2 |  14665 | (0,7)  | 2502     |      c003
  7 |        1 |      2 |      0 | (0,7)  | 2902     |      8003
(7 filas)

where the xmin values have all been frozen, and the xmax values are now
regular Xids.  I think the HOT code that walks the chain fails to detect
these as chains, because the xmin values no longer match the xmax
values.  I modified the multixact freeze code, so that whenever the
update Xid is below the cutoff Xid, it's set to FrozenTransactionId,
since keeping the other value is invalid anyway (even though we have set
the HEAP_XMAX_COMMITTED flag).  But that still doesn't fix the problem;
as far as I can see, vacuum removes the root of the chain, not yet sure
why, and then things are just as corrupted as before.
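
Concretely, with the numbers from these dumps, the chain walker's
continuity test fails like this (an illustration with hypothetical local
variables, not real code):

	TransactionId	priorXmax = 14663;			/* updater XID remembered at (0,4) */
	TransactionId	xmin = FrozenTransactionId;	/* what GetXmin() reports at (0,5) */
	bool		chain_cut;

	chain_cut = TransactionIdIsValid(priorXmax) &&
		!TransactionIdEquals(xmin, priorXmax);	/* true: 2 != 14663 */

so the walk stops in the middle of the chain instead of reaching (0,7).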

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-01 Thread Alvaro Herrera
Michael Paquier wrote:
> On Sun, Oct 1, 2017 at 10:25 AM, Peter Geoghegan  wrote:
> > On Thu, Sep 28, 2017 at 1:20 PM, Alvaro Herrera  
> > wrote:
> >> Maybe what this means is that we need to do both Dan's initially
> >> proposed patch (or something related to it) apart from the fixes already
> >> pushed.  IOW we need to put back some of the "tupkeep" business ...
> >
> > I took the time to specifically check if that would fix the problem.
> > Unfortunately, it did not. We see exactly the same problem, or at
> > least amcheck/REINDEX produces exactly the same error. I checked both
> > Dan's original update_freeze.patch, and your revision that retained
> > some of the "tupkeep" stuff,
> > 0002-Don-t-freeze-recently-dead-HOT-tuples, which you posted on
> > September 6th.
> 
> I did not take the time to dig into that more than two hours, but my
> first feeling is that some race condition is going on with the heap
> pruning.

I'll look into this on Monday.  I found out yesterday that the
problematic case is when HTSV returns HEAPTUPLE_DEAD and the HOT tests
return true.  I haven't yet figured out if it is one of those specifically,
but I suspect it is the IsHotUpdated case.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-01 Thread Michael Paquier
On Sun, Oct 1, 2017 at 10:25 AM, Peter Geoghegan  wrote:
> On Thu, Sep 28, 2017 at 1:20 PM, Alvaro Herrera  
> wrote:
>> Maybe what this means is that we need to do both Dan's initially
>> proposed patch (or something related to it) apart from the fixes already
>> pushed.  IOW we need to put back some of the "tupkeep" business ...
>
> I took the time to specifically check if that would fix the problem.
> Unfortunately, it did not. We see exactly the same problem, or at
> least amcheck/REINDEX produces exactly the same error. I checked both
> Dan's original update_freeze.patch, and your revision that retained
> some of the "tupkeep" stuff,
> 0002-Don-t-freeze-recently-dead-HOT-tuples, which you posted on
> September 6th.

I did not take the time to dig into that more than two hours, but my
first feeling is that some race condition is going on with the heap
pruning.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-30 Thread Peter Geoghegan
On Thu, Sep 28, 2017 at 1:20 PM, Alvaro Herrera  wrote:
> Maybe what this means is that we need to do both Dan's initially
> proposed patch (or something related to it) apart from the fixes already
> pushed.  IOW we need to put back some of the "tupkeep" business ...

I took the time to specifically check if that would fix the problem.
Unfortunately, it did not. We see exactly the same problem, or at
least amcheck/REINDEX produces exactly the same error. I checked both
Dan's original update_freeze.patch, and your revision that retained
some of the "tupkeep" stuff,
0002-Don-t-freeze-recently-dead-HOT-tuples, which you posted on
September 6th.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-29 Thread Michael Paquier
On Fri, Sep 29, 2017 at 6:39 AM, Alvaro Herrera  wrote:
> Peter Geoghegan wrote:
>
>> We certainly do still see wrong answers to queries here:
>>
>> postgres=# select ctid, xmin, xmax, * from t;
>>  ctid  | xmin  | xmax | id | name | x
>> -------+-------+------+----+------+---
>>  (0,1) | 21171 |    0 |  1 | 111  | 0
>>  (0,7) | 21177 |    0 |  3 | 333  | 5
>> (2 rows)
>>
>> postgres=# select * from t where id = 3;
>>  id | name | x
>> ----+------+---
>>   3 | 333  | 5
>> (1 row)
>>
>> postgres=# set enable_seqscan = off;
>> SET
>> postgres=# select * from t where id = 3;
>>  id | name | x
>> ----+------+---
>> (0 rows)
>
> Yeah, oops.

This really looks like a problem at heap-level with the parent
redirection not getting defined (?). Please note that a subsequent
REINDEX fails as well:
=# reindex index t_pkey ;
ERROR:  XX000: failed to find parent tuple for heap-only tuple at
(0,7) in table "t"
LOCATION:  IndexBuildHeapRangeScan, index.c:2597

VACUUM FREEZE also is not getting things right, but a VACUUM FULL does.

Also, dropping the constraint and attempting to recreate it is failing:
=# alter table t drop constraint t_pkey;
ALTER TABLE
=# create index t_pkey on t(id);
ERROR:  XX000: failed to find parent tuple for heap-only tuple at
(0,7) in table "t"
LOCATION:  IndexBuildHeapRangeScan, index.c:2597

A corrupted page clearly indicates that there are no tuple redirections:
=# select lp, t_ctid, lp_off, t_infomask, t_infomask2 from
heap_page_items(get_raw_page('t', 0));
 lp | t_ctid | lp_off | t_infomask | t_infomask2
----+--------+--------+------------+-------------
  1 | (0,1)  |   8152 |       2818 |           3
  2 | null   |      0 |       null |        null
  3 | (0,4)  |   8112 |       9986 |       49155
  4 | (0,5)  |   8072 |       9986 |       49155
  5 | (0,6)  |   8032 |       9986 |       49155
  6 | (0,7)  |   7992 |       9986 |       49155
  7 | (0,7)  |   7952 |      11010 |       32771
(7 rows)

And a non-corrupted page clearly shows the redirection done with lp_off:
 lp | t_ctid | lp_off | t_infomask | t_infomask2
----+--------+--------+------------+-------------
  1 | (0,1)  |   8152 |       2818 |           3
  2 | null   |      7 |       null |        null
  3 | null   |      0 |       null |        null
  4 | null   |      0 |       null |        null
  5 | null   |      0 |       null |        null
  6 | null   |      0 |       null |        null
  7 | (0,7)  |   8112 |      11010 |       32771
(7 rows)
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-28 Thread Peter Geoghegan
On Thu, Sep 28, 2017 at 3:20 PM, Robert Haas  wrote:
> On Thu, Sep 28, 2017 at 5:47 PM, Peter Geoghegan  wrote:
>> In the end, commit 6bfa88a fixed that old recovery bug by making sure
>> the recovery routine heap_xlog_lock() did the right thing. In both
>> cases (Feb 2014 and today), the index wasn't really corrupt -- it just
>> pointed to the root of a HOT chain when it should point to some child
>> tuple (or maybe a successor HOT chain).
>
> Unless I'm very confused, it's really not OK to point at a child tuple
> rather than the root of the HOT chain.

Please see my later clarification.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-28 Thread Robert Haas
On Thu, Sep 28, 2017 at 5:47 PM, Peter Geoghegan  wrote:
> In the end, commit 6bfa88a fixed that old recovery bug by making sure
> the recovery routine heap_xlog_lock() did the right thing. In both
> cases (Feb 2014 and today), the index wasn't really corrupt -- it just
> pointed to the root of a HOT chain when it should point to some child
> tuple (or maybe a successor HOT chain).

Unless I'm very confused, it's really not OK to point at a child tuple
rather than the root of the HOT chain.

Pointing to a successor HOT chain would be OK, as long as you point to
the root tuple thereof.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-28 Thread Peter Geoghegan
On Thu, Sep 28, 2017 at 2:47 PM, Peter Geoghegan  wrote:
> In the end, commit 6bfa88a fixed that old recovery bug by making sure
> the recovery routine heap_xlog_lock() did the right thing. In both
> cases (Feb 2014 and today), the index wasn't really corrupt -- it just
> pointed to the root of a HOT chain when it should point to some child
> tuple (or maybe a successor HOT chain).

Just to be clear, obviously I don't mean that the index should have
been magically updated to compensate for that 2014 heap_lock_tuple()
recovery bug (say, via an in-place IndexTuple update during recovery).
Certainly, the index AM was entitled to assume that the heap TID it
pointed to would continue to be the root of the same HOT chain, at
least until the next VACUUM. That was what I meant by "the index
wasn't actually corrupt".

All I meant here is that the index scan and sequential scan would have
been in agreement if something like that *had* happened artificially
(say, when the index tuple was edited with a hex editor). And, that
the actual high level problem was that we failed to treat a
HOT-updated tuple as a HOT-updated tuple, leading to consequences that
were mostly noticeable during index scans. Probably on both occasions
(back in 2014, and now).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-28 Thread Peter Geoghegan
On Thu, Sep 28, 2017 at 2:39 PM, Alvaro Herrera  wrote:
>> FWIW, I am reminded a little bit of the MultiXact/recovery bug I
>> reported way back in February of 2014 [1], which also had a HOT
>> interaction that caused index scans to give wrong answers, despite
>> more or less structurally sound indexes.
>>
>> [1] 
>> https://www.postgresql.org/message-id/CAM3SWZTMQiCi5PV5OWHb+bYkUcnCk=o67w0csswpvv7xfuc...@mail.gmail.com
>
> Thanks for the reference.  I didn't remember this problem and it's not
> (wasn't) in my list of things to look into.  Perhaps these are both the
> same bug.

I was reminded of that old bug because initially, at the time, it
looked very much like a corrupt index: sequential scans were fine, but
index scans gave wrong answers. This is what I saw today.

In the end, commit 6bfa88a fixed that old recovery bug by making sure
the recovery routine heap_xlog_lock() did the right thing. In both
cases (Feb 2014 and today), the index wasn't really corrupt -- it just
pointed to the root of a HOT chain when it should point to some child
tuple (or maybe a successor HOT chain).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-28 Thread Alvaro Herrera
Peter Geoghegan wrote:

> We certainly do still see wrong answers to queries here:
> 
> postgres=# select ctid, xmin, xmax, * from t;
>  ctid  | xmin  | xmax | id | name | x
> -------+-------+------+----+------+---
>  (0,1) | 21171 |    0 |  1 | 111  | 0
>  (0,7) | 21177 |    0 |  3 | 333  | 5
> (2 rows)
> 
> postgres=# select * from t where id = 3;
>  id | name | x
> ----+------+---
>   3 | 333  | 5
> (1 row)
> 
> postgres=# set enable_seqscan = off;
> SET
> postgres=# select * from t where id = 3;
>  id | name | x
> ----+------+---
> (0 rows)

Yeah, oops.

> FWIW, I am reminded a little bit of the MultiXact/recovery bug I
> reported way back in February of 2014 [1], which also had a HOT
> interaction that caused index scans to give wrong answers, despite
> more or less structurally sound indexes.
> 
> [1] 
> https://www.postgresql.org/message-id/CAM3SWZTMQiCi5PV5OWHb+bYkUcnCk=o67w0csswpvv7xfuc...@mail.gmail.com

Thanks for the reference.  I didn't remember this problem and it's not
(wasn't) in my list of things to look into.  Perhaps these are both the
same bug.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-28 Thread Peter Geoghegan
On Thu, Sep 28, 2017 at 1:20 PM, Alvaro Herrera  wrote:
> Alvaro Herrera wrote:
>
>> Odd that it's not fixed.  I guess there's still some more work to do
>> here ...
>
> Maybe what this means is that we need to do both Dan's initially
> proposed patch (or something related to it) apart from the fixes already
> pushed.  IOW we need to put back some of the "tupkeep" business ...

We certainly do still see wrong answers to queries here:

postgres=# select ctid, xmin, xmax, * from t;
 ctid  | xmin  | xmax | id | name | x
-------+-------+------+----+------+---
 (0,1) | 21171 |    0 |  1 | 111  | 0
 (0,7) | 21177 |    0 |  3 | 333  | 5
(2 rows)

postgres=# select * from t where id = 3;
 id | name | x
----+------+---
  3 | 333  | 5
(1 row)

postgres=# set enable_seqscan = off;
SET
postgres=# select * from t where id = 3;
 id | name | x
----+------+---
(0 rows)

FWIW, I am reminded a little bit of the MultiXact/recovery bug I
reported way back in February of 2014 [1], which also had a HOT
interaction that caused index scans to give wrong answers, despite
more or less structurally sound indexes.

[1] 
https://www.postgresql.org/message-id/CAM3SWZTMQiCi5PV5OWHb+bYkUcnCk=o67w0csswpvv7xfuc...@mail.gmail.com

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-28 Thread Alvaro Herrera
Alvaro Herrera wrote:

> Odd that it's not fixed.  I guess there's still some more work to do
> here ...

Maybe what this means is that we need to do both Dan's initially
proposed patch (or something related to it) apart from the fixes already
pushed.  IOW we need to put back some of the "tupkeep" business ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-09-28 Thread Alvaro Herrera
Peter Geoghegan wrote:
> On Thu, Sep 28, 2017 at 7:47 AM, Alvaro Herrera  
> wrote:
> > Fix freezing of a dead HOT-updated tuple
> 
> If I run Dan Wood's test case again, the obvious symptom (spurious
> duplicates) goes away. However, the enhanced amcheck, and thus CREATE
> INDEX/REINDEX, still isn't happy about this:
> 
> postgres=# select bt_index_check('t_pkey', true);
> DEBUG:  0: verifying presence of required tuples in index "t_pkey"
> LOCATION:  bt_check_every_level, verify_nbtree.c:424
> ERROR:  XX000: failed to find parent tuple for heap-only tuple at
> (0,6) in table "t"
> LOCATION:  IndexBuildHeapRangeScan, index.c:2597
> Time: 3.699 ms

... Rats, I obviously missed the message where you said that amcheck
detected this problem.

Odd that it's not fixed.  I guess there's still some more work to do
here ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers