Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-10 Thread Wood, Dan
I found one glitch with our merge of the original dup row fix.  With that 
corrected AND Alvaro’s Friday fix things are solid.
No dups.  No index corruption.

Thanks so much. 

On 10/10/17, 7:25 PM, "Michael Paquier"  wrote:

On Tue, Oct 10, 2017 at 11:14 PM, Alvaro Herrera wrote:
> I was seeing just the reindex problem.  I don't see any more dups.
>
> But I've tried to reproduce it afresh now, and let it run for a long
> time and nothing happened.  Maybe I made a mistake last week and
> ran an unfixed version.  I don't see any more problems now.

Okay, so that's one more person joining this trend, making three with
Peter and me.

>> If you are getting the dup rows consider the code in the block in
>> heapam.c that starts with the comment “replace multi by update xid”.
>>
>> When I repro this I find that MultiXactIdGetUpdateXid() returns 0.
>> There is an updater in the multixact array; however, the status is
>> MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate.  I
>> assume this is a preliminary status before the following row in the
>> hot chain has its multixact set to NoKeyUpdate.
>
> Yes, the "For" version is the locker version rather than the actual
> update.  That lock is acquired by EvalPlanQual locking the row just
> before doing the update.  I think GetUpdateXid has no reason to return
> such an Xid, since it's not an update.
>
>> Since 0 is returned, it precedes cutoff_xid, and
>> TransactionIdDidCommit(0) will return false.  This ends up aborting
>> the multixact on the row even though the real xid is committed.  This
>> sets XMAX to 0 and that row becomes visible as one of the dups.
>> Interestingly the real xid of the updater is 122944 and the cutoff_xid
>> is 122945.
>
> I haven't seen this effect. Please keep us updated if you're able to
> verify corruption this way.

Me neither. It would be nice not to live long with such a sword of Damocles.
-- 
Michael






Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-09 Thread Wood, Dan
I’m unclear on what is being repro’d in 9.6.  Are you getting the duplicate
rows problem or just the reindex problem?  Are you testing with asserts
enabled (I’m not)?

If you are getting the dup rows consider the code in the block in heapam.c that 
starts with the comment “replace multi by update xid”.
When I repro this I find that MultiXactIdGetUpdateXid() returns 0.  There is an 
updater in the multixact array; however, the status is
MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate.  I assume 
this is a preliminary status before the following row in the hot chain has its
multixact set to NoKeyUpdate.

Since 0 is returned, it precedes cutoff_xid, and
TransactionIdDidCommit(0) will return false.  This ends up aborting the
multixact on the row even though the real xid is committed.  This sets XMAX to 
0 and that row becomes visible as one of the dups.  Interestingly the real xid 
of the updater is 122944 and the cutoff_xid is 122945.
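
To make the failure mode concrete, here is a minimal standalone sketch of that
control flow (the types and helpers are simplified stand-ins for the real
heapam.c code, not the actual source):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t TransactionId;
    #define InvalidTransactionId ((TransactionId) 0)

    /* Stand-in for TransactionIdPrecedes(): a plain compare is enough here. */
    static bool xid_precedes(TransactionId a, TransactionId b) { return a < b; }

    /* Stand-in for TransactionIdDidCommit(): the only point that matters is
     * that xid 0 (InvalidTransactionId) never looks committed. */
    static bool xid_did_commit(TransactionId xid) { return xid != InvalidTransactionId; }

    int main(void)
    {
        /* MultiXactIdGetUpdateXid() sees only the ForNoKeyUpdate locker and
         * reports no updater at all, i.e. 0: */
        TransactionId update_xid = InvalidTransactionId;
        TransactionId cutoff_xid = 122945;

        if (xid_precedes(update_xid, cutoff_xid) && !xid_did_commit(update_xid))
            printf("xmax cleared: committed update (xid 122944) treated as aborted\n");
        return 0;
    }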

I’m still debugging, but I started late, so I’m passing this incomplete info
along now.

On 10/7/17, 4:25 PM, "Alvaro Herrera"  wrote:

Peter Geoghegan wrote:
> On Sat, Oct 7, 2017 at 1:31 AM, Alvaro Herrera wrote:
> >> As you must have seen, Alvaro said he has a variant of Dan's original
> >> script that demonstrates that a problem remains, at least on 9.6+,
> >> even with today's fix. I think it's the stress-test that plays with
> >> fillfactor, many clients, etc [1].
> >
> > I just execute setup.sql once and then run this shell command,
> >
> > while :; do
> > psql -e -P pager=off -f ./repro.sql
> > for i in `seq 1 5`; do
> > psql -P pager=off -e --no-psqlrc -f ./lock.sql &
> > done
> > wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
> > psql -P pager=off -e --no-psqlrc -f ./report.sql
> > echo "done"
> > done
> 
> I cannot reproduce the problem on my personal machine using this
> script/stress-test. I tried to do so on the master branch git tip.
> This reinforces the theory that there is some timing sensitivity,
> because the remaining race condition is very narrow.

Hmm, I think I added a random sleep (max. 100ms) right after the
HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that
makes the race easier to hit.
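
A hypothetical two-line version of that kind of race-widening hack, for
illustration only (pg_usleep takes microseconds):

    /* Hypothetical debugging hack in lazy_scan_heap() (vacuumlazy.c), placed
     * right after the HeapTupleSatisfiesVacuum() call, to widen the race
     * window so concurrent pruning/updates interleave more often. */
    pg_usleep(random() % 100000);       /* sleep up to 100 ms */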

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-05 Thread Wood, Dan
Yes, I’ve been testing 9.6.  I’ll try Alvaro’s patch today.

I would prefer to focus on either the latest 9.x or 11dev.  Does Alvaro’s patch
presume any of the other patches that set COMMITTED in the freeze code?


On 10/4/17, 7:17 PM, "Michael Paquier" <michael.paqu...@gmail.com> wrote:

On Thu, Oct 5, 2017 at 10:39 AM, Wood, Dan <hexp...@amazon.com> wrote:
> Whatever you do make sure to also test 250 clients running lock.sql.
> Even with the community’s fix plus YiWen’s fix I can still get duplicate
> rows.  What works for “in-block” hot chains may not work when spanning
> blocks.

Interesting. Which version did you test? Only 9.6?

> Once nearly all 250 clients have done their updates and everybody is
> waiting to vacuum (which, one by one, will take a while) I usually just
> “pkill -9 psql”.  After that I have many duplicate “id=3” rows.  On top of
> that I think we might have a lock leak.  After the pkill I tried to rerun
> setup.sql to drop/create the table and it hangs.  I see an autovacuum
> process starting and exiting every couple of seconds.  Only by killing and
> restarting PG can I drop the table.

Yeah, that's more or less what I have been doing. My tests involve
using your initial script with way more sessions triggering lock.sql,
minus the kill-9 portion (good idea actually). I can of course see the
sessions queuing for VACUUM, still I cannot see duplicated rows, even
if I headshot Postgres in the middle of the VACUUM waiting queue. Note
that I have just tested Alvaro's patch on 9.3.
-- 
Michael






Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-04 Thread Wood, Dan
Whatever you do make sure to also test 250 clients running lock.sql.  Even with
the community’s fix plus YiWen’s fix I can still get duplicate rows.  What
works for “in-block” hot chains may not work when spanning blocks.

Once nearly all 250 clients have done their updates and everybody is waiting to
vacuum (which, one by one, will take a while) I usually just “pkill -9 psql”.
After that I have many duplicate “id=3” rows.  On top of that I think we might
have a lock leak.  After the pkill I tried to rerun setup.sql to drop/create
the table and it hangs.  I see an autovacuum process starting and exiting every
couple of seconds.  Only by killing and restarting PG can I drop the table.

On 10/4/17, 6:31 PM, "Michael Paquier"  wrote:

On Wed, Oct 4, 2017 at 10:46 PM, Alvaro Herrera wrote:
> Wong, Yi Wen wrote:
>> My interpretation of README.HOT is the check is just to ensure the chain
>> is continuous; in which case the condition should be:
>>
>> >     if (TransactionIdIsValid(priorXmax) &&
>> >         !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>> >         break;
>>
>> So the difference is GetRawXmin vs GetXmin, because otherwise we get
>> FrozenTransactionId instead of the xmin from when the transaction happened
>
> I independently arrived at the same conclusion.  Since I was trying with
> 9.3, the patch differs -- in the old version we must explicitly test
> for the FrozenTransactionId value, instead of using GetRawXmin.
> Attached is the patch I'm using, and my own oneliner test (pretty much
> the same I posted earlier) seems to survive dozens of iterations without
> showing any problem in REINDEX.

Confirmed, the problem goes away with this patch on 9.3.

> This patch is incomplete, since I think there are other places that need
> to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?).
> Of course, for 9.4 and onwards we need to patch like you described.

I have just done a lookup of the source code, and here is an
exhaustive list of things in need of surgery:
- heap_hot_search_buffer
- heap_get_latest_tid
- heap_lock_updated_tuple_rec
- heap_prune_chain
- heap_get_root_tuples
- rewrite_heap_tuple
- EvalPlanQualFetch (twice)
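
One way to keep all of those call sites consistent would be to funnel the
chain-continuity test through a single shared helper.  A minimal sketch under
the assumptions discussed above (hypothetical name and shape, not a committed
function):

    /*
     * Does the xmax we followed (priorXmax) still match this tuple's xmin,
     * accounting for freezing?
     *
     * On 9.4+ freezing preserves the raw xmin and only sets a hint flag, so
     * HeapTupleHeaderGetRawXmin() can be compared directly.  On 9.3 freezing
     * overwrites xmin with FrozenTransactionId, which has to be accepted
     * explicitly as "chain continues".
     */
    static bool
    xmax_matches_xmin(TransactionId priorXmax, HeapTupleHeader htup)
    {
    #if PG_VERSION_NUM >= 90400
        return TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup));
    #else
        TransactionId xmin = HeapTupleHeaderGetXmin(htup);

        return TransactionIdEquals(priorXmax, xmin) ||
               TransactionIdEquals(xmin, FrozenTransactionId);
    #endif
    }

Each call site above would then read along the lines of
"if (TransactionIdIsValid(priorXmax) && !xmax_matches_xmin(priorXmax, htup))
break;".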

> This bit in EvalPlanQualFetch caught my attention ... why is it saying
> xmin never changes?  It does change with freezing.
>
> /*
>  * If xmin isn't what we're expecting, the slot must have been
>  * recycled and reused for an unrelated tuple.  This implies that
>  * the latest version of the row was deleted, so we need do
>  * nothing.  (Should be safe to examine xmin without getting
>  * buffer's content lock, since xmin never changes in an existing
>  * tuple.)
>  */
> if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
>                          priorXmax))

Agreed. That's not good.
-- 
Michael






Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-03 Thread Wood, Dan
One minor side note…  Is it weird for xmin/xmax to go backwards in a HOT row
chain?

 lp | t_ctid | lp_off | t_infomask | t_infomask2 | t_xmin | t_xmax
----+--------+--------+------------+-------------+--------+-------
  1 | (0,1)  |   8152 |       2818 |           3 |  36957 |      0
  2 |        |      5 |            |             |        |
  3 |        |      0 |            |             |        |
  4 |        |      0 |            |             |        |
  5 | (0,6)  |   8112 |       9986 |       49155 |  36962 |  36963
  6 | (0,7)  |   8072 |       9986 |       49155 |  36963 |  36961
  7 | (0,7)  |   8032 |      11010 |       32771 |  36961 |      0
(7 rows)


On 10/3/17, 6:20 PM, "Peter Geoghegan" <p...@bowt.ie> wrote:

On Tue, Oct 3, 2017 at 6:09 PM, Wood, Dan <hexp...@amazon.com> wrote:
> I’ve just started looking at this again after a few weeks break.

>     if (TransactionIdIsValid(priorXmax) &&
>         !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
>         break;

> We need to understand why these TXID equal checks exist.  Can we
> differentiate the cases they are protecting against with the two exceptions
> I’ve found?

I haven't read your remarks here in full, since I'm about to stop
working for the day, but I will point out that
src/backend/access/heap/README.HOT says a fair amount about this,
under "Abort Cases".


-- 
Peter Geoghegan






Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-10-03 Thread Wood, Dan
I’ve just started looking at this again after a few weeks’ break.

There is a tangled web of issues here.  With the community fix we get a
corrupted page (invalid redirect ptr from indexed item).  The cause of that is:
pruneheap.c:

    /*
     * Check the tuple XMIN against prior XMAX, if any
     */
    if (TransactionIdIsValid(priorXmax) &&
        !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
        break;

    chainitems[nchain++] = offnum;

The priorXmax is a multixact key share lock, thus not equal to xmin.  This
terminates the scan.  Unlike the scan termination with “if (!recent_dead)
break;” below the switch (...SatisfiesVacuum…), this “break” does not put the
offnum into the chain even though it is in the chain.  If the first not-deleted
item isn’t put in the chain then we’ll not call heap_prune_record_redirect().

I do not know what the above “if” test is protecting against.  Normally the
xmin is equal to the priorXmax; other than corruption, a key share lock can
cause them to differ.  So I tried a fix which does the “if” check after adding
the item to chainitems (see the sketch below).  This will break whatever real
situation this IF was protecting.  Tom Lane put this in.
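
A minimal sketch of that rearrangement (illustrative only):

    /* Record the item first, so the first live tuple still lands in
     * chainitems[] and heap_prune_record_redirect() can run for it... */
    chainitems[nchain++] = offnum;

    /* ...then do the continuity test, terminating the scan afterwards. */
    if (TransactionIdIsValid(priorXmax) &&
        !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
        break;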

OK:  With that hack of a fix the redirect now works correctly.  However, we 
still get the reindex problem with not finding the parent.  That problem is 
caused by:
pruneheap.c:heap_get_root_tuples()

    if (TransactionIdIsValid(priorXmax) &&
        !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
        break;

In this case, instead of these not being equal because of a multixact key share 
lock, it is because XMIN is frozen and FrozenTransactionId doesn’t equal the 
priorXmax.  Thus, we don’t build the entire chain from root to most current row 
version and this causes the reindex failure.

If we disable this “if” as a hack then we no longer get a problem on the
reindex.  However, YiWen reported that at the end of an installcheck run, index
checking reported corruption in the system catalogues.  So we are still looking.

We need to understand why these TXID equal checks exist.  Can we differentiate 
the cases they are protecting against with the two exceptions I’ve found?

FYI, someone should look at the same “if” test in heapam.c:
heap_lock_updated_tuple_rec().  Also, I hope there are no strange issues with
concurrent index builds.

Finally, the idea behind the original fix was simply NOT to do an unsupported
freeze on a dead tuple.  It had two drawbacks:
1) CLOG truncation.  This could have been handled by keeping track of the
oldest unpruned item found and using that to update the table’s/DB’s freeze xid.
2) Not making freeze progress.  The only reason the prune would fail should be
an open TXN.  Unless that TXN was so old that its XID was as old as the ?min
freeze threshold?, we would make progress.  If we were running TXNs that old
then we’d be having problems anyway.


On 10/3/17, 5:15 PM, "Michael Paquier"  wrote:

On Wed, Oct 4, 2017 at 8:10 AM, Peter Geoghegan wrote:
> I now think that it actually is a VACUUM problem, specifically a
> problem with VACUUM pruning. You see the HOT xmin-to-xmax check
> pattern that you mentioned within heap_prune_chain(), which looks like
> where the incorrect tuple prune (or possibly, at times, redirect?)
> takes place. (I refer to the prune/kill that you mentioned today, that
> frustrated your first attempt at a fix -- "I modified the multixact
> freeze code...".)

My lookup of the problem converges to the same conclusion. Something
is wrong with the vacuum's pruning. I have spent some time trying to
design a patch, all the solutions I tried have proved to make the
problem harder to show up, but it still showed up, sometimes after
dozens of attempts.

> The attached patch "fixes" the problem -- I cannot get amcheck to
> complain about corruption with this applied. And, "make check-world"
> passes. Hopefully it goes without saying that this isn't actually my
> proposed fix. It tells us something that this at least *masks* the
> problem, though; it's a start.

Yep.

> FYI, the repro case page contents looks like this with the patch applied:
> postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
> to_hex(t_infomask) as infomask,
> to_hex(t_infomask2) as infomask2
> from heap_page_items(get_raw_page('t', 0));
>  lp | lp_flags | t_xmin  | t_xmax | t_ctid | infomask | infomask2
> ----+----------+---------+--------+--------+----------+----------
>   1 |        1 | 1845995 |      0 | (0,1)  | b02      |         3
>   2 |        2 |         |        |        |          |
>   3 |        0 |         |        |        |          |
>   4 |