Whatever you do, make sure to also test 250 clients running lock.sql.  Even with 
the community's fix plus YiWen’s fix I can still get duplicate rows.  What 
works for “in-block” HOT chains may not work when spanning blocks.

Once nearly all 250 clients have done their updates and everybody is waiting to 
vacuum (which, one by one, will take a while), I usually just “pkill -9 psql”.  
After that I have many duplicate “id=3” rows.  On top of that I think we 
might have a lock leak.  After the pkill I tried to rerun setup.sql to 
drop/create the table and it hangs.  I see an autovacuum process starting and 
exiting every couple of seconds.  Only by killing and restarting PG can I drop 
the table.

On 10/4/17, 6:31 PM, "Michael Paquier" <michael.paqu...@gmail.com> wrote:

    On Wed, Oct 4, 2017 at 10:46 PM, Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote:
    > Wong, Yi Wen wrote:
    >> My interpretation of README.HOT is the check is just to ensure the chain
    >> is continuous; in which case the condition should be:
    >> >                 if (TransactionIdIsValid(priorXmax) &&
    >> >                         !TransactionIdEquals(priorXmax, 
    >> >                         break;
    >> So the difference is GetRawXmin vs GetXmin, because otherwise we get the
    >> FreezeId instead of the Xmin when the transaction happened
    > I independently arrived at the same conclusion.  Since I was trying with
    > 9.3, the patch differs -- in the old version we must explicitly test
    > for the FrozenTransactionId value, instead of using GetRawXmin.
    > Attached is the patch I'm using, and my own oneliner test (pretty much
    > the same I posted earlier) seems to survive dozens of iterations without
    > showing any problem in REINDEX.
    Confirmed, the problem goes away with this patch on 9.3.
    > This patch is incomplete, since I think there are other places that need
    > to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?).
    > Of course, for 9.4 and onwards we need to patch like you described.
    I have just done a lookup of the source code, and here is an
    exhaustive list of things in need of surgery:
    - heap_hot_search_buffer
    - heap_get_latest_tid
    - heap_lock_updated_tuple_rec
    - heap_prune_chain
    - heap_get_root_tuples
    - rewrite_heap_tuple
    - EvalPlanQualFetch (twice)
    > This bit in EvalPlanQualFetch caught my attention ... why is it saying
    > xmin never changes?  It does change with freezing.
    >                         /*
    >                          * If xmin isn't what we're expecting, the slot must have been
    >                          * recycled and reused for an unrelated tuple.  This implies that
    >                          * the latest version of the row was deleted, so we need do
    >                          * nothing.  (Should be safe to examine xmin without getting
    >                          * buffer's content lock, since xmin never changes in an existing
    >                          * tuple.)
    >                          */
    >                         if 
    Agreed. That's not good.
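
For anyone following along, here is a toy model of the chain-continuity check
discussed above.  It is only a sketch, not the patch: ToyTuple and every toy_*
function are made-up names, and the 9.3-style variant assumes that a frozen
xmin is simply treated as "does not break the chain", which may not be exactly
what the attached patch does.  It shows why comparing priorXmax against the
GetXmin-style value misfires once the next tuple in the chain has been frozen,
while the GetRawXmin-style comparison does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)
#define FrozenTransactionId  ((TransactionId) 2)

/* Hypothetical stand-in for a heap tuple header (not the real struct). */
typedef struct ToyTuple
{
    TransactionId raw_xmin;     /* xmin as stored in the tuple */
    bool          xmin_frozen;  /* 9.4+ style: flag set, raw_xmin preserved */
} ToyTuple;

/* Models GetXmin on 9.4+: a frozen tuple reports FrozenTransactionId. */
static TransactionId
toy_get_xmin(const ToyTuple *tup)
{
    assert(tup != NULL);
    return tup->xmin_frozen ? FrozenTransactionId : tup->raw_xmin;
}

/* Models GetRawXmin: always the stored value, frozen or not. */
static TransactionId
toy_get_raw_xmin(const ToyTuple *tup)
{
    assert(tup != NULL);
    return tup->raw_xmin;
}

/* Check using the GetXmin-style value: true means "stop following". */
static bool
toy_chain_broken_xmin(TransactionId priorXmax, const ToyTuple *tup)
{
    return priorXmax != InvalidTransactionId &&
           priorXmax != toy_get_xmin(tup);
}

/* 9.4+ style check: compare against the raw xmin instead. */
static bool
toy_chain_broken_raw(TransactionId priorXmax, const ToyTuple *tup)
{
    return priorXmax != InvalidTransactionId &&
           priorXmax != toy_get_raw_xmin(tup);
}

/*
 * 9.3-style check, where freezing overwrites the stored xmin.
 * Assumption: a frozen xmin is accepted as a continuation of the chain.
 */
static bool
toy_chain_broken_93(TransactionId priorXmax, TransactionId stored_xmin)
{
    if (priorXmax == InvalidTransactionId)
        return false;
    if (stored_xmin == FrozenTransactionId)
        return false;
    return priorXmax != stored_xmin;
}
```

With a tuple whose raw xmin is 100 and which has since been frozen,
toy_chain_broken_xmin(100, &tup) reports a broken chain, whereas
toy_chain_broken_raw(100, &tup) keeps following it.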

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)