Whatever you do make sure to also test 250 clients running lock.sql. Even with the communities fix plus YiWen’s fix I can still get duplicate rows. What works for “in-block” hot chains may not work when spanning blocks.
Once nearly all 250 clients have done their updates and everybody is waiting to vacuum which one by one will take a while I usually just “pkill -9 psql”. After that I have many of duplicate “id=3” rows. On top of that I think we might have a lock leak. After the pkill I tried to rerun setup.sql to drop/create the table and it hangs. I see an autovacuum process starting and existing every couple of seconds. Only by killing and restarting PG can I drop the table. On 10/4/17, 6:31 PM, "Michael Paquier" <michael.paqu...@gmail.com> wrote: On Wed, Oct 4, 2017 at 10:46 PM, Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote: > Wong, Yi Wen wrote: >> My interpretation of README.HOT is the check is just to ensure the chain is continuous; in which case the condition should be: >> >> > if (TransactionIdIsValid(priorXmax) && >> > !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup))) >> > break; >> >> So the difference is GetRawXmin vs GetXmin, because otherwise we get the FreezeId instead of the Xmin when the transaction happened > > I independently arrived at the same conclusion. Since I was trying with > 9.3, the patch differs -- in the old version we must explicitely test > for the FrozenTransactionId value, instead of using GetRawXmin. > Attached is the patch I'm using, and my own oneliner test (pretty much > the same I posted earlier) seems to survive dozens of iterations without > showing any problem in REINDEX. Confirmed, the problem goes away with this patch on 9.3. > This patch is incomplete, since I think there are other places that need > to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?). > Of course, for 9.4 and onwards we need to patch like you described. I have just done a lookup of the source code, and here is an exhaustive list of things in need of surgery: - heap_hot_search_buffer - heap_get_latest_tid - heap_lock_updated_tuple_rec - heap_prune_chain - heap_get_root_tuples - rewrite_heap_tuple - EvalPlanQualFetch (twice) > This bit in EvalPlanQualFetch caught my attention ... why is it saying > xmin never changes? It does change with freezing. > > /* > * If xmin isn't what we're expecting, the slot must have been > * recycled and reused for an unrelated tuple. This implies that > * the latest version of the row was deleted, so we need do > * nothing. (Should be safe to examine xmin without getting > * buffer's content lock, since xmin never changes in an existing > * tuple.) > */ > if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data), > priorXmax)) Agreed. That's not good. -- Michael -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers