On 24/03/2024 18:32, Melanie Plageman wrote:
On Thu, Mar 21, 2024 at 9:28 AM Heikki Linnakangas <hlinn...@iki.fi> wrote:

In heap_page_prune_and_freeze(), we now do some extra work on each live
tuple, to set the all_visible_except_removable correctly. And also to
update live_tuples, recently_dead_tuples and hastup. When we're not
freezing, that's a waste of cycles, the caller doesn't care. I hope it's
enough that it doesn't matter, but is it?

Last year on an early version of the patch set I did some pgbench
tpcb-like benchmarks -- since there is a lot of on-access pruning in
that workload -- and I don't remember it being a showstopper. The code
has changed a fair bit since then. However, I think it might be safer
to pass a flag "by_vacuum" to heap_page_prune_and_freeze() and skip
the rest of the loop after heap_prune_satisifies_vacuum() when
on-access pruning invokes it. I had avoided that because it felt ugly
and error-prone, however it addresses a few other of your points as
well.

Ok. I'm not a fan of the name 'by_vacuum' though. It'd be nice if the argument described what it does, rather than who it's for. For example, 'need_all_visible'. If set to true, the function determines 'all_visible', otherwise it does not.

I started to look closer at the loops in heap_prune_chain() and how they update all the various flags and counters. There's a lot going on there. We have:

- live_tuples counter
- recently_dead_tuples counter
- all_visible[_except_removable]
- all_frozen
- visibility_cutoff_xid
- hastup
- prstate.frozen array
- nnewlpdead
- deadoffsets array

And that doesn't even include all the local variables and the final dead/redirected arrays.

Some of those are set in the first loop that initializes 'htsv' for each tuple on the page. Others are updated in heap_prune_chain(). Some are updated in both. It's hard to follow which are set where.

I think recently_dead_tuples is updated incorrectly, for tuples that are part of a completely dead HOT chain. For example, imagine a hot chain with two tuples: RECENTLY_DEAD -> DEAD. heap_prune_chain() would follow the chain, see the DEAD tuple at the end of the chain, and mark both tuples for pruning. However, we already updated 'recently_dead_tuples' in the first loop, which is wrong if we remove the tuple.

Maybe that's the only bug like this, but I'm a little scared. Is there something we could do to make this simpler? Maybe move all the new work that we added to the first loop, into heap_prune_chain() ? Maybe introduce a few more helper heap_prune_record_*() functions, to update the flags and counters also for live and insert/delete-in-progress tuples and for dead line pointers? Something like heap_prune_record_live() and heap_prune_record_lp_dead().

The 'frz_conflict_horizon' stuff is still fuzzy to me. (Not necessarily
these patches's fault). This at least is wrong, because Max(a, b)
doesn't handle XID wraparound correctly:

                       if (do_freeze)
                               conflict_xid = 
Max(prstate.snapshotConflictHorizon,
                                                                  
presult->frz_conflict_horizon);
                       else
                               conflict_xid = prstate.snapshotConflictHorizon;

Then there's this in lazy_scan_prune():

               /* Using same cutoff when setting VM is now unnecessary */
               if (presult.all_frozen)
                       presult.frz_conflict_horizon = InvalidTransactionId;
This does the right thing in the end, but if all the tuples are frozen
shouldn't frz_conflict_horizon already be InvalidTransactionId? The
comment says it's "newest xmin on the page", and if everything was
frozen, all xmins are FrozenTransactionId. In other words, that should
be moved to heap_page_prune_and_freeze() so that it doesn't lie to its
caller. Also, frz_conflict_horizon is only set correctly if
'all_frozen==true', would be good to mention that in the comments too.

Yes, this is a good point. I've spent some time swapping all of this
back into my head. I think we should change the names of all these
conflict horizon variables and introduce some local variables again.
In the attached patch, I've updated the name of the variable in
PruneFreezeResult to vm_conflict_horizon, as it is only used for
emitting a VM update record. Now, I don't set it until the end of
heap_page_prune_and_freeze(). It is only updated from
InvalidTransactionId if the page is not all frozen. As you say, if the
page is all frozen, there can be no conflict.

Makes sense.

I've also changed PruneState->snapshotConflictHorizon to
PruneState->latest_xid_removed.

And I introduced the local variables visibility_cutoff_xid and
frz_conflict_horizon. I think it is important we distinguish between
the latest xid pruned, the latest xmin of tuples frozen, and the
latest xid of all live tuples on the page.

Though we end up using visibility_cutoff_xid as the freeze conflict
horizon if the page is all frozen, I think it is more clear to have
the three variables and name them what they are.

Agreed.

Note that I still don't think we have a resolution on what to
correctly update new_relfrozenxid and new_relminmxid to at the end
when presult->nfrozen == 0 and presult->all_frozen is true.

         if (presult->nfrozen > 0)
         {
             presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
             presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
         }
         else
         {
             presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
             presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
         }

One approach is to take them out of the PageFreezeResult struct again, and pass them as pointers:

void
heap_page_prune_and_freeze(Relation relation, Buffer buffer,
        ...
        TransactionId *new_relfrozenxid,
        MultiXactId *new_relminmxid,
        ...
)

That would be natural for the caller too, as it wouldn't need to set up the old values to HeapPageFreeze before each call, nor copy the new values to 'vacrel' after the call. I'm thinking that we'd move the responsibility of setting up HeapPageFreeze to heap_page_prune_and_freeze(), instead of having the caller set it up. So the caller would look something like this:

        heap_page_prune_and_freeze(rel, buf, vacrel->vistest,
           &vacrel->cutoffs, &vacrel->NewRelfrozenXid, &vacrel->NewRelminMxid,
           &presult,
           PRUNE_VACUUM_SCAN, flags,
           &vacrel->offnum);

In this setup, heap_page_prune_and_freeze() would update *new_relfrozenxid and *new_relminmxid when it has a new value for them, and leave them unchanged otherwise.

--
Heikki Linnakangas
Neon (https://neon.tech)



Reply via email to