On 01/03/2023 12:21, Aleksander Alekseev wrote:
Hi,

I'm surprised that these patches extend the page numbering to 64 bits,
but never actually use the high bits. The XID "epoch" is not used, and
pg_xact still wraps around and the segment names are still reused. I
thought we could stop doing that.

To clarify, the idea is to let CLOG grow indefinitely and simply store
FullTransactionId -> TransactionStatus (two bits). Correct?

Correct.
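
To make that concrete, here is a rough sketch of how a 64-bit XID could map to a position in pg_xact if the segment numbers were simply allowed to keep growing. The constants assume the usual 8 kB pages, two status bits per transaction and 32 pages per segment; the names with the "Sketch"/"SKETCH" suffix are made up for illustration and are not from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 8 kB pages, 2 status bits per transaction. */
#define SKETCH_BLCKSZ            8192
#define SKETCH_BITS_PER_XACT     2
#define SKETCH_XACTS_PER_BYTE    4
#define SKETCH_XACTS_PER_PAGE    (SKETCH_BLCKSZ * SKETCH_XACTS_PER_BYTE)
#define SKETCH_PAGES_PER_SEGMENT 32

/* Hypothetical helper type: where one transaction's status lives. */
typedef struct ClogPosSketch
{
    uint64_t segno;   /* segment file number, never wraps */
    uint64_t pageno;  /* page number across the whole SLRU */
    uint32_t byteno;  /* byte within the page */
    uint32_t bshift;  /* bit shift within that byte */
} ClogPosSketch;

/* Map a 64-bit XID (epoch in the high 32 bits) to its status position. */
static ClogPosSketch
clog_position_sketch(uint64_t fxid)
{
    ClogPosSketch pos;

    pos.pageno = fxid / SKETCH_XACTS_PER_PAGE;
    pos.segno = pos.pageno / SKETCH_PAGES_PER_SEGMENT;
    pos.byteno = (uint32_t) ((fxid % SKETCH_XACTS_PER_PAGE) / SKETCH_XACTS_PER_BYTE);
    pos.bshift = (uint32_t) ((fxid % SKETCH_XACTS_PER_BYTE) * SKETCH_BITS_PER_XACT);
    assert(pos.byteno < SKETCH_BLCKSZ);
    return pos;
}

/* Extract the two status bits from an in-memory page image. */
static unsigned
clog_status_sketch(const uint8_t *page, ClogPosSketch pos)
{
    return (page[pos.byteno] >> pos.bshift) & 0x03;
}
```

With these assumptions, all 32-bit XIDs fit in segments 0000 through 0FFF, and the first XID of epoch 1 lands in segment 4096 (hex 1000), which is where a 4-digit file name stops being enough.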

I didn't investigate this in much detail but it may affect quite some
amount of code since TransactionIdDidCommit() and
TransactionIdDidAbort() currently both deal with TransactionId, not
FullTransactionId. IMO, this would be a nice change however, assuming
we are ready for it.

Yep, it's a lot of code churn.

In the previous version of the patch there was an attempt to derive
FullTransactionId from TransactionId but it was wrong for the reasons
named above in the thread. Thus it was removed and the patch
simplified.

Yeah, it's tricky to get it right. Clearly we need to do it at some point though.
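
For illustration, the usual way to widen a 32-bit XID looks something like the sketch below. It is only valid under the assumption that the XID is a normal XID that is not in the future and is less than 2^32 transactions older than the reference "next" full XID; the function name is hypothetical, and special XIDs are ignored here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Widen a 32-bit XID to 64 bits, relative to a reference "next XID".
 *
 * Assumption: xid precedes next_fxid by fewer than 2^32 transactions.
 * If the reference value is stale, or the XID is older than that, the
 * result silently gets the wrong epoch.
 */
static uint64_t
widen_xid_sketch(uint64_t next_fxid, uint32_t xid)
{
    uint32_t next_xid = (uint32_t) next_fxid;
    uint64_t epoch = next_fxid >> 32;

    /* A numerically larger XID must belong to the previous epoch. */
    if (xid > next_xid)
    {
        assert(epoch > 0);
        epoch--;
    }
    return (epoch << 32) | xid;
}
```

The calculation itself is trivial; the hard part is guaranteeing the precondition at every call site, which is where the epoch mistakes tend to creep in.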

All in all, this is a big effort. I spent some more time reviewing this in the last few days, and thought a lot about what the path forward here could be. And I haven't looked at the actual 64-bit XIDs patch set yet, just this patch to use 64-bit addressing in SLRUs.

This patch is the first step, but we have a bit of a chicken-and-egg problem: this patch on its own isn't very interesting, but we need it in order to work on the follow-up items. Here's how I see the development path for this (and again, this is just for the 64-bit SLRUs work, not the bigger 64-bit-XIDs-in-heapam effort):

1. Use 64-bit page numbers in SLRUs (this patch)

I would like to make one change here: I'd prefer to keep the old 4-digit segment names, until we actually start to use the wider address space. Let's allow each SLRU to specify how many digits to use in the filenames, so that we convert one SLRU at a time.

If we do that, and don't change any of the existing SLRUs to actually use the wider space of page and segment numbers yet, this patch becomes just refactoring with no on-disk format changes. No pg_upgrade needed.
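
A minimal sketch of what I mean by a per-SLRU choice of file name width. The "long_names" flag, the function name and the 15-digit width are assumptions for illustration only; the short form mirrors the existing 4-digit hex names.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Build the path of an SLRU segment file.
 *
 * long_names is a hypothetical per-SLRU setting: when it is zero, the
 * existing 4-digit hex format is kept, so nothing changes on disk.
 * Returns the length of the generated path, as snprintf() does.
 */
static int
slru_file_name_sketch(char *buf, size_t buflen, const char *dir,
                      uint64_t segno, int long_names)
{
    assert(buf != NULL && dir != NULL);

    if (long_names)
        return snprintf(buf, buflen, "%s/%015" PRIX64, dir, segno);

    /* Short names: the segment number must still fit in 32 bits. */
    assert(segno <= UINT32_MAX);
    return snprintf(buf, buflen, "%s/%04X", dir, (unsigned int) segno);
}
```

For segment 0x1F in pg_xact this produces pg_xact/001F with short names and pg_xact/00000000000001F with long names, so an SLRU that has not been converted keeps exactly the files it has today.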

The next patches will start to make use of the wider address space, one SLRU at a time.

2. Use the larger segment file names in async.c, to lift the current 8 GB limit on the size of the pending notifications queue.

No one actually minds the limit, it's quite generous as it is. But there is some code and complexity in async.c to avoid the wraparound that could be made simpler if we used longer SLRU segment names and avoided the wraparound altogether.

I wonder if we should actually add an artificial limit, as a GUC. If there are gigabytes of notifications queued up, something's probably wrong with the system, and you're not going to be happy if we just remove the limit so it can grow to terabytes until you run out of disk space.

3. Extend pg_multixact so that pg_multixact/members is addressed by 64-bit offsets.

Currently, multi-XIDs can wrap around, requiring anti-wraparound freezing, but independently of that, the pg_multixact/members SLRU can also wrap around. We track both, and trigger anti-wraparound if either SLRU is about to wrap around. If we redefine MultiXactOffset as a 64-bit integer, we can avoid the pg_multixact/members wraparound altogether. A downside is that pg_multixact/offsets will take twice as much space, but I think that's a good tradeoff. Or perhaps we can play tricks like storing a single 64-bit offset on each pg_multixact/offsets page, and a 32-bit offset from that for each multi-XID, to avoid making it so much larger.

This would reduce the need to do anti-wraparound VACUUMs on systems that use multixacts heavily. Needs pg_upgrade support.
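
To sketch the "one 64-bit base per page plus 32-bit deltas" idea: the layout below is purely hypothetical (struct and function names are mine), and assumes 8 kB pages and that all offsets stored on one page lie within 2^32 of the page's base.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE        8192
/* One 8-byte base, the rest of the page holds 4-byte deltas. */
#define SKETCH_ENTRIES_PER_PAGE ((SKETCH_PAGE_SIZE - 8) / 4)

/* Hypothetical page layout for pg_multixact/offsets. */
typedef struct OffsetsPageSketch
{
    uint64_t base;                            /* 64-bit offset shared by the page */
    uint32_t delta[SKETCH_ENTRIES_PER_PAGE];  /* per-entry offset from base */
} OffsetsPageSketch;

_Static_assert(sizeof(OffsetsPageSketch) == SKETCH_PAGE_SIZE,
               "layout must fill exactly one page");

/* Reconstruct the full 64-bit members offset for one slot. */
static uint64_t
offsets_page_get_sketch(const OffsetsPageSketch *page, int slot)
{
    assert(slot >= 0 && slot < SKETCH_ENTRIES_PER_PAGE);
    return page->base + page->delta[slot];
}
```

Under these assumptions a page holds 2046 entries, compared with 2048 for plain 32-bit offsets and 1024 for plain 64-bit offsets, so the space overhead would be negligible. The cost is the extra complexity of maintaining the base.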

4. Extend pg_subtrans to 64 bits.

This isn't all that interesting because the active region of pg_subtrans cannot be wider than 32 bits anyway, because you'll still reach the general 32-bit XID wraparound. But it might be less confusing in some places.

I actually started to write a patch to do this, to see how complicated it is. It quickly proliferates into expanding other XIDs to 64 bits, like TransactionXmin, the frozenXid calculation in vacuum.c, known-assigned XID tracking in procarray.c, etc. It's going to be necessary to convert 32-bit XIDs to FullTransactionIds at some boundaries, and I'm not sure where exactly that should happen. It's easier to do the conversions close to subtrans.c, but then I'm not sure how much it gets us in terms of reducing confusion. It's easy to get confused with the epochs during conversions, as you noted. On the other hand, if we change much more of the backend to use FullTransactionIds, the patch becomes much more invasive.

Nice thing with pg_subtrans, though, is that it doesn't require pg_upgrade support.

5. Extend pg_xact to 64 bits.

Similar to pg_subtrans, really, but needs pg_upgrade support.

6. (a bonus thing that I noticed while thinking of pg_xact.) Extend twophase.c to use FullTransactionIds.

Currently, the twophase state files in pg_twophase are named according to the 32-bit XID of the transaction. Let's switch to FullTransactionId there.
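
As a sketch of what the file naming could look like: the current names are the XID as 8 hex digits, and the obvious extension is 16 hex digits with the epoch in the high half. The 16-digit format and the function name are my assumptions here, not something the patch defines.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Build a two-phase state file path from a 64-bit XID
 * (epoch in the high 32 bits, 32-bit XID in the low 32 bits).
 * Returns the length of the generated path, as snprintf() does.
 */
static int
twophase_file_name_sketch(char *buf, size_t buflen, uint64_t fxid)
{
    assert(buf != NULL);
    return snprintf(buf, buflen, "pg_twophase/%016" PRIX64, fxid);
}
```

For XID 0x2A in epoch 1 this gives pg_twophase/000000010000002A, where today's name would be pg_twophase/0000002A.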



As we start to refactor these things, I also think it would be good to have more explicit tracking of the valid range of SLRU pages in each SLRU. Take pg_subtrans for example: it's not very clear what pages have been initialized, especially during different stages of startup. It would be good to have clear start and end page numbers, and throw an error if you try to look up anything outside those bounds. Same for all other SLRUs.
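
Something along these lines is what I have in mind for the range tracking. The struct and function names are hypothetical; the point is only that with 64-bit page numbers the check becomes a plain comparison, with no wraparound-aware logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-SLRU bookkeeping of which pages are valid. */
typedef struct SlruRangeSketch
{
    uint64_t first_page;  /* oldest page that is initialized */
    uint64_t end_page;    /* one past the newest initialized page */
} SlruRangeSketch;

/*
 * Return true if pageno may be looked up. A real implementation
 * would raise an error for out-of-range lookups instead.
 */
static bool
slru_page_in_range_sketch(const SlruRangeSketch *range, uint64_t pageno)
{
    assert(range->first_page <= range->end_page);
    return pageno >= range->first_page && pageno < range->end_page;
}
```

Truncation would advance first_page and extending the SLRU would advance end_page, so an empty SLRU is simply the case where the two are equal.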

I propose that we try to finish 1 and 2 for v16. And maybe 6. I think that's doable. It doesn't have any great user-visible benefits yet, but we need to start somewhere.

- Heikki


