Thanks for the comments!

> When this happens, we want the table to be vacuumed regardless of
> other scores.

I am not sure this is correct. For example, I don't think this scenario
should be prioritized higher than a table that is in failsafe because
it's nearing wraparound. What do you think?

> However, with just setting scores->mxid = mxid_age (as
> in the attached patch), unless I'm missing something, there seems to
> be a risk that the table won't get to the top of the priority list
> because scores->max gets recalculated even after mxid score is
> accounted with max of (xid, mxid). Could you help me understand how
> this case is handled?

It will be prioritized based on the mxid_age, which should naturally be
high at that point, so the priority of this table will be based on its
age.

>
> > I do think we need to mention in the docs also about this caveat
> > in scoring, so users of pg_stat_autovacuum_scores are not surprised.
> > As member space usage grows between 2 billion and 4 billion, the
> > score ramps up gradually, but once members reach 4 billion the effective
> > freeze max age drops to 0 and the score jumps to mxid_age itself,
> > which could be in the hundreds of millions.
>
> I didn't find commit bd8d9c9bdf adding any documentation. Maybe it's
> worth adding some notes on what it means for the customers having
> multixact-heavy workloads - especially it eliminates anti-wraparound
> freezing because of running out of members space.

Perhaps more docs on the improvement should be added, but that seems
orthogonal to the issue being discussed here.

>> 1/
>> + <xref linkend="guc-autovacuum-multixact-freeze-max-age"/>. However,
>> + when multixact member space usage is high (see
>> + <xref linkend="vacuum-for-multixact-wraparound"/>), the effective
>> + freeze max age is reduced below
>> + <xref linkend="guc-autovacuum-multixact-freeze-max-age"/> to help
>> + reclaim multixact member disk space, which can result in much higher
>> + scores than normal. Furthermore, this component increases greatly
>> + once the age surpasses
>> + <xref linkend="guc-vacuum-multixact-failsafe-age"/>. The
>> + final value for this component can be adjusted via

> Isn't the "effective freeze max age" code-level terminology?

Yes, it is described as "effective" in the code, but I also think it
makes sense as user-facing wording. It does get the point across,
doesn't it?

> 2/
> /*
> * To calculate the (M)XID age portion of the score, divide the age by its
> - * respective *_freeze_max_age parameter.
> + * respective *_freeze_max_age parameter. MultiXactMemberFreezeThreshold()
> + * can return 0, in which case we effectively use mxid_age as the score.
> */
> xid_age = TransactionIdIsNormal(relfrozenxid) ? recentXid - relfrozenxid : 0;
> mxid_age = MultiXactIdIsValid(relminmxid) ? recentMulti - relminmxid : 0;

> For better readability, can we enhance this comment by saying exactly
> when the freeze threshold gets returned as 0 telling the caller that
> freezing is urgent on this table?

That is already described in MultiXactMemberFreezeThreshold(), right?

> 3/ I checked around to see if we have tests for the case where we hit
> this case where fraction is >= 1.0 i.e. multixact members are >
> 4billion and the closest I found is this 002_multixact_wraparound.pl,
> but I don't think it covers this case. Its worth testing this case and
> the fix locally. FWIW, this code doesn't have coverage -
> https://coverage.postgresql.org/src/backend/access/transam/multixact.c.gcov.html.

This looks like a separate discussion as well, but I am not against
testing for this.

--
Sami Imseih
Amazon Web Services (AWS)
