Hi, On 2/19/23 02:03, Matthias van de Meent wrote: > On Thu, 16 Jun 2022 at 15:05, Tomas Vondra > <tomas.von...@enterprisedb.com> wrote: >> >> I've pushed the revert. Let's try again for PG16. > > As we discussed in person at the developer meeting, here's a patch to > try again for PG16. > > It combines the committed patches with my fix, and adds some > additional comments and polish. I am confident the code is correct, > but not that it is clean (see the commit message of the patch for > details). >
Thanks for the patch. I took a quick look, and I agree it seems correct, and fairly clean too. Which places you think need cleanup/improvement? AFAICS some of the code comes from the original (reverted) patch, so that should be fairly non-controversial. The two new bits seem to be TU_UpdateIndexes and HEAP_TUPLE_SUMMARIZING_UPDATED. I have some minor review comments regarding TU_UpdateIndexes, but in principle it's fine - we need to track/pass the flag somehow, and this is reasonable IMHO. I'm not entirely sure about HEAP_TUPLE_SUMMARIZING_UPDATED yet. It's pretty much a counter-part to TU_UpdateIndexes - until now we've only had HOT vs. non-HOT, and one bit in header (HEAP_HOT_UPDATED) was sufficient for that. But now we need 3 states, so an extra bit is needed. That's fine, and using another bit in the header makes sense. The commit message says the bit is "otherwise unused" but after a while I realized it's just an "alias" for HEAP_HOT_UPDATED - I guess it means it's unused in the places that need to track set it, right? I wonder if something can be confused by this - thinking it's a regular HOT update, and doing something wrong. Do we have some precedent for using a header bit like this? Something that'd set a bit on in-memory tuple only to reset it shortly after? Does it make sense to add asserts that'd ensure we can't set the bit twice? Like a code setting both HEAP_HOT_UPDATED and the new flag? A couple more minor comments after eye-balling the patch: * I think heap_update would benefit from a couple more comments, e.g. the comment before calculating sum_attrs should probably mention the summarization optimization. * heapam_tuple_update is probably the one place that I find hard to read not quite readable. * I don't understand why the TU_UpdateIndexes fields are prefixed TUUI_? Why not to just use TU_? * indexam.sgml says: Access methods that do not point to individual tuples, but to (like I guess "page range" (or something like that) is missing. Note: I wonder how difficult would it be to also deal with attributes in predicates. IIRC if the predicate is false, we can ignore the index, but the consensus back then was it's too expensive as it can't be done using the bitmaps and requires evaluating the expression, etc. But maybe there are ways to work around that by first checking everything except for the index predicates, and only when we still think HOT is possible we would check the predicates. Tables usually have only a couple partial indexes, so this might be a win. Not something this patch should/needs to do, of course. * bikeshedding: rel.h says Bitmapset *rd_summarizedattr; /* cols indexed by block-or-larger summarizing indexes */ I think the "block-or-larger" bit is unnecessary. I think the crucial bit is the index does not contain pointers to individual tuples. Similarly for indexam.sgml, which talks about "at least all tuples in one block". regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company