On Mon, Jul 22, 2024 at 6:36 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> Melanie Plageman <melanieplage...@gmail.com> writes:
> > We've only run tests with this commit on some of the back branches for
> > some of these animals. Of those, I don't see any failures so far. So,
> > it seems the test instability is just related to trying to get
> > multiple passes of index vacuuming reliably with TIDStore.
>
> > AFAICT, all the 32bit machine failures are timeouts waiting for the
> > standby to catch up (mamba, gull, merswine). Unfortunately, the
> > failures on copperhead (a 64 bit machine) are because we don't
> > actually succeed in triggering a second vacuum pass. This would not be
> > fixed by a longer timeout.
>
> Ouch.  This seems to me to raise the importance of getting a better
> way to test multiple-index-vacuum-passes.  Peter argued upthread
> that we don't need a better way, but I don't see how that argument
> holds water if copperhead was not reaching it despite being 64-bit.
> (Did you figure out exactly why it doesn't reach the code?)
I wasn't able to reproduce the failure (failing to do > 1 index vacuum
pass) on my local machine (which is 64 bit) without decreasing the
number of tuples inserted. The copperhead failure confuses me because
the speed of the machine should *not* affect how much space the dead
item TIDStore takes up. I would have bet money that the same number
and offsets of dead tuples per page in a relation would take up the
same amount of space in a TIDStore on any 64-bit system -- regardless
of how slowly it runs vacuum.

Here is some background on how I came up with the DDL and tuple count
for the test:

TIDStore uses 32 BITS_PER_BITMAPWORD on 32-bit systems and 64 on
64-bit systems. So, if you only have one bitmapword's worth of dead
items per page, it was easy to figure out that you would need double
the number of pages with dead items to take up the same amount of
TIDStore space on a 32-bit system as on a 64-bit system (there is a
sketch of this arithmetic at the end of this mail).

I wanted to figure out how to take up double the amount of TIDStore
space *without* doubling the number of tuples. This is not
straightforward: you can't just delete twice as many dead tuples per
page, because, for starters, many dead tuples can be represented
compactly in a single bitmapword. Beyond that, the amount of space the
adaptive radix tree takes up seems to depend on whether the dead items
are at the same offsets on all the pages. I thought this might have to
do with being able to use the same chunk (in ART terms)? I spent some
time trying to figure it out, but I gave up once I got confused enough
to try and read the adaptive radix tree paper.

I found myself wishing there were some way to visualize the TIDStore.
I don't have good ideas for how to represent this, but if we found
one, we could add a function to the test_tidstore module.

I also think it would be useful to have peak TIDStore usage in bytes
in the vacuum verbose output. I had it on my list to propose something
like this after I hacked together a version myself while trying to
debug the test locally.
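To make the bitmapword arithmetic above concrete, here is a toy model
(this is not the actual TIDStore code; it ignores the per-block entry
header, radix tree node overhead, and allocator rounding, and just
counts bitmap bytes):

    #include <stdio.h>

    /*
     * Bitmap bytes needed per heap page to represent dead items at
     * offsets 1..max_offset, for a given bitmapword width.
     */
    static size_t
    bitmap_bytes(int max_offset, int bits_per_bitmapword)
    {
        /* bitmapwords needed to cover max_offset bits, rounded up */
        int nwords = (max_offset + bits_per_bitmapword - 1) / bits_per_bitmapword;

        return (size_t) nwords * (bits_per_bitmapword / 8);
    }

    int
    main(void)
    {
        int max_offset = 30;    /* highest dead-item offset on each page */

        printf("64-bit build: %zu bytes/page\n", bitmap_bytes(max_offset, 64));
        printf("32-bit build: %zu bytes/page\n", bitmap_bytes(max_offset, 32));
        return 0;
    }

With max_offset = 30, this gives 8 bytes/page on a 64-bit build (one
64-bit bitmapword) versus 4 bytes/page on a 32-bit build (one 32-bit
bitmapword), so you would need twice as many such pages on 32-bit to
consume the same amount of bitmap space.

- Melanie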