On Fri, Sep 9, 2016 at 3:04 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> The attached PoC patch changes the representation of dead tuple locations
> to a hashmap holding a per-block tuple bitmap. Each hashmap entry consists
> of a block number and the TID bitmap of the corresponding block, with the
> block number as the hash key.
>
> The current implementation of this patch is not smart yet, because each
> hashmap entry allocates a tuple bitmap of fixed size (LAZY_ALLOC_TUPLES),
> so each hash entry can store up to LAZY_ALLOC_TUPLES (291 if the block
> size is 8kB) tuples. In cases where one block can hold only a few tens of
> tuples, most of the bits would be wasted.
>
> After improving this patch as you suggested, I will measure the
> performance benefit.
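For reference, the per-block hash entry described above might look roughly like the following sketch. The names and types here (DeadTupleBlockEntry, dead_tuple_is_set, the plain stdint types standing in for PostgreSQL's BlockNumber and OffsetNumber) are hypothetical, not the PoC patch's actual code; only LAZY_ALLOC_TUPLES and the 291-tuple figure come from the description above.

/*
 * Hypothetical sketch of the per-block hash entry described above; names
 * and types are illustrative, not taken from the actual PoC patch.
 */
#include <stdbool.h>
#include <stdint.h>

#define LAZY_ALLOC_TUPLES 291                        /* 8kB block size */
#define BITMAP_BYTES ((LAZY_ALLOC_TUPLES + 7) / 8)   /* ~37 bytes, allocated
                                                      * even for blocks that
                                                      * hold far fewer tuples */

typedef struct DeadTupleBlockEntry
{
    uint32_t blkno;                    /* hash key: heap block number */
    uint8_t  bitmap[BITMAP_BYTES];     /* bit (offnum - 1) set => TID is dead */
} DeadTupleBlockEntry;

/* Probe the bitmap for a given 1-based line pointer offset. */
static inline bool
dead_tuple_is_set(const DeadTupleBlockEntry *entry, uint16_t offnum)
{
    uint32_t bit = offnum - 1;

    return (entry->bitmap[bit / 8] & (1 << (bit % 8))) != 0;
}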
We also need to consider the amount of memory that gets used. What I proposed - replacing the array with an array of arrays - would not increase memory utilization significantly. I don't think it would have much impact on CPU utilization either. It would require replacing the call to bsearch() in lazy_tid_reaped() with an open-coded implementation of bsearch, or with one bsearch to find the chunk and another to find the TID within the chunk, but that shouldn't be very expensive. For one thing, if the array chunks are around the size I proposed (64MB), you've got more than ten million tuples per chunk, so you can't have very many chunks unless your table is both really large and possessed of quite a bit of dead stuff.

Now, if I'm reading it correctly, this patch allocates a 132-byte structure for every page with at least one dead tuple. In the worst case, where there's just one dead tuple per page, that's a 20x regression in memory usage. Actually, it's more like 40x, because AllocSetAlloc rounds small allocation sizes up to the next-higher power of two, which really stings for a 132-byte allocation, and then adds a 16-byte header to each chunk. But even 20x is clearly not good. There are going to be lots of real-world cases where this uses way more memory to track the same number of dead tuples, and I'm guessing that benchmarking is going to reveal that it's not faster, either.

I think it's probably wrong to worry that an array-of-arrays is going to be meaningfully slower than a single array here. It's basically costing you some small number of additional memory references per tuple, which I suspect isn't all that relevant for a bulk operation that does I/O, writes WAL, locks buffers, etc. But if it is relevant, then I think there are other ways to buy that performance back which are likely to be more memory efficient than converting this to use a hash table.

For example, we could keep a bitmap with one bit per K pages. If the bit is set, there is at least one dead tuple somewhere in that range of K pages; if it is clear, there are none. When we see an index tuple, we consult the bitmap to determine whether we need to search the TID list. We select K to be the smallest power of 2 such that the bitmap uses less memory than some threshold, perhaps 64kB. Assuming that updates and deletes to the table have some locality, we should be able to skip a large percentage of the TID searches with a probe into this very compact bitmap.

Note that we can set K = 1 for tables up to 4GB in size, and even a 1TB table only needs K = 256. Odds are very good that a 1TB table being vacuumed has many 256-page ranges containing no dead tuples at all ... and if this proves to be false and the dead tuples are scattered uniformly throughout the table, then you should probably be more worried about the fact that you're dumping a bare minimum of 4GB of random I/O on your hapless disk controller than about how efficient the TID search is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
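To make the array-of-arrays idea concrete, here is a rough sketch of the two-step lookup, assuming 64MB chunks kept in ascending TID order. All of the names (DeadTid, TidChunk, tid_list_contains, and so on) are made up for illustration, the 8-byte DeadTid struct merely stands in for PostgreSQL's 6-byte ItemPointerData, and this is not code from any actual patch.

/*
 * Illustrative sketch only: a chunked, sorted dead-TID list with a
 * two-step binary search (find the chunk, then bsearch within it).
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct DeadTid
{
    uint32_t blkno;     /* heap block number */
    uint16_t offnum;    /* line pointer offset within the block */
} DeadTid;

typedef struct TidChunk
{
    DeadTid *tids;      /* sorted by (blkno, offnum); never empty */
    size_t   ntids;     /* entries actually used in this chunk */
} TidChunk;

typedef struct TidList
{
    TidChunk *chunks;   /* chunks themselves are in ascending TID order */
    size_t    nchunks;
} TidList;

static int
tid_cmp(const DeadTid *a, const DeadTid *b)
{
    if (a->blkno != b->blkno)
        return (a->blkno < b->blkno) ? -1 : 1;
    if (a->offnum != b->offnum)
        return (a->offnum < b->offnum) ? -1 : 1;
    return 0;
}

static int
tid_cmp_qsort(const void *a, const void *b)
{
    return tid_cmp((const DeadTid *) a, (const DeadTid *) b);
}

/* Step 1: binary search over chunks, keyed on each chunk's last TID. */
static const TidChunk *
find_chunk(const TidList *list, const DeadTid *key)
{
    size_t lo = 0, hi = list->nchunks;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        const TidChunk *c = &list->chunks[mid];

        if (tid_cmp(&c->tids[c->ntids - 1], key) < 0)
            lo = mid + 1;   /* key lies beyond this chunk's last TID */
        else
            hi = mid;
    }
    return (lo < list->nchunks) ? &list->chunks[lo] : NULL;
}

/* Step 2: ordinary bsearch() within the selected chunk. */
static bool
tid_list_contains(const TidList *list, const DeadTid *key)
{
    const TidChunk *chunk = find_chunk(list, key);

    if (chunk == NULL)
        return false;
    return bsearch(key, chunk->tids, chunk->ntids,
                   sizeof(DeadTid), tid_cmp_qsort) != NULL;
}

With roughly eleven million TIDs per 64MB chunk, the outer search rarely has more than a handful of chunks to consider, so the extra step adds only a few comparisons per probe.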
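Similarly, a rough sketch of the one-bit-per-K-pages filter, again with hypothetical names; only the 64kB budget and the power-of-two K come from the description above. The probe is meant to sit in front of a dead-TID search such as the one in lazy_tid_reaped().

/*
 * Illustrative sketch of the "one bit per K pages" filter. Names are
 * hypothetical; the 64kB budget and power-of-two K follow the text above.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define FILTER_BUDGET_BYTES (64 * 1024)   /* 64kB => 524288 bits */

typedef struct PageRangeFilter
{
    uint32_t pages_per_bit;   /* K: a power of two */
    uint8_t *bits;
    size_t   nbytes;
} PageRangeFilter;

/* Smallest power-of-two K such that nblocks/K bits fit in the budget. */
static uint32_t
choose_pages_per_bit(uint32_t nblocks)
{
    uint32_t k = 1;

    while (((uint64_t) nblocks + k - 1) / k > (uint64_t) FILTER_BUDGET_BYTES * 8)
        k *= 2;
    return k;   /* e.g. K = 1 up to 4GB (524288 blocks), K = 256 at 1TB */
}

static PageRangeFilter *
filter_create(uint32_t nblocks)
{
    PageRangeFilter *f = malloc(sizeof(PageRangeFilter));
    uint64_t nbits;

    f->pages_per_bit = choose_pages_per_bit(nblocks);
    nbits = ((uint64_t) nblocks + f->pages_per_bit - 1) / f->pages_per_bit;
    f->nbytes = (size_t) ((nbits + 7) / 8);
    f->bits = calloc(f->nbytes ? f->nbytes : 1, 1);
    return f;
}

/* Called whenever a dead tuple is recorded for a heap block. */
static void
filter_mark_block(PageRangeFilter *f, uint32_t blkno)
{
    uint32_t bit = blkno / f->pages_per_bit;

    f->bits[bit / 8] |= (uint8_t) (1 << (bit % 8));
}

/*
 * Called before the comparatively expensive dead-TID search: a clear bit
 * means no dead tuples anywhere in that range of K pages, so the search
 * can be skipped entirely.
 */
static bool
filter_might_contain(const PageRangeFilter *f, uint32_t blkno)
{
    uint32_t bit = blkno / f->pages_per_bit;

    return (f->bits[bit / 8] & (1 << (bit % 8))) != 0;
}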