On Wed, Sep 14, 2016 at 12:21 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Fri, Sep 9, 2016 at 3:04 AM, Masahiko Sawada <sawada.m...@gmail.com>
> wrote:
> > The attached PoC patch changes the representation of dead tuple
> > locations to a hashmap of per-block tuple bitmaps.
> > Each hashmap entry consists of a block number and the TID bitmap of
> > the corresponding block, with the block number as the hash key.
> > The current implementation of this patch is not smart yet, because
> > each hashmap entry allocates a tuple bitmap of fixed size
> > (LAZY_ALLOC_TUPLES), so each entry can store up to
> > LAZY_ALLOC_TUPLES (291 if the block size is 8kB) tuples.
> > In cases where one block can store only a few dozen tuples, most of
> > the bits would be wasted.
> >
> > After improving this patch as you suggested, I will measure the
> > performance benefit.
>
> Now, if I'm reading it correctly, this patch allocates a 132-byte
> structure for every page with at least one dead tuple.  In the worst
> case where there's just one dead tuple per page, that's a 20x
> regression in memory usage.  Actually, it's more like 40x, because
> AllocSetAlloc rounds small allocation sizes up to the next-higher
> power of two, which really stings for a 132-byte allocation, and then
> adds a 16-byte header to each chunk.  But even 20x is clearly not
> good.  There are going to be lots of real-world cases where this uses
> way more memory to track the same number of dead tuples, and I'm
> guessing that benchmarking is going to reveal that it's not faster,
> either.
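
To spell out that arithmetic (the current representation stores one
6-byte ItemPointerData per dead tuple):

    proposed entry:          132 bytes per page with dead tuples
    AllocSetAlloc round-up:  256 bytes (next power of two)
    chunk header:            +16 bytes, so ~272 bytes/page in practice

    one dead tuple per page: 132/6 = 22x, and 272/6 ~= 45x with the
                             allocator overhead included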
Sawada-san offered to reimplement the patch based on what I proposed
upthread. In the new scheme of things, we will allocate a fixed-size
bitmap of length 2D bits per page, where D is the average page density
of live + dead tuples. (The rationale behind multiplying D by a factor
of 2 is to cover the worst-case scenario where every tuple also leaves
behind an LP_REDIRECT line pointer.) In most real-world, large tables
the value of D should not go much beyond, say, 100, assuming 80-byte
wide tuples and an 8K block size. That translates to about 25
bytes/page. So all TIDs with an offset number up to 2D can each be
represented by a single bit. We augment this with an overflow area to
track tuples whose offsets fall outside this limit. I believe this
area will be small, say 10% of the total allocation.
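
For illustration, a minimal sketch of what such a per-block entry
could look like (names and layout are mine, not from the patch; the
types come from PostgreSQL headers):

    typedef struct DeadTupleBlock
    {
        BlockNumber blkno;  /* hash key: heap block number */
        uint16      nbits;  /* 2D, fixed when the vacuum starts */
        uint8       bitmap[FLEXIBLE_ARRAY_MEMBER];
                            /* bit i set => offset i+1 is dead */
    } DeadTupleBlock;

    /*
     * Mark offset 'off' (1-based) as dead.  Returns false when the
     * offset exceeds 2D; the caller then records the full TID in the
     * small overflow area instead.
     */
    static bool
    deadtuple_set(DeadTupleBlock *entry, OffsetNumber off)
    {
        if (off > entry->nbits)
            return false;
        entry->bitmap[(off - 1) / 8] |= (uint8) (1 << ((off - 1) % 8));
        return true;
    }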

This representation is at least as good as the current one when at
least 4-5% of the tuples are dead. I don't think very large tables
will be vacuumed with a scale factor lower than that. And assuming 10%
dead tuples, this representation is actually far more compact.
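
To make the break-even concrete with the numbers above (D = 100):

    bitmap:      2D bits = 200 bits = 25 bytes/page, independent of
                 how many tuples on the page are dead
    TID array:   6 bytes * n_dead per page
    break-even:  25/6 ~= 4 dead tuples/page, i.e. roughly 4% of a
                 full page
    at 10% dead: 60 bytes/page (array) vs 25 bytes/page (bitmap)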

The idea can fail when (a) there are very few dead tuples in the
table, say less than 5%, or (b) a large number of tuples fall outside
the 2D limit. I don't expect either to hold for real-world, very large
tables, but for (a) we can anticipate the situation when the vacuum
starts and fall back to the current representation, and for (b) we can
detect it at run time and do a one-time switch between
representations. You may argue that managing two representations is
clumsy, and I agree, but the code is completely isolated and probably
not more than a few hundred lines.
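
As a sketch of how the up-front choice could look (the 5% threshold
and the estimate source are assumptions for illustration, not settled
numbers):

    typedef enum DeadTupleRep
    {
        DT_TID_ARRAY,   /* today's ItemPointerData array */
        DT_BITMAP       /* per-block bitmaps plus overflow area */
    } DeadTupleRep;

    /*
     * Hypothetical helper: pick the starting representation from the
     * dead-tuple fraction estimated when the vacuum starts (e.g. from
     * pgstats), switching later only if the overflow area blows up.
     */
    static DeadTupleRep
    choose_dead_tuple_rep(double est_dead_fraction)
    {
        /* below the ~4-5% break-even, the plain TID array is smaller */
        return (est_dead_fraction < 0.05) ? DT_TID_ARRAY : DT_BITMAP;
    }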


 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
