On Wed, Sep 7, 2016 at 10:18 PM, Claudio Freire <klaussfre...@gmail.com>
wrote:

> On Wed, Sep 7, 2016 at 12:12 PM, Greg Stark <st...@mit.edu> wrote:
> > On Wed, Sep 7, 2016 at 1:45 PM, Simon Riggs <si...@2ndquadrant.com>
> wrote:
> >> On 6 September 2016 at 19:59, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >>
> >>> The idea of looking to the stats to *guess* about how many tuples are
> >>> removable doesn't seem bad at all.  But imagining that that's going to
> be
> >>> exact is folly of the first magnitude.
> >>
> >> Yes.  Bear in mind I had already referred to allowing +10% to be safe,
> >> so I think we agree that a reasonably accurate, yet imprecise
> >> calculation is possible in most cases.
> >
> > That would all be well and good if it weren't trivial to do what
> > Robert suggested. This is just a large unsorted list that we need to
> > iterate through. Just allocate chunks of a few megabytes and when
> > it's full allocate a new chunk and keep going. There's no need to get
> > tricky with estimates and resizing and whatever.
>
> I agree. While the idea of estimating the right size sounds promising
> a priori, considering that the estimate can go wrong and over- or
> under-allocate quite severely, the risks outweigh the benefits when
> you consider the alternative of a dynamic allocation strategy.
>
> Unless the dynamic strategy has a bigger CPU impact than expected, I
> believe it's a superior approach.
>
>
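For concreteness, I imagine the chunked approach would look something
like the sketch below (all names are hypothetical, not from the current
sources): fixed-size chunks kept in a linked list, with a new chunk
palloc'd only when the current one fills up.

#define TID_CHUNK_BYTES   (4 * 1024 * 1024)   /* a few MB per chunk */
#define TID_CHUNK_TIDS    (TID_CHUNK_BYTES / sizeof(ItemPointerData))

typedef struct TidChunk
{
    struct TidChunk *next;           /* previously filled chunk */
    int              ntids;          /* entries used in this chunk */
    ItemPointerData  tids[TID_CHUNK_TIDS];
} TidChunk;

static void
tid_chunk_add(TidChunk **head, ItemPointerData tid)
{
    if (*head == NULL || (*head)->ntids >= TID_CHUNK_TIDS)
    {
        /* current chunk full (or none yet): allocate another one */
        TidChunk *chunk = palloc(sizeof(TidChunk));

        chunk->next = *head;
        chunk->ntids = 0;
        *head = chunk;
    }
    (*head)->tids[(*head)->ntids++] = tid;
}
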
How about a completely different representation for the TID array? This
is probably not a new idea, but I couldn't find whether exactly the
same thing has been discussed before. It's also somewhat orthogonal to
what we are trying to do here, and would probably be a bigger change,
but I thought I'd mention it while we are on the topic.

What I propose is to use a simple bitmap to represent the tuples: if
the tuple at <block, offset> is dead, the corresponding bit in the
bitmap is set. Lookups of dead tuples then become O(1) operations,
which is important for very large tables and large dead-tuple arrays.
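
As a minimal sketch (names hypothetical), with a fixed number of bits
per heap page the bit arithmetic is trivial; how to choose the per-page
bit count is discussed below:

#define BITMAP_BITS_PER_PAGE  291   /* per-page cap; see discussion below */

static inline uint64
dead_tuple_bitno(BlockNumber blk, OffsetNumber off)
{
    /* offsets are 1-based, bit positions are 0-based */
    return (uint64) blk * BITMAP_BITS_PER_PAGE + (off - 1);
}

static inline void
dead_tuple_set(uint8 *bitmap, BlockNumber blk, OffsetNumber off)
{
    uint64 bitno = dead_tuple_bitno(blk, off);

    bitmap[bitno / 8] |= (uint8) (1 << (bitno % 8));
}

static inline bool
dead_tuple_is_set(const uint8 *bitmap, BlockNumber blk, OffsetNumber off)
{
    uint64 bitno = dead_tuple_bitno(blk, off);

    return (bitmap[bitno / 8] & (1 << (bitno % 8))) != 0;
}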

The challenge is that a heap page can theoretically have up to
MaxOffsetNumber line pointers (or, to be precise, up to the maximum
possible offset number). For an 8K block, that comes to about 2048.
Having so many bits per page is neither practical nor optimal. But in
practice the largest offset on a heap page should not be significantly
greater than MaxHeapTuplesPerPage, which is a more reasonable 291 on my
machine. Even that assumes zero-size tuples; for real-life large
tables, with much wider tuples, the number will go down even further.
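
(For reference, MaxHeapTuplesPerPage in access/htup_details.h is
derived from the smallest possible tuple, i.e. just a MAXALIGN'd header
plus its line pointer:

#define MaxHeapTuplesPerPage    \
    ((int) ((BLCKSZ - SizeOfPageHeaderData) / \
            (MAXALIGN(SizeofHeapTupleHeader) + sizeof(ItemIdData))))

For the default BLCKSZ of 8192 that's (8192 - 24) / (24 + 4) = 291.)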

So we cap the offsets represented in the bitmap at some realistic
value, computed by looking at the page density and then multiplying it
by a small factor (not more than two) to account for LP_DEAD and
LP_REDIRECT line pointers. That should cover the vast majority of dead
tuples in the table; we then keep an overflow area to record tuples
beyond the per-page limit. The search routine does a direct bitmap
lookup for offsets up to the limit and a binary search in the sorted
overflow area for offsets beyond it.
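
A sketch of what that lookup might look like (again, all names are
hypothetical; ItemPointerCompare() is the existing comparator from
storage/itemptr.h):

#include <stdlib.h>                     /* for bsearch() */

typedef struct DeadTupleMap
{
    OffsetNumber     bitmap_limit;      /* per-page offset cap */
    uint8           *bitmap;            /* bitmap_limit bits per heap page */
    ItemPointerData *overflow;          /* sorted TIDs with offset > cap */
    int              num_overflow;
} DeadTupleMap;

static int
dead_tuple_cmp(const void *a, const void *b)
{
    return ItemPointerCompare((ItemPointer) a, (ItemPointer) b);
}

static bool
dead_tuple_lookup(const DeadTupleMap *map, BlockNumber blk, OffsetNumber off)
{
    if (off <= map->bitmap_limit)
    {
        /* direct O(1) bitmap lookup */
        uint64 bitno = (uint64) blk * map->bitmap_limit + (off - 1);

        return (map->bitmap[bitno / 8] & (1 << (bitno % 8))) != 0;
    }
    else
    {
        /* binary search in the sorted overflow area */
        ItemPointerData key;

        ItemPointerSet(&key, blk, off);
        return bsearch(&key, map->overflow, map->num_overflow,
                       sizeof(ItemPointerData), dead_tuple_cmp) != NULL;
    }
}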

For example, for a table with 60-byte-wide tuples (including the
24-byte header), each page can hold approximately 8192/60 = 136 tuples.
Say we provision for 136 * 2 = 272 bits per page, i.e. 34 bytes per
page for the bitmap. The first 272 offsets of every page are
represented in the bitmap, and anything greater goes to the overflow
region. By comparison, the current representation needs about 16 bytes
per page assuming 2% dead tuples, 40 bytes per page assuming 5%, and 80
bytes per page assuming 10%. So the bitmap takes more space only for
small tuples or when vacuum is run very aggressively, both of which
seem unlikely for very large tables. Of course, this calculation does
not account for the space needed by the overflow area, but I expect
that to be small.
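
To put those numbers side by side (8K pages, 60-byte tuples, 6 bytes
per TID in the current array, overflow area not counted):

    dead tuples        current array    fixed bitmap
    2%  (~2.7/page)    ~16 bytes/page   34 bytes/page
    5%  (~6.8/page)    ~40 bytes/page   34 bytes/page
    10% (~13.6/page)   ~80 bytes/page   34 bytes/page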

I guess we can choose between the two representations at the start by
looking at various table stats. We could also be smart and switch from
the bitmap to the traditional representation as we scan the table, if
many more tuples land in the overflow region than we provisioned for.
There will be some challenges in converting the representation midway,
especially in terms of memory allocation, but I think those can be
sorted out if we decide the idea has merit.

Thanks,
Pavan

-- 
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
