On Thu, Sep 8, 2016 at 11:54 PM, Pavan Deolasee
<pavan.deola...@gmail.com> wrote:
> On Wed, Sep 7, 2016 at 10:18 PM, Claudio Freire <klaussfre...@gmail.com>
> wrote:
>> On Wed, Sep 7, 2016 at 12:12 PM, Greg Stark <st...@mit.edu> wrote:
>> > On Wed, Sep 7, 2016 at 1:45 PM, Simon Riggs <si...@2ndquadrant.com>
>> > wrote:
>> >> On 6 September 2016 at 19:59, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> >>
>> >>> The idea of looking to the stats to *guess* about how many tuples are
>> >>> removable doesn't seem bad at all.  But imagining that that's going to
>> >>> be
>> >>> exact is folly of the first magnitude.
>> >>
>> >> Yes.  Bear in mind I had already referred to allowing +10% to be safe,
>> >> so I think we agree that a reasonably accurate, yet imprecise
>> >> calculation is possible in most cases.
>> >
>> > That would all be well and good if it weren't trivial to do what
>> > Robert suggested. This is just a large unsorted list that we need to
>> > iterate throught. Just allocate chunks of a few megabytes and when
>> > it's full allocate a new chunk and keep going. There's no need to get
>> > tricky with estimates and resizing and whatever.
>> I agree. While the idea of estimating the right size sounds promising
>> a priori, considering the estimate can go wrong and over or
>> underallocate quite severely, the risks outweigh the benefits when you
>> consider the alternative of a dynamic allocation strategy.
>> Unless the dynamic strategy has a bigger CPU impact than expected, I
>> believe it's a superior approach.
> How about a completely different representation for the TID array? Now this
> is probably not something new, but I couldn't find if the exact same idea
> was discussed before. I also think it's somewhat orthogonal to what we are
> trying to do here, and will probably be a bigger change. But I thought I'll
> mention since we are at the topic.
> What I propose is to use a simple bitmap to represent the tuples. If a tuple
> at <block, offset> is dead then the corresponding bit in the bitmap is set.
> So clearly the searches through dead tuples are O(1) operations, important
> for very large tables and large arrays.
> Challenge really is that a heap page can theoretically have MaxOffsetNumber
> of line pointers (or to be precise maximum possible offset number). For a 8K
> block, that comes be about 2048. Having so many bits per page is neither
> practical nor optimal. But in practice the largest offset on a heap page
> should not be significantly greater than MaxHeapTuplesPerPage, which is a
> more reasonable value of 291 on my machine. Again, that's with zero sized
> tuple and for real life large tables, with much wider tuples, the number may
> go down even further.
> So we cap the offsets represented in the bitmap to some realistic value,
> computed by looking at page density and then multiplying it by a small
> factor (not more than two) to take into account LP_DEAD and LP_REDIRECT line
> pointers. That should practically represent majority of the dead tuples in
> the table, but we then keep an overflow area to record tuples beyond the
> limit set for per page. The search routine will do a direct lookup for
> offsets less than the limit and search in the sorted overflow area for
> offsets beyond the limit.
> For example, for a table with 60 bytes wide tuple (including 24 byte
> header), each page can approximately have 8192/60 = 136 tuples. Say we
> provision for 136*2 = 272 bits per page i.e. 34 bytes per page for the
> bitmap. First 272 offsets in every page are represented in the bitmap and
> anything greater than are in overflow region. On the other hand, the current
> representation will need about 16 bytes per page assuming 2% dead tuples, 40
> bytes per page assuming 5% dead tuples and 80 bytes assuming 10% dead
> tuples. So bitmap will take more space for small tuples or when vacuum is
> run very aggressively, both seems unlikely for very large tables. Of course
> the calculation does not take into account the space needed by the overflow
> area, but I expect that too be small.
> I guess we can make a choice between two representations at the start
> looking at various table stats. We can also be smart and change from bitmap
> to traditional representation as we scan the table and see many more tuples
> in the overflow region than we provisioned for. There will be some
> challenges in converting representation mid-way, especially in terms of
> memory allocation, but I think those can be sorted out if we think that the
> idea has merit.

Making the vacuum possible to choose between two data representations
sounds good.
I implemented the patch that changes dead tuple representation to bitmap before.
I will measure the performance of bitmap representation again and post them.


Masahiko Sawada
NTT Open Source Software Center

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to