On Mon, Jul 24, 2017 at 10:50 AM, Joshua D. Drake <j...@commandprompt.com> 
wrote:
> Does this suggest that we don't have a cleanup problem but a fragmentation
> problem (or both at least for the index)? Having an index that is almost
> twice the uncleaned up size isn't that uncommon.

As Tom pointed out up-thread, it's important to distinguish between
inherent overhead and overhead due to garbage that needs to be
cleaned up by VACUUM. It's really hard to delineate which is which
here, and I'm not going to try to put a number on it. What I will
point out is that you can see quite a significant difference in
the space utilization of a B-Tree without any dead tuples at all, just
from the order in which tuples are initially inserted.

You can lose about 1/3 of the space by inserting in random order
rather than in sorted order, which is more or less what REINDEX does
for you. That's because random workloads almost entirely get 50:50
page splits, whereas sorted input always splits the rightmost page,
and so always gets 90:10 splits. The space in the random case isn't
exactly wasted; it's there for the taking, for key values that happen
to fit on the page. You effectively require a larger average reserve
of free space on pages with the random workload, because the
implementation does not and cannot reason that it would be best to
concentrate free space in the parts of the keyspace where there is
the most need for it.
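To make the effect concrete, here is a toy simulation (my sketch, not
PostgreSQL's actual nbtree code; the page capacity and split points are
arbitrary assumptions) that inserts keys into simulated leaf pages,
splitting 90:10 on rightmost-page splits and 50:50 otherwise, then
reports average page fill:

```python
import random

PAGE_CAP = 50  # simulated max tuples per leaf page (arbitrary assumption)

def simulate(keys):
    """Insert keys into simulated leaf pages; return average page fill."""
    pages = [[]]  # list of sorted key lists, kept in key order
    for k in keys:
        # Find the page whose key range covers k (linear scan; fine for a toy).
        i = 0
        while i + 1 < len(pages) and k >= pages[i + 1][0]:
            i += 1
        page = pages[i]
        # Insert k at its sorted position within the page.
        pos = len(page)
        while pos > 0 and page[pos - 1] > k:
            pos -= 1
        page.insert(pos, k)
        if len(page) > PAGE_CAP:
            if i == len(pages) - 1 and pos == len(page) - 1:
                split = int(PAGE_CAP * 0.9)   # rightmost split: 90:10
            else:
                split = (len(page) + 1) // 2  # ordinary split: 50:50
            pages[i:i + 1] = [page[:split], page[split:]]
    return sum(len(p) for p in pages) / (len(pages) * PAGE_CAP)

n = 5000
sorted_fill = simulate(range(n))
shuffled = list(range(n))
random.seed(0)
random.shuffle(shuffled)
random_fill = simulate(shuffled)
print(f"sorted input fill: {sorted_fill:.2f}, random input fill: {random_fill:.2f}")
```

With sorted input every split leaves a ~90%-full page behind, so fill
lands near 0.9; with random input pages oscillate between half-full and
full, so fill lands near 0.7 -- roughly the 1/3 difference described
above.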

That having been said, I do think that this workload suffers from
index bloat in a way that isn't so easily explained. It does seem to
be a problem with VACUUM's ability to control bloat in the index in
particular.

-- 
Peter Geoghegan

