On Mon, Jul 24, 2017 at 10:50 AM, Joshua D. Drake <j...@commandprompt.com> wrote: > Does this suggest that we don't have a cleanup problem but a fragmentation > problem (or both at least for the index)? Having an index that is almost > twice the uncleaned up size isn't that uncommon.
As Tom pointed out up-thread, it's important to distinguish between inherent overhead, and overhead due to garbage that needs to be cleaned-up by vacuum. It's really hard to delineate which is which here, and I'm not going to try to put a number on it. What I will point out is that you can see quite a significant difference between the space utilization of a B-Tree without any dead tuples, just from the order in which tuples are initially inserted. You can get about a 1/3 loss of space by inserting randomly, rather than inserting in sorted order, which is what REINDEX will more or less do for you. That's because random workloads almost entirely get 50:50 page splits, whereas sorted input will always split the rightmost page, and so will always get 90:10 splits. The space in the random case isn't exactly wasted; it's there for the taking, for key values that happen to fit on the page. You effectively require a larger average reserve of free space on pages with the random workload, because the implementation does not and cannot reason that it would be best to concentrate free space in parts of the keyspace where there is most need for it. That having been said, I do think that this workload suffers from index bloat in a way that isn't so easily explained. It does seem to be an issue with VACUUM controlling bloat in the index in particular. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers