On Sep 14, 2016 5:18 PM, "Robert Haas" <robertmh...@gmail.com> wrote: > > On Wed, Sep 14, 2016 at 8:16 AM, Pavan Deolasee > <pavan.deola...@gmail.com> wrote: > > Ah, thanks. So MaxHeapTuplesPerPage sets the upper boundary for the per page > > bitmap size. Thats about 36 bytes for 8K page. IOW if on an average there > > are 6 or more dead tuples per page, bitmap will outperform the current > > representation, assuming max allocation for bitmap. If we can use additional > > estimates to restrict the size to somewhat more conservative value and then > > keep overflow area, then probably the break-even happens even earlier than > > that. I hope this gives us a good starting point, but let me know if you > > think it's still a wrong approach to pursue. > > Well, it's certainly a bigger change. I think the big concern is that > the amount of memory now becomes fixed based on the table size. So > one problem is that you have to figure out what you're going to do if > the bitmap doesn't fit in maintenance_work_mem. A related problem is > that it might fit but use more memory than before, which could cause > problems for some people. Now on the other hand it could also use > less memory for some people, and that would be good. > > I am kind of doubtful about this whole line of investigation because > we're basically trying pretty hard to fix something that I'm not sure > is broken. I do agree that, all other things being equal, the TID > lookups will probably be faster with a bitmap than with a binary > search, but maybe not if the table is large and the number of dead > TIDs is small, because cache efficiency is pretty important. But even > if it's always faster, does TID lookup speed even really matter to > overall VACUUM performance? Claudio's early results suggest that it > might, but maybe that's just a question of some optimization that > hasn't been done yet. > > I'm fairly sure that our number one priority should be to minimize the > number of cases where we need to do multiple scans of the indexes to > stay within maintenance_work_mem. If we're satisfied we've met that > goal, then within that we should try to make VACUUM as fast as > possible with as little memory usage as possible. I'm not 100% sure I > know how to get there, or how much work it's worth expending. In > theory we could even start with the list of TIDs and switch to the > bitmap if the TID list becomes larger than the bitmap would have been, > but I don't know if it's worth the effort. > > /me thinks a bit. > > Actually, I think that probably *is* worthwhile, specifically because > it might let us avoid multiple index scans in cases where we currently > require them. Right now, our default maintenance_work_mem value is > 64MB, which is enough to hold a little over ten million tuples. It's > also large enough to hold a bitmap for a 14GB table. So right now if > you deleted, say, 100 tuples per page you would end up with an index > vacuum cycles for every ~100,000 pages = 800MB, whereas switching to > the bitmap representation for such cases would require only one index > vacuum cycle for every 14GB, more than an order of magnitude > improvement! > > On the other hand, if we switch to the bitmap as the ONLY possible > representation, we will lose badly when there are scattered updates - > e.g. 1 deleted tuple every 10 pages. So it seems like we probably > want to have both options. One tricky part is figuring out how we > switch between them when memory gets tight; we have to avoid bursting > above our memory limit while making the switch. And even if our > memory limit is very high, we want to avoid using memory gratuitously; > I think we should try to grow memory usage incrementally with either > representation. > > For instance, one idea to grow memory usage incrementally would be to > store dead tuple information separately for each 1GB segment of the > relation. So we have an array of dead-tuple-representation objects, > one for every 1GB of the relation. If there are no dead tuples in a > given 1GB segment, then this pointer can just be NULL. Otherwise, it > can point to either the bitmap representation (which will take ~4.5MB) > or it can point to an array of TIDs (which will take 6 bytes/TID). > That could handle an awfully wide variety of usage patterns > efficiently; it's basically never worse than what we're doing today, > and when the dead tuple density is high for any portion of the > relation it's a lot better. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > > -- > Sent via pgsql-hackers mailing list (firstname.lastname@example.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
I'd say it's an idea worth pursuing. It's the base idea behind roaring bitmaps, arguably the best overall compressed bitmap implementation.