On Sat, Aug 9, 2014 at 9:13 AM, Tomas Vondra <t...@fuzzy.cz> wrote:
> Adding the least-significant bit does not work, we need to get back to
> adding the most-significant one. Not sure what's the least complex way
> to do that, though.
>
> I'm thinking about computing the nbuckets limit (how many buckets may
> fit into work_mem):
>
>     tupsize = HJTUPLE_OVERHEAD +
>         MAXALIGN(sizeof(MinimalTupleData)) +
>         MAXALIGN(tupwidth);
>
>     nbuckets_bits_max = my_log2(work_mem / tupsize)
>
> and then start with nbatches from this bit, like this:
>
>   31               23 max           15                7                0
>    |----------------|----------------|----------------|----------------|
>    |    <- batches  |              free               |   <- buckets   |
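For concreteness, here is a minimal standalone sketch of that layout: buckets
taken from the low-order bits of the 32-bit hash, batch bits starting at
nbuckets_bits_max and growing toward the most-significant end. The constants,
the 16-byte overhead figures, and the simplified my_log2() are illustrative
stand-ins, not the PostgreSQL definitions.

    #include <stdio.h>
    #include <stdint.h>

    /* smallest i such that 2^i >= num (same contract as PostgreSQL's my_log2) */
    static int
    my_log2(long num)
    {
        int     i = 0;
        long    limit = 1;

        while (limit < num)
        {
            limit <<= 1;
            i++;
        }
        return i;
    }

    int
    main(void)
    {
        long    work_mem = 4L * 1024 * 1024;    /* assume 4MB, expressed in bytes */
        int     tupwidth = 40;                  /* assumed average tuple width */
        int     tupsize = 16 + 16 + tupwidth;   /* stand-in for HJTUPLE_OVERHEAD +
                                                 * MAXALIGN(sizeof(MinimalTupleData)) +
                                                 * MAXALIGN(tupwidth) */

        /* the most bucket bits we can ever afford within work_mem */
        int     nbuckets_bits_max = my_log2(work_mem / tupsize);

        int     nbuckets_bits = 13;             /* 8192 buckets, as in the example */
        int     nbatch_bits = 2;                /* 4 batches */

        uint32_t hash = 0xDEADBEEFu;            /* hash value of some tuple */

        /* buckets are taken from the low-order bits ... */
        uint32_t bucketno = hash & ((1u << nbuckets_bits) - 1);

        /* ... batches start at bit nbuckets_bits_max, and each doubling of
         * nbatch adds the next more-significant bit, so the bucket mask below
         * can grow up to nbuckets_bits_max without ever colliding */
        uint32_t batchno = (hash >> nbuckets_bits_max) & ((1u << nbatch_bits) - 1);

        printf("nbuckets_bits_max = %d\n", nbuckets_bits_max);
        printf("bucket = %u, batch = %u\n",
               (unsigned) bucketno, (unsigned) batchno);
        return 0;
    }

Note that this only works while nbatch_bits batch bits actually fit above
nbuckets_bits_max in a 32-bit hash, which is the bit-space concern raised below.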
That doesn't seem unreasonable provided there's enough bit space, but, as
you point out, the day might not be far off when there isn't.

It also strikes me that when there's only 1 batch, the set of bits that map
onto the batch number is zero-width, and one zero-width bit range is as good
as another. In other words, if you're only planning to do one batch, you can
easily grow the number of buckets on the fly. Growing the number of buckets
only becomes difficult once you have more than one batch.

And I mention that because, in the scenario described in your original post
("a hash table with small number of buckets (8192) containing large number
of tuples [~3.3M]"), presumably what happens is that you start out thinking
you are going to have one batch with 8K buckets. Then, as more tuples
continue to pour in, you see the load factor rising and respond by bumping
up the size of the hash table. Now eventually you realize, gosh, this isn't
even going to fit into work_mem, so you need to start batching it. But by
that point you've presumably done as much as you're going to do in terms of
increasing the number of buckets; after that, you're just going to add more
batches as necessary. So not being able to further increase the number of
buckets once the number of batches is no longer 1 wouldn't be a problem in
that case.

Now if you start out planning for multiple batches, and then you discover
you'd like to keep the same number of batches but have more buckets per
batch, that's gonna be hard. It's not impossible, because you could steal
some of the unused high-order bits above the number of batches, and then
concatenate them with the low-order bits that you already had, but that
seems like kind of an ugly kludge, and AFAICS it only helps if the new
tuples are narrower by enough to justify the cost of splitting all the
buckets. I won't say that couldn't happen, but I think it would be better
to keep that complexity out of the patch unless we're absolutely certain
it's justified.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
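A toy sketch of the zero-width-batch observation above, using hypothetical
HashTuple/HashTable structs and a grow_buckets() helper rather than the
executor's real data structures: while nbatch == 1, every tuple is already in
memory with its hash value saved, so doubling nbuckets is just a relink that
consumes one more low-order bit of that hash; once nbatch > 1, the bucket mask
is effectively frozen because the bits above it are spoken for.

    #include <stdlib.h>
    #include <stdint.h>

    typedef struct HashTuple
    {
        struct HashTuple *next;
        uint32_t    hashvalue;      /* saved at insert time, never recomputed */
        /* ... tuple payload would follow here ... */
    } HashTuple;

    typedef struct HashTable
    {
        HashTuple **buckets;
        int         nbuckets;       /* always a power of two */
        int         nbatch;
    } HashTable;

    /* Double the bucket count; only attempted while everything is one batch. */
    void
    grow_buckets(HashTable *ht)
    {
        int         newnbuckets = ht->nbuckets * 2;
        HashTuple **newbuckets;
        int         i;

        if (ht->nbatch != 1)
            return;                 /* bits above the bucket mask are already
                                     * assigned to batch numbers: give up */

        newbuckets = calloc(newnbuckets, sizeof(HashTuple *));
        if (newbuckets == NULL)
            return;

        /* relink every tuple using one more low-order bit of its saved hash */
        for (i = 0; i < ht->nbuckets; i++)
        {
            HashTuple  *tup = ht->buckets[i];

            while (tup != NULL)
            {
                HashTuple  *next = tup->next;
                int         b = tup->hashvalue & (newnbuckets - 1);

                tup->next = newbuckets[b];
                newbuckets[b] = tup;
                tup = next;
            }
        }

        free(ht->buckets);
        ht->buckets = newbuckets;
        ht->nbuckets = newnbuckets;
    }

The point of the sketch is only that the relink never consults the batch
number: it works because the bit the new bucket mask absorbs is still free,
which is exactly what stops being true once there is more than one batch.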