Re: [HACKERS] Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller

2014-06-10 Thread Robert Haas
On Tue, Jun 10, 2014 at 1:13 AM, b8flowerfire b8flowerf...@gmail.com wrote:
 Thanks for the explanation, but I don't find it very convincing.
 Simply reducing the value of NTUP_PER_BUCKET enlarges the pointer array
 and reduces the number of tuples per batch. But is that effect significant to
 performance?
 The utilization of work_mem, I think, is determined by the ratio of the
 pointer size to the tuple size.
 Let's assume the size of a tuple is 28 bytes, which is quite reasonable because
 it's the sum of HJTUPLE_OVERHEAD (at least 8 bytes), the size of
 MinimalTupleData (at least 10 bytes), and the tuple contents (assume 10
 bytes). And the size of a pointer is 4 bytes.

The size of a pointer is 8 bytes on most platforms these days.  On the
flip side, we shouldn't forget that each tuple has a 2-pointer, thus
16-byte, overhead due to the way AllocSetAlloc works, and that before
adding that overhead we round the request up to the nearest power of
two.  So in fact, in your example, each tuple will require 48 bytes on
a 64-bit platform, and each pointer will require 8.  So if I'm
calculating correctly, the memory allocation for the pointers would be
about 1.6% of the total with NTUP_PER_BUCKET = 10 and about 14.3% of
the total with NTUP_PER_BUCKET = 1.
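
To make that arithmetic concrete, here is the same back-of-the-envelope
calculation as a small standalone C program (the 16-byte chunk overhead and
the power-of-two rounding are simply the AllocSetAlloc behavior described
above, taken as assumptions rather than measured):

#include <stdio.h>

/* Round a small request up to the next power of two, as AllocSetAlloc
 * does for small allocations. */
size_t
round_up_pow2(size_t sz)
{
    size_t result = 8;

    while (result < sz)
        result *= 2;
    return result;
}

int
main(void)
{
    const size_t tuple_payload = 28;    /* HJTUPLE_OVERHEAD + MinimalTupleData + data */
    const size_t chunk_overhead = 16;   /* two pointers of AllocSetAlloc bookkeeping */
    const size_t pointer_size = 8;      /* one bucket pointer on a 64-bit platform */
    const int    ntup_per_bucket[] = {10, 1};

    /* 28 rounds up to 32, plus 16 bytes of overhead = 48 bytes per tuple */
    size_t tuple_alloc = round_up_pow2(tuple_payload) + chunk_overhead;

    for (int i = 0; i < 2; i++)
    {
        int     n = ntup_per_bucket[i];
        double  per_bucket = (double) (n * tuple_alloc + pointer_size);
        double  pointer_pct = 100.0 * pointer_size / per_bucket;

        printf("NTUP_PER_BUCKET = %2d: bucket pointers are %.1f%% of memory\n",
               n, pointer_pct);
    }
    return 0;
}

Running that prints roughly 1.6% and 14.3%, matching the figures above.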

 As a result, changing the value of NTUP_PER_BUCKET to 1 may increase the
 number of batches by only about 10%. So is that enough to affect
 performance? Or maybe I cannot do the calculation so simply.

The problem case is when you have 1 batch and the increased memory
consumption causes you to switch to 2 batches.  That's expensive.  It
seems clear based on previous testing that *on the average*
NTUP_PER_BUCKET = 1 will be better, but in the case where it causes an
increase in the number of batches it will be much worse - particularly
because the only way we ever increase the number of batches is to
double it, which is almost always going to be a huge loss.
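
Roughly, the only growth path looks like this (a simplified sketch; the
actual logic lives in ExecHashIncreaseNumBatches in nodeHash.c):

#include <stddef.h>

/*
 * Sketch only: once the in-memory hash table exceeds work_mem, the
 * only remedy is to double nbatch, which pushes roughly half of the
 * tuples already in memory out to temp files to be re-read later.
 */
int
sketch_maybe_grow_batches(size_t space_used, size_t space_allowed, int nbatch)
{
    if (space_used > space_allowed)
    {
        nbatch *= 2;
        /* ... about half the resident tuples now get written to batch
         * files, and the matching outer tuples will be spooled too ... */
    }
    return nbatch;
}

So a build side that is even one tuple too big for work_mem pays nearly the
full cost of a two-batch join.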

 Besides, we have larger main memory now. If we set work_mem larger, the
 extra-batches effect introduced by a smaller NTUP_PER_BUCKET value could be
 reduced, couldn't it?

If work_mem is large enough that we're going to do a single batch
either way, or the same number of batches either way, then we can
reduce NTUP_PER_BUCKET and it should be a clear win.
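
For context, the bucket count is sized from the expected tuple count and
NTUP_PER_BUCKET along these lines (a simplified sketch of what
ExecChooseHashTableSize does, not the exact code):

#include <math.h>

/*
 * Sketch: aim for about NTUP_PER_BUCKET tuples per bucket chain within
 * one batch.  The real sizing also caps the array and enforces minimums.
 */
int
sketch_choose_nbuckets(double inner_tuples, int nbatch, int ntup_per_bucket)
{
    double dbuckets = ceil(inner_tuples / nbatch / ntup_per_bucket);

    if (dbuckets < 1024.0)      /* assume some sane floor, as the planner does */
        dbuckets = 1024.0;
    return (int) dbuckets;
}

So lowering NTUP_PER_BUCKET only grows the bucket-pointer array; the tuples
themselves cost the same either way, which is why it's a clear win as long as
the batch count doesn't change.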

 I have read earlier discussion about NTUP_PER_BUCKET. It seems that
 if we changed NTUP_PER_BUCKET to 50 or even larger, performance wouldn't
 get much worse, because every tuple in a bucket's chain stores its hash
 value, so having more tuples in a bucket only adds some comparisons of
 two integers. Is the reverse also true, that making it smaller would not
 gain us much? Is that one of the reasons it is defined as 10?

I'm not sure.
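
For reference, the per-bucket probe being described works roughly like this
(a sketch with invented names; the real loop is ExecScanHashBucket in
nodeHash.c):

#include <stddef.h>

/*
 * Each stored tuple carries its hash value, so walking a longer chain
 * mostly costs extra uint32 comparisons; the expensive key comparison
 * (the join quals) only runs on a hash match.
 */
typedef struct SketchHashTuple
{
    struct SketchHashTuple *next;      /* next tuple in this bucket's chain */
    unsigned int hashvalue;            /* hash of the join key */
    /* ... MinimalTuple data follows ... */
} SketchHashTuple;

SketchHashTuple *
sketch_scan_bucket(SketchHashTuple *bucket_head, unsigned int probe_hash)
{
    for (SketchHashTuple *t = bucket_head; t != NULL; t = t->next)
    {
        if (t->hashvalue == probe_hash)
            return t;                  /* caller still rechecks the join quals */
    }
    return NULL;
}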

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller

2014-06-10 Thread Jeff Janes
On Tue, Jun 10, 2014 at 5:17 AM, Robert Haas robertmh...@gmail.com wrote:



 The problem case is when you have 1 batch and the increased memory
 consumption causes you to switch to 2 batches.  That's expensive.  It
 seems clear based on previous testing that *on the average*
 NTUP_PER_BUCKET = 1 will be better, but in the case where it causes an
 increase in the number of batches it will be much worse - particularly
 because the only way we ever increase the number of batches is to
 double it, which is almost always going to be a huge loss.


Is there a reason we don't do hybrid hashing, where if 80% fits in memory
then we write out only the 20% that doesn't? And then when probing the
table with the other input, the 80% that land in in-memory buckets get
handled immediately, and only the 20% that land in the on-disk buckets get
written for the next step?
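
A minimal sketch of that routing logic, with invented names (nothing like
this exists in the current code):

#include <stdbool.h>

typedef struct SketchHashTable
{
    int   nbuckets;
    bool *bucket_spilled;      /* true if this bucket's build tuples are on disk */
    /* ... in-memory bucket chains, per-bucket temp files, etc. ... */
} SketchHashTable;

/*
 * Route one probe tuple: join it immediately if its bucket is resident,
 * otherwise stash it in that bucket's temp file for a later pass.
 */
void
sketch_probe(SketchHashTable *ht, unsigned int probe_hash, void *probe_tuple)
{
    int bucketno = (int) (probe_hash % (unsigned int) ht->nbuckets);

    if (!ht->bucket_spilled[bucketno])
    {
        /* ~80% case: bucket resident, join probe_tuple against its chain now */
    }
    else
    {
        /* ~20% case: bucket spilled, write probe_tuple to its temp file */
    }
    (void) probe_tuple;        /* silence unused-parameter warnings in this sketch */
}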

Obviously no one has implemented it yet, but is there a fundamental reason
for that, or is it just a round-tuit problem?

Cheers,

Jeff


Re: [HACKERS] Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller

2014-06-10 Thread Robert Haas
On Tue, Jun 10, 2014 at 1:43 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Tue, Jun 10, 2014 at 5:17 AM, Robert Haas robertmh...@gmail.com wrote:
 The problem case is when you have 1 batch and the increased memory
 consumption causes you to switch to 2 batches.  That's expensive.  It
 seems clear based on previous testing that *on the average*
 NTUP_PER_BUCKET = 1 will be better, but in the case where it causes an
 increase in the number of batches it will be much worse - particularly
 because the only way we ever increase the number of batches is to
 double it, which is almost always going to be a huge loss.

 Is there a reason we don't do hybrid hashing, where if 80% fits in memory
 then we write out only the 20% that doesn't? And then when probing the table
 with the other input, the 80% that land in in-memory buckets get handled
 immediately, and only the 20% that land in the on-disk buckets get written
 for the next step?

We have an optimization that is a little bit like that.  The skew
hash join stuff tries to (essentially) ensure that the MCVs (most
common values) are in the first batch.

But more could probably be done.  For example, suppose we have 256
buckets.  If the hash table overflows work_mem, we could write the
contents of *one bucket* out to disk, rather than (as we currently do)
half of the table.  If we overflow again, we write another bucket.
When the number of buckets written reaches half the total, we split
all of the remaining buckets so that all 256 slots are once again
active.  Repeat as needed.
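
In pseudo-C, the incremental spill might look something like this (invented
names; purely illustrative, nothing like it exists in nodeHash.c today):

#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NBUCKETS 256

typedef struct SketchTable
{
    bool   spilled[SKETCH_NBUCKETS];      /* bucket contents pushed to a temp file? */
    size_t bucket_bytes[SKETCH_NBUCKETS]; /* resident size of each bucket */
    int    nspilled;
    size_t space_used;
    size_t space_allowed;                 /* roughly, work_mem */
} SketchTable;

/*
 * Instead of doubling nbatch and evicting half the table, evict one
 * bucket at a time until we fit again; once half the slots are on disk,
 * split the remaining resident buckets so all 256 slots are live again.
 */
void
sketch_handle_overflow(SketchTable *ht)
{
    while (ht->space_used > ht->space_allowed)
    {
        int victim = -1;

        /* pick the largest still-resident bucket as the victim */
        for (int i = 0; i < SKETCH_NBUCKETS; i++)
        {
            if (!ht->spilled[i] &&
                (victim < 0 || ht->bucket_bytes[i] > ht->bucket_bytes[victim]))
                victim = i;
        }
        if (victim < 0)
            break;                        /* everything is already on disk */

        /* ... write the victim bucket's tuples to its temp file here ... */
        ht->space_used -= ht->bucket_bytes[victim];
        ht->bucket_bytes[victim] = 0;
        ht->spilled[victim] = true;

        if (++ht->nspilled >= SKETCH_NBUCKETS / 2)
        {
            /* ... split each remaining resident bucket (omitted) so all
             * 256 slots are active again, then keep going ... */
            ht->nspilled = 0;
        }
    }
}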

If something like that worked out, it would drastically reduce the
penalty for slightly overrunning work_mem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

