Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-08-20 Thread Heikki Linnakangas
On 07/20/2014 07:17 PM, Tomas Vondra wrote: On 19.7.2014 20:24, Tomas Vondra wrote: On 13.7.2014 21:32, Tomas Vondra wrote: The current patch only implements this for tuples in the main hash table, not for skew buckets. I plan to do that, but it will require separate chunks for each skew

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-08-20 Thread Tomas Vondra
On 20 August 2014, 14:05, Heikki Linnakangas wrote: On 07/20/2014 07:17 PM, Tomas Vondra wrote: On 19.7.2014 20:24, Tomas Vondra wrote: On 13.7.2014 21:32, Tomas Vondra wrote: The current patch only implements this for tuples in the main hash table, not for skew buckets. I plan to do that,

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-20 Thread Tomas Vondra
On 20.7.2014 00:12, Tomas Vondra wrote: On 19.7.2014 23:07, Tomas Vondra wrote: On 19.7.2014 20:28, Tomas Vondra wrote: For the first case, a WARNING at the end of estimate_hash_bucketsize says this: WARNING: nbuckets=8388608.00 estfract=0.01 WARNING: nbuckets=65536.00

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-20 Thread Tomas Vondra
On 19.7.2014 20:24, Tomas Vondra wrote: On 13.7.2014 21:32, Tomas Vondra wrote: The current patch only implements this for tuples in the main hash table, not for skew buckets. I plan to do that, but it will require separate chunks for each skew bucket (so we can remove it without messing

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-19 Thread Tomas Vondra
On 14.7.2014 06:29, Stephen Frost wrote: Tomas, * Tomas Vondra (t...@fuzzy.cz) wrote: On 6.7.2014 17:57, Stephen Frost wrote: * Tomas Vondra (t...@fuzzy.cz) wrote: I can't find the thread / test cases in the archives. I've found this thread in hackers:

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-19 Thread Tom Lane
Tomas Vondra t...@fuzzy.cz writes: I've reviewed the two test cases mentioned here, and sadly there's nothing that can be 'fixed' by this patch. The problem here lies in the planning stage, which decides to hash the large table - we can't fix that in the executor. We've heard a couple reports

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-19 Thread Tomas Vondra
On 13.7.2014 21:32, Tomas Vondra wrote: The current patch only implements this for tuples in the main hash table, not for skew buckets. I plan to do that, but it will require separate chunks for each skew bucket (so we can remove it without messing with all of them). The chunks for skew

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-19 Thread Tomas Vondra
On 19.7.2014 20:24, Tom Lane wrote: Tomas Vondra t...@fuzzy.cz writes: I've reviewed the two test cases mentioned here, and sadly there's nothing that can be 'fixed' by this patch. The problem here lies in the planning stage, which decides to hash the large table - we can't fix that in the

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-19 Thread Tomas Vondra
On 19.7.2014 20:28, Tomas Vondra wrote: On 19.7.2014 20:24, Tom Lane wrote: Tomas Vondra t...@fuzzy.cz writes: I've reviewed the two test cases mentioned here, and sadly there's nothing that can be 'fixed' by this patch. The problem here lies in the planning stage, which decides to hash the

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-19 Thread Tomas Vondra
On 19.7.2014 23:07, Tomas Vondra wrote: On 19.7.2014 20:28, Tomas Vondra wrote: For the first case, a WARNING at the end of estimate_hash_bucketsize says this: WARNING: nbuckets=8388608.00 estfract=0.01 WARNING: nbuckets=65536.00 estfract=0.000267 There are 4.3M rows in the

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-13 Thread Simon Riggs
On 12 July 2014 12:43, Tomas Vondra t...@fuzzy.cz wrote: So let's just get this change done and then do more later. There's no way back, sadly. The dense allocation turned into a challenge. I like challenges. I have to solve it or I won't be able to sleep. I admire your tenacity, but how about

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-13 Thread Tomas Vondra
On 13.7.2014 12:27, Simon Riggs wrote: On 12 July 2014 12:43, Tomas Vondra t...@fuzzy.cz wrote: So let's just get this change done and then do more later. There's no way back, sadly. The dense allocation turned into a challenge. I like challenges. I have to solve it or I won't be able to

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-13 Thread Tomas Vondra
On 11.7.2014 19:25, Tomas Vondra wrote: 2) walking through the tuples sequentially -- The other option is to walk the tuples not through the buckets, but through the chunks - we know the tuples are stored as HashJoinTuple/MinimalTuple, so it

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-13 Thread Stephen Frost
Tomas, * Tomas Vondra (t...@fuzzy.cz) wrote: On 6.7.2014 17:57, Stephen Frost wrote: * Tomas Vondra (t...@fuzzy.cz) wrote: I can't find the thread / test cases in the archives. I've found this thread in hackers:

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-12 Thread Simon Riggs
On 11 July 2014 18:25, Tomas Vondra t...@fuzzy.cz wrote: Turns out getting this working properly will be quite complicated. Let's keep this patch simple then. Later research can be another patch. In terms of memory pressure, having larger joins go x4 faster has a much more significant reducing

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-12 Thread Tomas Vondra
On 12.7.2014 11:39, Simon Riggs wrote: On 11 July 2014 18:25, Tomas Vondra t...@fuzzy.cz wrote: Turns out getting this working properly will be quite complicated. Let's keep this patch simple then. Later research can be another patch. Well, the dense allocation is independent of the

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-11 Thread Simon Riggs
On 9 July 2014 18:54, Tomas Vondra t...@fuzzy.cz wrote: (1) size the buckets for NTUP_PER_BUCKET=1 (and use whatever number of batches this requires) If we start off by assuming NTUP_PER_BUCKET = 1, how much memory does it save to recalculate the hash bucket at 10 instead? Resizing sounds

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-11 Thread Tomas Vondra
On 11 July 2014, 9:27, Simon Riggs wrote: On 9 July 2014 18:54, Tomas Vondra t...@fuzzy.cz wrote: (1) size the buckets for NTUP_PER_BUCKET=1 (and use whatever number of batches this requires) If we start off by assuming NTUP_PER_BUCKET = 1, how much memory does it save to

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-11 Thread Simon Riggs
On 11 July 2014 10:23, Tomas Vondra t...@fuzzy.cz wrote: On 11 July 2014, 9:27, Simon Riggs wrote: On 9 July 2014 18:54, Tomas Vondra t...@fuzzy.cz wrote: (1) size the buckets for NTUP_PER_BUCKET=1 (and use whatever number of batches this requires) If we start off by assuming

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-11 Thread Tomas Vondra
On 10.7.2014 21:33, Tomas Vondra wrote: On 9.7.2014 16:07, Robert Haas wrote: On Tue, Jul 8, 2014 at 5:16 PM, Tomas Vondra t...@fuzzy.cz wrote: Thinking about this a bit more, do we really need to build the hash table on the first pass? Why not do this: (1) batching - read the

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-10 Thread Tomas Vondra
On 9.7.2014 16:07, Robert Haas wrote: On Tue, Jul 8, 2014 at 5:16 PM, Tomas Vondra t...@fuzzy.cz wrote: Thinking about this a bit more, do we really need to build the hash table on the first pass? Why not do this: (1) batching - read the tuples, stuff them into a simple list -

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-09 Thread Robert Haas
On Tue, Jul 8, 2014 at 5:16 PM, Tomas Vondra t...@fuzzy.cz wrote: Thinking about this a bit more, do we really need to build the hash table on the first pass? Why not do this: (1) batching - read the tuples, stuff them into a simple list - don't build the hash table yet (2)

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-09 Thread Tomas Vondra
On 9.7.2014 16:07, Robert Haas wrote: On Tue, Jul 8, 2014 at 5:16 PM, Tomas Vondra t...@fuzzy.cz wrote: Thinking about this a bit more, do we really need to build the hash table on the first pass? Why not do this: (1) batching - read the tuples, stuff them into a simple list - don't

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-08 Thread Robert Haas
On Wed, Jul 2, 2014 at 8:13 PM, Tomas Vondra t...@fuzzy.cz wrote: I propose dynamic increase of the nbuckets (up to NTUP_PER_BUCKET=1) once the table is built and there's free space in work_mem. The patch mentioned above makes implementing this possible / rather simple. Another idea would be

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-08 Thread Tomas Vondra
On 8 July 2014, 14:49, Robert Haas wrote: On Wed, Jul 2, 2014 at 8:13 PM, Tomas Vondra t...@fuzzy.cz wrote: I propose dynamic increase of the nbuckets (up to NTUP_PER_BUCKET=1) once the table is built and there's free space in work_mem. The patch mentioned above makes implementing this

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-08 Thread Robert Haas
On Tue, Jul 8, 2014 at 9:35 AM, Tomas Vondra t...@fuzzy.cz wrote: On 8 July 2014, 14:49, Robert Haas wrote: On Wed, Jul 2, 2014 at 8:13 PM, Tomas Vondra t...@fuzzy.cz wrote: I propose dynamic increase of the nbuckets (up to NTUP_PER_BUCKET=1) once the table is built and there's free space

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-08 Thread Tomas Vondra
On 8 July 2014, 16:16, Robert Haas wrote: On Tue, Jul 8, 2014 at 9:35 AM, Tomas Vondra t...@fuzzy.cz wrote: Maybe. I'm not against setting NTUP_PER_BUCKET=1, but with large outer relations it may be way cheaper to use higher NTUP_PER_BUCKET values instead of increasing the number of

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-08 Thread Robert Haas
On Tue, Jul 8, 2014 at 12:06 PM, Tomas Vondra t...@fuzzy.cz wrote: On 8 July 2014, 16:16, Robert Haas wrote: On Tue, Jul 8, 2014 at 9:35 AM, Tomas Vondra t...@fuzzy.cz wrote: Maybe. I'm not against setting NTUP_PER_BUCKET=1, but with large outer relations it may be way cheaper to use

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-08 Thread Tomas Vondra
On 8.7.2014 19:00, Robert Haas wrote: On Tue, Jul 8, 2014 at 12:06 PM, Tomas Vondra t...@fuzzy.cz wrote: On 8 July 2014, 16:16, Robert Haas wrote: Right, I think that's clear. I'm just pointing out that you get to decide: you can either start with a larger NTUP_PER_BUCKET and then reduce

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-08 Thread Jeff Janes
On Tue, Jul 8, 2014 at 6:35 AM, Tomas Vondra t...@fuzzy.cz wrote: On 8 July 2014, 14:49, Robert Haas wrote: On Wed, Jul 2, 2014 at 8:13 PM, Tomas Vondra t...@fuzzy.cz wrote: I propose dynamic increase of the nbuckets (up to NTUP_PER_BUCKET=1) once the table is built and there's free

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-08 Thread Tomas Vondra
On 8.7.2014 21:53, Jeff Janes wrote: On Tue, Jul 8, 2014 at 6:35 AM, Tomas Vondra t...@fuzzy.cz wrote: Maybe. I'm not against setting NTUP_PER_BUCKET=1, but with large outer relations it may be way cheaper to use higher NTUP_PER_BUCKET values instead of increasing the number of batches

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-08 Thread Tomas Vondra
Hi, Thinking about this a bit more, do we really need to build the hash table on the first pass? Why not do this: (1) batching - read the tuples, stuff them into a simple list - don't build the hash table yet (2) building the hash table - we have all the tuples in a simple list,

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-06 Thread Tomas Vondra
On 6.7.2014 06:47, Stephen Frost wrote: * Greg Stark (st...@mit.edu) wrote: Last time, we wanted to use bloom filters in hash joins to filter out tuples that won't match any of the future hash batches to reduce the number of tuples that need to be spilled to disk. However the problem was

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-06 Thread Stephen Frost
Tomas, * Tomas Vondra (t...@fuzzy.cz) wrote: I can't find the thread / test cases in the archives. I've found this thread in hackers: http://www.postgresql.org/message-id/caoezvif-r-ilf966weipk5by-khzvloqpwqurpak3p5fyw-...@mail.gmail.com Can you point me to the right one, please? This:

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-06 Thread Tomas Vondra
On 6.7.2014 17:57, Stephen Frost wrote: Tomas, * Tomas Vondra (t...@fuzzy.cz) wrote: I can't find the thread / test cases in the archives. I've found this thread in hackers: http://www.postgresql.org/message-id/caoezvif-r-ilf966weipk5by-khzvloqpwqurpak3p5fyw-...@mail.gmail.com Can you

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-05 Thread Stephen Frost
* Greg Stark (st...@mit.edu) wrote: On Thu, Jul 3, 2014 at 11:40 AM, Atri Sharma atri.j...@gmail.com wrote: IIRC, last time when we tried doing bloom filters, I was short of some real world useful hash functions that we could use for building the bloom filter. Last time, we wanted to

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-03 Thread Tomas Vondra
On 3.7.2014 02:13, Tomas Vondra wrote: Hi, while hacking on the 'dynamic nbucket' patch, scheduled for the next CF (https://commitfest.postgresql.org/action/patch_view?id=1494) I was repeatedly stumbling over NTUP_PER_BUCKET. I'd like to propose a change in how we handle it. TL;DR;

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-03 Thread Stephen Frost
Tomas, * Tomas Vondra (t...@fuzzy.cz) wrote: However it's likely there are queries where this may not be the case, i.e. where rebuilding the hash table is not worth it. Let me know if you can construct such a query (I couldn't). Thanks for working on this! I've been thinking on this for a while

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-03 Thread Atri Sharma
On Thu, Jul 3, 2014 at 11:40 PM, Stephen Frost sfr...@snowman.net wrote: Tomas, * Tomas Vondra (t...@fuzzy.cz) wrote: However it's likely there are queries where this may not be the case, i.e. where rebuilding the hash table is not worth it. Let me know if you can construct such a query (I

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-03 Thread Tomas Vondra
Hi Stephen, On 3.7.2014 20:10, Stephen Frost wrote: Tomas, * Tomas Vondra (t...@fuzzy.cz) wrote: However it's likely there are queries where this may not be the case, i.e. where rebuilding the hash table is not worth it. Let me know if you can construct such a query (I couldn't). Thanks for

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-03 Thread Greg Stark
On Thu, Jul 3, 2014 at 11:40 AM, Atri Sharma atri.j...@gmail.com wrote: IIRC, last time when we tried doing bloom filters, I was short of some real world useful hash functions that we could use for building the bloom filter. Last time, we wanted to use bloom filters in hash joins to filter

Re: [HACKERS] tweaking NTUP_PER_BUCKET

2014-07-03 Thread Tomas Vondra
On 3.7.2014 20:50, Tomas Vondra wrote: Hi Stephen, On 3.7.2014 20:10, Stephen Frost wrote: Tomas, * Tomas Vondra (t...@fuzzy.cz) wrote: However it's likely there are queries where this may not be the case, i.e. where rebuilding the hash table is not worth it. Let me know if you can