Re: [PERFORM] Hash join gets slower as work_mem increases?

2016-02-01 Thread Albe Laurenz
Tomas Vondra wrote:
> Yes, that's clearly the culprit here. In both cases we estimate there are
> only ~4000 tuples in the hash, and 9.3 sizes the hash table to have at
> most ~10 tuples per bucket (in a linked list).
> 
> However we actually get ~3M rows, so there will be ~3000 tuples per
> bucket, and that's extremely expensive to walk. The reason why 100MB is
> faster is that it's using 2 batches, thus making the lists "just" ~1500
> tuples long.
> 
> This is pretty much exactly the reason why I reworked hash joins in 9.5.
> I'd bet it's going to be ~20x faster on that version.

Thank you for the explanation!

Yours,
Laurenz Albe

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hash join gets slower as work_mem increases?

2016-02-01 Thread Albe Laurenz
Tomas Vondra wrote:
> On 01/29/2016 04:17 PM, Albe Laurenz wrote:
>> I have a query that runs *slower* if I increase work_mem.
>>
>> The execution plans are identical in both cases, except that a temp file
>> is used when work_mem is smaller.

>> What could be an explanation for this?
>> Is this known behaviour?
> 
> There is a bunch of possible causes for such behavior, but it's quite
> impossible to say if this is an example of one of them as you have not
> posted the interesting parts of the explain plan. Also, knowing
> PostgreSQL version would be useful.
> 
> I don't think the example you posted is due to exceeding the on-CPU cache,
> as that's just a few MBs per socket, so even the smaller work_mem value is
> significantly larger.
> 
> What I'd expect to be the issue here is under-estimate of the hash table
> size, resulting in too few buckets and thus long chains of tuples that
> need to be searched sequentially. Smaller work_mem values usually limit
> the length of those chains in favor of batching.
> 
> Please, post the whole explain plan - especially the info about number
> of buckets/batches and the Hash node details.

Thanks for looking at this.
Sorry, I forgot to mention that this is PostgreSQL 9.3.10.

I didn't post the whole plan since it is awfully long; I'll include hyperlinks
to the complete plans instead.

work_mem = '100MB' (http://explain.depesz.com/s/7b6a):

->  Hash Join  (cost=46738.74..285400.61 rows=292 width=8) (actual 
time=4296.986..106087.683 rows=187222 loops=1)
   Hash Cond: ("*SELECT* 1_2".postadresse_id = p.postadresse_id)
   Buffers: shared hit=1181177 dirtied=1, temp read=7232 written=7230
[...]
   ->  Hash  (cost=18044.92..18044.92 rows=4014 width=8) (actual 
time=4206.892..4206.892 rows=3096362 loops=1)
 Buckets: 1024  Batches: 2 (originally 1)  Memory Usage: 102401kB
 Buffers: shared hit=1134522 dirtied=1, temp written=5296

work_mem = '500MB' (http://explain.depesz.com/s/Cgkl):

->  Hash Join  (cost=46738.74..285400.61 rows=292 width=8) (actual 
time=3802.849..245970.049 rows=187222 loops=1)
   Hash Cond: ("*SELECT* 1_2".postadresse_id = p.postadresse_id)
   Buffers: shared hit=1181175 dirtied=111
[...]
   ->  Hash  (cost=18044.92..18044.92 rows=4014 width=8) (actual 
time=3709.584..3709.584 rows=3096360 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 120952kB
 Buffers: shared hit=1134520 dirtied=111

Does that support your theory?

There is clearly an underestimate here, caused by correlated attributes, but
is that the cause for the bad performance with increased work_mem?
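
A quick back-of-the-envelope from the two plans above (assuming rows spread
evenly over the buckets) shows what the bucket chains look like:

```python
# Both plans build the hash with 1024 buckets; the 100MB setting splits
# the build side into 2 batches, while the 500MB setting keeps 1 batch.
buckets = 1024
rows = 3096362  # actual rows hashed, vs. the 4014 the planner expected

per_bucket_one_batch = rows / buckets        # 500MB: one big hash table
per_bucket_two_batches = rows / 2 / buckets  # 100MB: half the rows per batch

print(round(per_bucket_one_batch))    # ~3024 tuples per bucket chain
print(round(per_bucket_two_batches))  # ~1512 tuples per bucket chain
```

Chains roughly half as long for the 100MB run would match it being roughly
twice as fast (106 s vs. 246 s).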

Yours,
Laurenz Albe



Re: [PERFORM] Hash join gets slower as work_mem increases?

2016-02-01 Thread Tomas Vondra

On 02/01/2016 10:38 AM, Albe Laurenz wrote:

> Tomas Vondra wrote:
>
>> ...
>
> I didn't post the whole plan since it is awfully long, I'll include hyperlinks
> for the whole plan.
>
> work_mem = '100MB' (http://explain.depesz.com/s/7b6a):
>
> ->  Hash Join  (cost=46738.74..285400.61 rows=292 width=8) (actual
> time=4296.986..106087.683 rows=187222 loops=1)
>    Hash Cond: ("*SELECT* 1_2".postadresse_id = p.postadresse_id)
>    Buffers: shared hit=1181177 dirtied=1, temp read=7232 written=7230
> [...]
>    ->  Hash  (cost=18044.92..18044.92 rows=4014 width=8) (actual
> time=4206.892..4206.892 rows=3096362 loops=1)
>      Buckets: 1024  Batches: 2 (originally 1)  Memory Usage: 102401kB
>      Buffers: shared hit=1134522 dirtied=1, temp written=5296
>
> work_mem = '500MB' (http://explain.depesz.com/s/Cgkl):
>
> ->  Hash Join  (cost=46738.74..285400.61 rows=292 width=8) (actual
> time=3802.849..245970.049 rows=187222 loops=1)
>    Hash Cond: ("*SELECT* 1_2".postadresse_id = p.postadresse_id)
>    Buffers: shared hit=1181175 dirtied=111
> [...]
>    ->  Hash  (cost=18044.92..18044.92 rows=4014 width=8) (actual
> time=3709.584..3709.584 rows=3096360 loops=1)
>      Buckets: 1024  Batches: 1  Memory Usage: 120952kB
>      Buffers: shared hit=1134520 dirtied=111
>
> Does that support your theory?
>
> There is clearly an underestimate here, caused by correlated attributes, but
> is that the cause for the bad performance with increased work_mem?


Yes, that's clearly the culprit here. In both cases we estimate there are 
only ~4000 tuples in the hash, and 9.3 sizes the hash table to have at 
most ~10 tuples per bucket (in a linked list).
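
As a rough sketch of that sizing rule (the constants here -- 10 tuples per
bucket, a 1024-bucket floor, power-of-two rounding -- are assumptions for
illustration, not the exact 9.3 source):

```python
def buckets_for_estimate(est_rows, ntup_per_bucket=10, min_buckets=1024):
    """Pick a power-of-two bucket count aiming at ~10 tuples per bucket."""
    target = max(est_rows // ntup_per_bucket, min_buckets)
    n = 1
    while n < target:
        n *= 2
    return n

# The planner expected ~4000 rows, so the table gets only 1024 buckets...
buckets = buckets_for_estimate(4014)
print(buckets)  # 1024, matching the plans above

# ...but ~3M rows actually arrive, so each linked list holds thousands.
print(3096362 // buckets)  # ~3023 tuples per bucket
```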


However we actually get ~3M rows, so there will be ~3000 tuples per 
bucket, and that's extremely expensive to walk. The reason why 100MB is 
faster is that it's using 2 batches, thus making the lists "just" ~1500 
tuples long.


This is pretty much exactly the reason why I reworked hash joins in 9.5. 
I'd bet it's going to be ~20x faster on that version.
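
A toy chained hash table (a simplified model, not PostgreSQL's actual
implementation) shows how probe cost grows with chain length when the
bucket count is sized far too small:

```python
def build(keys, n_buckets):
    """Insert keys into a chained (linked-list style) hash table."""
    table = [[] for _ in range(n_buckets)]
    for k in keys:
        table[hash(k) % n_buckets].append(k)
    return table

def probe_cost(table, key):
    """Tuples examined while walking the bucket's chain to find `key`."""
    chain = table[hash(key) % len(table)]
    for i, k in enumerate(chain, start=1):
        if k == key:
            return i
    return len(chain)

keys = list(range(50_000))
undersized = build(keys, 1024)    # ~49 tuples per chain
well_sized = build(keys, 65536)   # ~1 tuple per chain

avg_under = sum(probe_cost(undersized, k) for k in keys) / len(keys)
avg_well = sum(probe_cost(well_sized, k) for k in keys) / len(keys)
print(avg_under > 10 * avg_well)  # True: long chains dominate probe time
```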


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[PERFORM] Hash join gets slower as work_mem increases?

2016-01-29 Thread Albe Laurenz
I have a query that runs *slower* if I increase work_mem.

The execution plans are identical in both cases, except that a temp file
is used when work_mem is smaller.

The relevant lines of EXPLAIN ANALYZE output are:

With work_mem='100MB':
->  Hash Join  (cost=46738.74..285400.61 rows=292 width=8) (actual 
time=4296.986..106087.683 rows=187222 loops=1)
  Hash Cond: ("*SELECT* 1_2".postadresse_id = p.postadresse_id)
  Buffers: shared hit=1181177 dirtied=1, temp read=7232 written=7230

With work_mem='500MB':
->  Hash Join  (cost=46738.74..285400.61 rows=292 width=8) (actual 
time=3802.849..245970.049 rows=187222 loops=1)
  Hash Cond: ("*SELECT* 1_2".postadresse_id = p.postadresse_id)
  Buffers: shared hit=1181175 dirtied=111

I ran operf on both backends, and they look quite similar, except that the
number of samples is different (this is "opreport -c" output):

CPU: Intel Sandy Bridge microarchitecture, speed 2899.8 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask 
of 0x00 (No unit mask) count 9
samples  %image name   symbol name
---
  112   0.0019  postgres ExecProcNode
  3020116  49.9904  postgres ExecScanHashBucket
  3021162  50.0077  postgres ExecHashJoin
3020116  92.8440  postgres ExecScanHashBucket
  3020116  49.9207  postgres ExecScanHashBucket [self]
  3020116  49.9207  postgres ExecScanHashBucket
  8190  0.1354  vmlinux  apic_timer_interrupt

What could be an explanation for this?
Is this known behaviour?

Yours,
Laurenz Albe



Re: [PERFORM] Hash join gets slower as work_mem increases?

2016-01-29 Thread Pavel Stehule
Hi



> I ran operf on both backends, and they look quite similar, except that the
> number of samples is different (this is "opreport -c" output):
>
> CPU: Intel Sandy Bridge microarchitecture, speed 2899.8 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
> mask of 0x00 (No unit mask) count 9
> samples  %image name   symbol name
>
> ---
>   112   0.0019  postgres ExecProcNode
>   3020116  49.9904  postgres ExecScanHashBucket
>   3021162  50.0077  postgres ExecHashJoin
> 3020116  92.8440  postgres ExecScanHashBucket
>   3020116  49.9207  postgres ExecScanHashBucket [self]
>   3020116  49.9207  postgres ExecScanHashBucket
>   8190  0.1354  vmlinux  apic_timer_interrupt
>
> What could be an explanation for this?
> Is this known behaviour?
>

One issue in this area was fixed in 9.5.

A large hash table can also cause a lot of misses in the L1 and L2 caches.

Pavel

