[PERFORM] Hash join gets slower as work_mem increases?

2016-01-29 Thread Albe Laurenz
I have a query that runs *slower* if I increase work_mem.

The execution plans are identical in both cases, except that a temp file
is used when work_mem is smaller.

The relevant lines of EXPLAIN ANALYZE output are:

With work_mem='100MB':
->  Hash Join  (cost=46738.74..285400.61 rows=292 width=8) (actual 
time=4296.986..106087.683 rows=187222 loops=1)
  Hash Cond: ("*SELECT* 1_2".postadresse_id = p.postadresse_id)
  Buffers: shared hit=1181177 dirtied=1, temp read=7232 written=7230

With work_mem='500MB':
->  Hash Join  (cost=46738.74..285400.61 rows=292 width=8) (actual 
time=3802.849..245970.049 rows=187222 loops=1)
  Hash Cond: ("*SELECT* 1_2".postadresse_id = p.postadresse_id)
  Buffers: shared hit=1181175 dirtied=111
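
For completeness, the two plans were collected roughly like this in one
session (the join query itself is long and omitted here, so <query> just
stands for it):

SET work_mem = '100MB';
EXPLAIN (ANALYZE, BUFFERS) <query>;  -- batches to temp files, hash join ~106 s
SET work_mem = '500MB';
EXPLAIN (ANALYZE, BUFFERS) <query>;  -- hash table kept in memory, ~246 s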

I ran operf on both backends, and they look quite similar, except that the
number of samples is different (this is "opreport -c" output):

CPU: Intel Sandy Bridge microarchitecture, speed 2899.8 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask 
of 0x00 (No unit mask) count 9
samples  %        image name       symbol name
---
  112   0.0019  postgres ExecProcNode
  3020116  49.9904  postgres ExecScanHashBucket
  3021162  50.0077  postgres ExecHashJoin
3020116  92.8440  postgres ExecScanHashBucket
  3020116  49.9207  postgres ExecScanHashBucket [self]
  3020116  49.9207  postgres ExecScanHashBucket
  8190  0.1354  vmlinux  apic_timer_interrupt
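
For reference, the profile was gathered roughly like this, where
<backend_pid> is the PID of the backend executing the query:

operf --pid <backend_pid>   # sample while the query runs; stop with Ctrl-C
opreport -c                 # call-graph report, excerpted above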

What could be an explanation for this?
Is this known behaviour?

Yours,
Laurenz Albe



Re: [PERFORM] Hash join gets slower as work_mem increases?

2016-01-29 Thread Pavel Stehule
Hi



> I ran operf on both backends, and they look quite similar, except that the
> number of samples is different (this is "opreport -c" output):
>
> CPU: Intel Sandy Bridge microarchitecture, speed 2899.8 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
> mask of 0x00 (No unit mask) count 9
> samples  %        image name       symbol name
>
> ---
>   112   0.0019  postgres ExecProcNode
>   3020116  49.9904  postgres ExecScanHashBucket
>   3021162  50.0077  postgres ExecHashJoin
> 3020116  92.8440  postgres ExecScanHashBucket
>   3020116  49.9207  postgres ExecScanHashBucket [self]
>   3020116  49.9207  postgres ExecScanHashBucket
>   8190  0.1354  vmlinux  apic_timer_interrupt
>
> What could be an explanation for this?
> Is this known behaviour?
>

One related issue was fixed in 9.5.

A large hash table can cause a lot of misses in the L1 and L2 caches.
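
A rough way to check that (not from the original report; it assumes Linux
perf is available and <backend_pid> is the backend running the join):

# attach to the backend once per work_mem setting while the query runs;
# a much higher miss rate with work_mem='500MB' would point to the hash
# table outgrowing the CPU caches
perf stat -e cache-references,cache-misses -p <backend_pid> -- sleep 120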

Pavel


>
> Yours,
> Laurenz Albe
>


[PERFORM] jsonb_agg performance

2016-01-29 Thread jfleming
The jsonb_agg function seems to have significantly worse performance
than its json_agg counterpart:

=> explain analyze select pa.product_id, jsonb_agg(attributes) from
product_attributes2 pa group by pa.product_id;
 
QUERY PLAN
--
 GroupAggregate  (cost=1127.54..1231.62 rows=3046 width=380) (actual
time=28.632..241.647 rows=3046 loops=1)
   Group Key: product_id
   ->  Sort  (cost=1127.54..1149.54 rows=8800 width=380) (actual
time=28.526..32.826 rows=8800 loops=1)
 Sort Key: product_id
 Sort Method: external sort  Disk: 3360kB
 ->  Seq Scan on product_attributes2 pa 
(cost=0.00..551.00 rows=8800 width=380) (actual time=0.010..7.231
rows=8800 loops=1)
 Planning time: 0.376 ms
 Execution time: 242.963 ms
(8 rows)

=> explain analyze select pa.product_id, json_agg(attributes) from
product_attributes3 pa group by pa.product_id;
 
QUERY PLAN
--
 GroupAggregate  (cost=1136.54..1240.62 rows=3046 width=387) (actual
time=17.731..30.126 rows=3046 loops=1)
   Group Key: product_id
   ->  Sort  (cost=1136.54..1158.54 rows=8800 width=387) (actual
time=17.707..20.705 rows=8800 loops=1)
 Sort Key: product_id
 Sort Method: external sort  Disk: 3416kB
 ->  Seq Scan on product_attributes3 pa 
(cost=0.00..560.00 rows=8800 width=387) (actual time=0.006..5.568
rows=8800 loops=1)
 Planning time: 0.181 ms
 Execution time: 31.276 ms
(8 rows)

The only difference between the two tables is the type of the
attributes column (jsonb vs json).  Each table contains the same 8800
rows.  Even running json_agg on the jsonb column seems to be faster:

=> explain analyze select pa.product_id, json_agg(attributes) from
product_attributes2 pa group by pa.product_id;
 
QUERY PLAN
--
 GroupAggregate  (cost=1127.54..1231.62 rows=3046 width=380) (actual
time=30.626..62.943 rows=3046 loops=1)
   Group Key: product_id
   ->  Sort  (cost=1127.54..1149.54 rows=8800 width=380) (actual
time=30.590..34.157 rows=8800 loops=1)
 Sort Key: product_id
 Sort Method: external sort  Disk: 3360kB
 ->  Seq Scan on product_attributes2 pa 
(cost=0.00..551.00 rows=8800 width=380) (actual time=0.014..7.388
rows=8800 loops=1)
 Planning time: 0.142 ms
 Execution time: 64.504 ms
(8 rows)

Is it expected that jsonb_agg performance would be that much worse
than json_agg?


Re: [PERFORM] jsonb_agg performance

2016-01-29 Thread Andrew Dunstan



On 01/29/2016 05:06 PM, jflem...@kispring.com wrote:
The jsonb_agg function seems to have significantly worse performance 
than its json_agg counterpart:


=> explain analyze select pa.product_id, jsonb_agg(attributes) from 
product_attributes2 pa group by pa.product_id;

QUERY PLAN
--
 GroupAggregate  (cost=1127.54..1231.62 rows=3046 width=380) (actual 
time=28.632..241.647 rows=3046 loops=1)

   Group Key: product_id
   ->  Sort  (cost=1127.54..1149.54 rows=8800 width=380) (actual 
time=28.526..32.826 rows=8800 loops=1)

 Sort Key: product_id
 Sort Method: external sort  Disk: 3360kB
 ->  Seq Scan on product_attributes2 pa (cost=0.00..551.00 
rows=8800 width=380) (actual time=0.010..7.231 rows=8800 loops=1)

 Planning time: 0.376 ms
 Execution time: 242.963 ms
(8 rows)

=> explain analyze select pa.product_id, json_agg(attributes) from 
product_attributes3 pa group by pa.product_id;

QUERY PLAN
--
 GroupAggregate  (cost=1136.54..1240.62 rows=3046 width=387) (actual 
time=17.731..30.126 rows=3046 loops=1)

   Group Key: product_id
   ->  Sort  (cost=1136.54..1158.54 rows=8800 width=387) (actual 
time=17.707..20.705 rows=8800 loops=1)

 Sort Key: product_id
 Sort Method: external sort  Disk: 3416kB
 ->  Seq Scan on product_attributes3 pa (cost=0.00..560.00 
rows=8800 width=387) (actual time=0.006..5.568 rows=8800 loops=1)

 Planning time: 0.181 ms
 Execution time: 31.276 ms
(8 rows)

The only difference between the two tables is the type of the 
attributes column (jsonb vs json).  Each table contains the same 8800 
rows.  Even running json_agg on the jsonb column seems to be faster:


=> explain analyze select pa.product_id, json_agg(attributes) from 
product_attributes2 pa group by pa.product_id;

QUERY PLAN
--
 GroupAggregate  (cost=1127.54..1231.62 rows=3046 width=380) (actual 
time=30.626..62.943 rows=3046 loops=1)

   Group Key: product_id
   ->  Sort  (cost=1127.54..1149.54 rows=8800 width=380) (actual 
time=30.590..34.157 rows=8800 loops=1)

 Sort Key: product_id
 Sort Method: external sort  Disk: 3360kB
 ->  Seq Scan on product_attributes2 pa (cost=0.00..551.00 
rows=8800 width=380) (actual time=0.014..7.388 rows=8800 loops=1)

 Planning time: 0.142 ms
 Execution time: 64.504 ms
(8 rows)

Is it expected that jsonb_agg performance would be that much worse 
than json_agg? 



I do expect it to be significantly worse. Constructing jsonb is quite a 
lot more expensive than constructing json; it's the later processing 
that provides the performance benefit of jsonb. For 99 out of 100 uses 
that I have seen there is no need to be using jsonb_agg, since the 
output is almost always fed straight back to the client, not stored or 
processed further in the database. Rendering json to the client is 
extremely cheap, since it's already just text. Rendering jsonb as text 
to the client involves a lot more processing.
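
Concretely, that is the pattern of the third query in your message: keep
the stored column as jsonb, but aggregate with json_agg when the result
only goes back to the client:

select pa.product_id, json_agg(attributes)  -- attributes is jsonb; json_agg only renders it as text
from product_attributes2 pa
group by pa.product_id;

In your measurements that was about 64 ms versus 243 ms for jsonb_agg.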


cheers

andrew


