[PERFORM] Hash join gets slower as work_mem increases?
I have a query that runs *slower* if I increase work_mem.
The execution plans are identical in both cases, except that a temp file
is used when work_mem is smaller.

The relevant lines of EXPLAIN ANALYZE output are:

With work_mem='100MB':

->  Hash Join  (cost=46738.74..285400.61 rows=292 width=8) (actual time=4296.986..106087.683 rows=187222 loops=1)
      Hash Cond: ("*SELECT* 1_2".postadresse_id = p.postadresse_id)
      Buffers: shared hit=1181177 dirtied=1, temp read=7232 written=7230

With work_mem='500MB':

->  Hash Join  (cost=46738.74..285400.61 rows=292 width=8) (actual time=3802.849..245970.049 rows=187222 loops=1)
      Hash Cond: ("*SELECT* 1_2".postadresse_id = p.postadresse_id)
      Buffers: shared hit=1181175 dirtied=111

I ran operf on both backends, and they look quite similar, except that
the number of samples is different (this is "opreport -c" output):

CPU: Intel Sandy Bridge microarchitecture, speed 2899.8 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 9

samples  %        image name  symbol name
-------------------------------------------------------------------------------
  112       0.0019  postgres  ExecProcNode
  3020116  49.9904  postgres  ExecScanHashBucket
  3021162  50.0077  postgres  ExecHashJoin
3020116  92.8440  postgres  ExecScanHashBucket
  3020116  49.9207  postgres  ExecScanHashBucket [self]
  3020116  49.9207  postgres  ExecScanHashBucket
  8190      0.1354  vmlinux   apic_timer_interrupt

What could be an explanation for this?
Is this known behaviour?

Yours,
Laurenz Albe

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hash join gets slower as work_mem increases?
Hi

> I ran operf on both backends, and they look quite similar, except that
> the number of samples is different (this is "opreport -c" output):
>
> [opreport output trimmed; ExecScanHashBucket accounts for ~93% of the
> samples in both profiles]
>
> What could be an explanation for this?
> Is this known behaviour?

One issue was fixed in 9.5: a large hash table can cause a lot of L1
and L2 cache misses.

Pavel
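Since work_mem can be changed per session, the effect described above is easy to A/B-test from psql. A sketch (the thread does not show the original query, so the SELECT is left as a placeholder):

```sql
-- Sketch: compare the two settings in one session; no restart needed.
SHOW server_version;   -- the hash-table cache behaviour was improved in 9.5

SET work_mem = '100MB';
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;   -- hash table spills to a temp file

SET work_mem = '500MB';
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;   -- one large in-memory hash table
```

The BUFFERS option shows the `temp read/written` counters from the original report, which is how you can tell whether the hash table spilled.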
[PERFORM] jsonb_agg performance
The jsonb_agg function seems to have significantly worse performance
than its json_agg counterpart:

=> explain analyze select pa.product_id, jsonb_agg(attributes)
   from product_attributes2 pa group by pa.product_id;
                                 QUERY PLAN
------------------------------------------------------------------------------
 GroupAggregate  (cost=1127.54..1231.62 rows=3046 width=380) (actual time=28.632..241.647 rows=3046 loops=1)
   Group Key: product_id
   ->  Sort  (cost=1127.54..1149.54 rows=8800 width=380) (actual time=28.526..32.826 rows=8800 loops=1)
         Sort Key: product_id
         Sort Method: external sort  Disk: 3360kB
         ->  Seq Scan on product_attributes2 pa  (cost=0.00..551.00 rows=8800 width=380) (actual time=0.010..7.231 rows=8800 loops=1)
 Planning time: 0.376 ms
 Execution time: 242.963 ms
(8 rows)

=> explain analyze select pa.product_id, json_agg(attributes)
   from product_attributes3 pa group by pa.product_id;
                                 QUERY PLAN
------------------------------------------------------------------------------
 GroupAggregate  (cost=1136.54..1240.62 rows=3046 width=387) (actual time=17.731..30.126 rows=3046 loops=1)
   Group Key: product_id
   ->  Sort  (cost=1136.54..1158.54 rows=8800 width=387) (actual time=17.707..20.705 rows=8800 loops=1)
         Sort Key: product_id
         Sort Method: external sort  Disk: 3416kB
         ->  Seq Scan on product_attributes3 pa  (cost=0.00..560.00 rows=8800 width=387) (actual time=0.006..5.568 rows=8800 loops=1)
 Planning time: 0.181 ms
 Execution time: 31.276 ms
(8 rows)

The only difference between the two tables is the type of the
attributes column (jsonb vs json). Each table contains the same 8800
rows.
Even running json_agg on the jsonb column seems to be faster:

=> explain analyze select pa.product_id, json_agg(attributes)
   from product_attributes2 pa group by pa.product_id;
                                 QUERY PLAN
------------------------------------------------------------------------------
 GroupAggregate  (cost=1127.54..1231.62 rows=3046 width=380) (actual time=30.626..62.943 rows=3046 loops=1)
   Group Key: product_id
   ->  Sort  (cost=1127.54..1149.54 rows=8800 width=380) (actual time=30.590..34.157 rows=8800 loops=1)
         Sort Key: product_id
         Sort Method: external sort  Disk: 3360kB
         ->  Seq Scan on product_attributes2 pa  (cost=0.00..551.00 rows=8800 width=380) (actual time=0.014..7.388 rows=8800 loops=1)
 Planning time: 0.142 ms
 Execution time: 64.504 ms
(8 rows)

Is it expected that jsonb_agg performance would be that much worse than
json_agg?
Re: [PERFORM] jsonb_agg performance
On 01/29/2016 05:06 PM, jflem...@kispring.com wrote:

> The jsonb_agg function seems to have significantly worse performance
> than its json_agg counterpart:
>
> [EXPLAIN ANALYZE output trimmed: jsonb_agg on product_attributes2 ran
> in 242.963 ms; json_agg on product_attributes3 ran in 31.276 ms]
>
> The only difference between the two tables is the type of the
> attributes column (jsonb vs json). Each table contains the same 8800
> rows.
> Even running json_agg on the jsonb column seems to be faster:
>
> [EXPLAIN ANALYZE output trimmed: json_agg on the jsonb column of
> product_attributes2 ran in 64.504 ms]
>
> Is it expected that jsonb_agg performance would be that much worse
> than json_agg?

I do expect it to be significantly worse. Constructing jsonb is quite a
lot more expensive than constructing json; it's the later processing
that provides the performance benefit of jsonb.

For 99 out of 100 uses that I have seen there is no need to be using
jsonb_agg, since the output is almost always fed straight back to the
client, not stored or processed further in the database. Rendering json
to the client is extremely cheap, since it's already just text.
Rendering jsonb as text to the client involves a lot more processing.

cheers

andrew
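The distinction above can be sketched with the table names from the thread (the containment filter and its value are hypothetical, purely for illustration):

```sql
-- If the aggregate goes straight back to the client, json_agg avoids
-- building jsonb only to render it back out as text:
SELECT pa.product_id, json_agg(pa.attributes) AS attrs
FROM product_attributes2 pa
GROUP BY pa.product_id;

-- jsonb_agg earns its construction cost only when the result is
-- processed further inside the database, e.g. with the @> containment
-- operator (hypothetical filter value):
SELECT s.product_id
FROM (SELECT pa.product_id, jsonb_agg(pa.attributes) AS attrs
      FROM product_attributes2 pa
      GROUP BY pa.product_id) s
WHERE s.attrs @> '[{"color": "red"}]';
```

In the first case the json text is passed through essentially as-is; in the second, the binary jsonb representation is what makes the containment test cheap.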