Hi, hackers! I've adapted crossmatch join from pgSphere to cube for performance tests. I've placed spatial join code here https://github.com/Octonica/postgres/blob/spatialjoin/contrib/cube/spatialjoin.c and node code here https://github.com/Octonica/postgres/blob/spatialjoin/contrib/cube/joinnode.c
If you have an idea of improving the performance of this code, please do not hesitate to express them. One of the performance bottlenecks is code nearby heap_getattr() in fetch_next_pair(). ==Performance Tests== I've tested performance on queries which aggregate result of the spatial join. See cube_test.sql attached. On 3d data, Spatial Join performs 3x faster than Nested Loop Join + GiST Scan Nested Loop + Index Scan plan: HashAggregate (cost=36841568.00..36841570.00 rows=200 width=40) (actual time=206565.869..206738.307 rows=298731 loops=1) Group Key: r.nx -> Nested Loop (cost=0.41..25591568.00 rows=900000000 width=40) (actual time=0.357..200838.416 rows=8464147 loops=1) -> Seq Scan on regions r (cost=0.00..6410.00 rows=300000 width=40) (actual time=0.015..324.436 rows=300000 loops=1) -> Index Scan using idx on datatable a (cost=0.41..55.28 rows=3000 width=64) (actual time=0.174..0.648 rows=28 loops=300000) Index Cond: (r.c @> c) Planning time: 17.175 ms Execution time: 206806.926 ms Time: 206852,635 ms (03:26,853) Spatial Join plan: HashAggregate (cost=56250001.00..56250003.00 rows=200 width=40) (actual time=67373.644..67553.118 rows=298731 loops=1) Group Key: r.nx -> Custom Scan (SpatialJoin) (cost=0.00..1.00 rows=4500000000 width=40) (actual time=0.151..61718.804 rows=8464147 loops=1) Outer index: idx Inner index: idx1 Planning time: 0.182 ms Execution time: 67630.742 ms Time: 67631,557 ms (01:07,632) But on more realistic 7D data with queries emulating OLAP system performance of Spatial Join is 2 times worse than Nested Loop Join + GiST Scan. Which comes as a complete surprise to me. I do not see any algorithmic reason for Spatial Join to be slower. Thus I strongly suspect that my implementation is not efficient, but as for now I have no ideas how to improve it. Here are plans for 7D Nested Loop + Index Scan HashAggregate (cost=3425143.00..3425743.00 rows=60000 width=72) (actual time=122794.715..122822.893 rows=60000 loops=1) Group Key: r.nx -> Nested Loop (cost=0.41..2075143.00 rows=60000000 width=72) (actual time=0.311..100478.710 rows=39817008 loops=1) -> Seq Scan on r1 r (cost=0.00..2419.00 rows=60000 width=128) (actual time=0.043..60.579 rows=60000 loops=1) -> Index Scan using ix_a1_cube on a1 a (cost=0.41..24.55 rows=1000 width=128) (actual time=0.110..1.266 rows=664 loops=60000) Index Cond: (c <@ r.c) Planning time: 0.349 ms Execution time: 122831.353 ms (8 rows) Spatial Join HashAggregate (cost=6750001.00..6750601.00 rows=60000 width=72) (actual time=241832.855..241889.360 rows=60000 loops=1) Group Key: r.nx -> Custom Scan (SpatialJoin) (cost=0.00..1.00 rows=300000000 width=72) (actual time=0.140..216187.111 rows=39817008 loops=1) Outer index: ix_r1_cube Inner index: ix_a1_cube Planning time: 0.533 ms Execution time: 241907.569 ms (7 rows) Time: 241910,440 ms (04:01,910) Any ideas will be highly appreciated. Best regards, Andrey Borodin.
create extension if not exists cube; begin transaction; SELECT setseed(.43); \timing create table dataTable as select cube(array[random(),random(),random()]) c, 1::float as x1, 1::float as x2, 1::float as x3, 1::float as x4 from generate_series(1,3e6,1) s; create index idx on dataTable using gist(c); create table regions as select cube(array[x,y,z],array[x+0.1,y+0.01,z+0.01]) c, row_number() over() as nx from (select random() x,random() y, random() z from generate_series(1,3e5,1) s) q; create index idx1 on regions using gist(c); set max_parallel_workers_per_gather = 0; explain analyze select sum(a.x1) as x1, sum(a.x2) as x2, sum(a.x3) as x3, sum(a.x4) as x4 from dataTable a join regions r on r.c @> a.c group by r.nx; explain analyze select sum(a.x1) as x1, sum(a.x2) as x2, sum(a.x3) as x3, sum(a.x4) as x4 from dataTable a join regions r on r.c && a.c group by r.nx; commit;
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers