Hi Regina,

I tried dropping the geometry cast, it makes no difference. Results below. But 
I found another solution which is rather nice... 


===== 


I thought I should redesign this test so that you can check it yourself rather 
than take my word for it. Unfortunately I can't share my dataset so I'm using 
an OSGeo dataset below. 


Results found:  

- RAST vs. RAST::GEOMETRY join performance was pretty much identical.  

- There is a pretty huge difference (here, 100000x worse performance) between 
joining on RID and joining on either RAST or RAST::GEOMETRY, if you're working 
between different raster data in the same tiling/georeferencing system. 

- You can use a b-tree on your rasters instead of GiST, if you don't mind a bit 
of a workaround. This lets you make comparisons/joins on the rast column 
metadata super-fast. Not sure how well it will scale on gigabyte datasets yet, 
but I'm hopeful. The only downside is that your rids will get returned out of 
order and you'll need to 'order by' manually.



Detail.

0a. Test machine - decently fast: 

Server is Xeon E3-1270v2, running single-threaded with no other load at 3.9Ghz 
turbo. Samsung 840 pro SSD 512GB as the postgres main disk, and 32GB of 1600Mhz 
ECC RAM.

0b. Postgres was configured for good performance using a couple of online 
guides and pgbench.

0c. PostGIS 2.0.3, postgres 9.2.4. 



1. ilatlon.tif is a georeferenced tif available at 
http://download.osgeo.org/geotiff/samples/usgs/ilatlon.tif, size 13MB. Since 
the size is only 13MB, data is going to be cached in memory I expect. 

2. Insert to create '4 different rasters in the same georeference/tiling 
system'.  
raster2pgsql -I -C -M -r -t 50x50 ilatlon_float.tif temp.l1 | psql sl
raster2pgsql -I -C -M -r -t 50x50 ilatlon_float.tif temp.l2 | psql sl
raster2pgsql -I -C -M -r -t 50x50 ilatlon_float.tif temp.l3 | psql sl
raster2pgsql -I -C -M -r -t 50x50 ilatlon_float.tif temp.l4 | psql sl

Enable timing. 

sl=# \timing

The set of tests was run twice and two timing results were recorded.

sl=# select l1.rid from temp.l1 left join temp.l2 on l1.rid=l2.rid left join 
temp.l3 on l1.rid=l3.rid left join temp.l4 on l1.rid=l4.rid;
Time: 1.366 ms
Time: 1.185 ms


sl=# select l1.rid from temp.l1 inner join temp.l2 on 
l1.rast::geometry~=l2.rast::geometry inner join temp.l3 on 
l1.rast::geometry~=l3.rast::geometry inner join temp.l4 on 
l1.rast::geometry~=l4.rast::geometry;
Time: 99406.689 ms
Time: 99298.702 ms


sl=# select l1.rid from temp.l1 inner join temp.l2 on l1.rast~=l2.rast inner 
join temp.l3 on l1.rast~=l3.rast inner join temp.l4 on l1.rast~=l4.rast;
Time: 99295.640 ms
Time: 99332.718 ms

Is it the join type? No. 

sl=# select l1.rid from temp.l1 left join temp.l2 on l1.rast~=l2.rast left join 
temp.l3 on l1.rast~=l3.rast left join temp.l4 on l1.rast~=l4.rast;
Time: 149493.271 ms
Time: 149601.305 ms

sl=# select l1.rid from temp.l1 right join temp.l2 on l1.rast~=l2.rast right 
join temp.l3 on l1.rast~=l3.rast right join temp.l4 on l1.rast~=l4.rast;
Time: 99271.954 ms
Time: 100857.511 ms


My thoughts:

a. 1 ms vs 100000ms on a 13MB TIFF is quite a big difference. It will be 
painful with multi-GB TIFFs with tens of millions of records.

b. GIST indices are not good for equality. If there was a b-tree method for 
rast, perhaps that would be better? 

unfortunately, you can't simply "create index X on rast", b-trees don't work 
with them; casts don't seem to work either. 

BUT... we could use a b-tree on a function that produces a string describing 
the raster tile metadata:

sl=# CREATE INDEX temp1_btree ON temp.l1 (ST_GeoReference(rast));
CREATE INDEX
sl=# CREATE INDEX temp2_btree ON temp.l2 (ST_GeoReference(rast));
CREATE INDEX
sl=# CREATE INDEX temp3_btree ON temp.l3 (ST_GeoReference(rast));
CREATE INDEX
sl=# CREATE INDEX temp4_btree ON temp.l4 (ST_GeoReference(rast));
CREATE INDEX

sl=# select l1.rid from temp.l1 inner join temp.l2 on 
st_georeference(l1.rast)=st_georeference(l2.rast) inner join temp.l3 on 
st_georeference(l1.rast)=st_georeference(l3.rast) inner join temp.l4 on 
st_georeference(l1.rast)=st_georeference(l4.rast);

Time: 647.384 ms

Still 500x slower, but a quite reasonable improvement. 

Graeme





_______________________________________________
postgis-users mailing list
[email protected]
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users

Reply via email to