Hello bborie (and everyone else). 

To recap. I'm trying to do some work with 4 gigantic raster tables which have 
been set up in the same way (e.g. georeferenced/tiled identically). I want to 
compare equivalent tiles in each raster with one another. Each raster has 7 
million rows now but I'll be using 28 million row versions shortly. 

I wanted to quickly and reliably match up corresponding raster tiles between 
the raster tables, but I wasn't confident that the geometry description for 
each rid key would always be *exactly* alike between different imports of 
rasters via raster2pgsql. 

bborie proposed: 

> A.rast::geometry = B.rast::geometry will check for spatially equal
> tiles. This check will be faster on out-db vs in-db as there's less data
> for postgres to load regardless of your SQL.
> 
> If you're asking for a canned SQL query...
> 
> SELECT
>       SOMEWORK
> FROM A
> JOIN B
>       ON A.rast::geometry = B.rast::geometry
> 
> -bborie

which is an elegant approach, but unfortunately does not seem practical. The 
problem is that the geometry has to be generated by the cast and it's not an 
indexed column. This makes it a very slow process to build the join. 

I tested, and regardless of the type of join I tried, the cost of the join 
grows as the square of the number of rows. 

n=100: 0.3s
n=1000: 26s
n=2000: 105s

At that rate, n=28 million is going to be a big problem. A possible workaround 
might be to precalculate and explicitly store some geometry for every raster 
tile, index it, and then do the comparison. That would be a bit slow and 
space-consuming, but is viable.

Preferably it would be nicer to join on the RID column, which is a simple int 
and is already indexed. But how can I be confident that the rid columns and 
geometries will line up between the raster rows of each table?


B0. Common sense - "they ought to", in principle; there is no reason why they 
shouldn't other than algorithm instability. 


B1. Pick out some rows and check visually in QGIS how the numbering works, and 
that the geometry appears to match.

> select a.rid, a.rast::geometry into temp.test2 from temp.a as a order by rid 
> desc limit 1000;


B2. Check that the highest rid value matches the number of rows in the table 
(e.g. 'there is no room for gaps in the rid sequences')

> select rid from temp.b order by rid desc limit 10;
> select rid from temp.b order by rid asc limit 10;
...

B3. Select out the start and the end of the raster tables and ensure that when 
joined on RID, the geometries match. 

> select a.rid as arid, b.rid as brid from temp.a join temp.b using (rid) where 
> st_equals(a.rast::geometry,b.rast::geometry) order by rid desc limit 10; 

> select a.rid as arid, b.rid as brid from temp.a join temp.b using (rid) where 
> st_equals(a.rast::geometry,b.rast::geometry) order by rid asc limit 10; 

B4. You could also sample 1000's of random rids within the tables and check 
they line up between rasters, to gain confidence that the ordering is 
maintained throughout the data.


in short - if the rasters were given in the same coordinate 
system/scale/tiling, and if the start of the raster tables line up, and the 
ends line up, if the rid ordering looks simple/predictable, and if there is no 
gap in the number sequence along the way, then it seems reasonable to trust the 
rid as a proxy for the geometry.

This seems to work well. Using a join on 'rid' across the 4 tables is extremely 
fast because it is a simple integer and it is already implicitly indexed. I was 
able to complete the entire analysis (including checking st_value() across the 
raster) in less than an hour.  The results look as expected.

I would recommend using rid in this way as a proxy for geometry to anyone else 
doing analysis between large rasters that can be naturally placed into an 
identical coordinate system, tiling, position and so on. But if you are doing 
this, remember to run tests B0-B4 to gain some confidence that your rids really 
are a safe proxy. 

select ...
  from a join b using (rid) join c using (rid) join d using (rid) ... (join x,y 
sequence columns)
  where ... (st_value stuff on pixels)

Hope someone finds this interesting.

Graeme

_______________________________________________
postgis-users mailing list
[email protected]
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users

Reply via email to