Graeme,

robe2 and I were discussing this thread and we were wondering if using
the ~= operator would work for your problem.

http://www.postgis.net/docs/manual-2.0/ST_Geometry_Same.html

-bborie

On Mon, Jun 17, 2013 at 7:22 AM, Graeme B. Bell <[email protected]> wrote:
>
> Hello bborie (and everyone else).
>
> To recap. I'm trying to do some work with 4 gigantic raster tables which have 
> been set up in the same way (e.g. georeferenced/tiled identically). I want to 
> compare equivalent tiles in each raster with one another. Each raster has 7 
> million rows now but I'll be using 28 million row versions shortly.
>
> I wanted to quickly and reliably match up corresponding raster tiles between 
> the raster tables, but I wasn't confident that the geometry description for 
> each rid key would always be *exactly* alike between different imports of 
> rasters via raster2pgsql.
>
> bborie proposed:
>
>> A.rast::geometry = B.rast::geometry will check for spatially equal
>> tiles. This check will be faster on out-db vs in-db as there's less data
>> for postgres to load regardless of your SQL.
>>
>> If you're asking for a canned SQL query...
>>
>> SELECT
>>       SOMEWORK
>> FROM A
>> JOIN B
>>       ON A.rast::geometry = B.rast::geometry
>>
>> -bborie
>
> which is an elegant approach, but unfortunately does not seem practical. The 
> problem is that the geometry has to be generated by the cast and it's not an 
> indexed column. This makes it a very slow process to build the join.
>
> I tested, and regardless of the type of join I tried, the cost of the join 
> grows as the square of the number of rows.
>
> n=100: 0.3s
> n=1000: 26s
> n=2000: 105s
>
> At that rate, n=28 million is going to be a big problem. A possible 
> workaround might be to precalculate and explicitly store some geometry for 
> every raster tile, index it, and then do the comparison. That would be a bit 
> slow and space-consuming, but is viable.
>
> Preferably it would be nicer to join on the RID column, which is a simple int 
> and is already indexed. But how can I be confident that the rid columns and 
> geometries will line up between the raster rows of each table?
>
>
> B0. Common sense - "they ought to", in principle; there is no reason why they 
> shouldn't other than algorithm instability.
>
>
> B1. Pick out some rows and check visually in QGIS how the numbering works, 
> and that the geometry appears to match.
>
>> select a.rid, a.rast::geometry into temp.test2 from temp.a as a order by rid 
>> desc limit 1000;
>
>
> B2. Check that the highest rid value matches the number of rows in the table 
> (e.g. 'there is no room for gaps in the rid sequences')
>
>> select rid from temp.b order by rid desc limit 10;
>> select rid from temp.b order by rid asc limit 10;
> ...
>
> B3. Select out the start and the end of the raster tables and ensure that 
> when joined on RID, the geometries match.
>
>> select a.rid as arid, b.rid as brid from temp.a join temp.b using (rid) 
>> where st_equals(a.rast::geometry,b.rast::geometry) order by rid desc limit 
>> 10;
>
>> select a.rid as arid, b.rid as brid from temp.a join temp.b using (rid) 
>> where st_equals(a.rast::geometry,b.rast::geometry) order by rid asc limit 10;
>
> B4. You could also sample 1000's of random rids within the tables and check 
> they line up between rasters, to gain confidence that the ordering is 
> maintained throughout the data.
>
>
> in short - if the rasters were given in the same coordinate 
> system/scale/tiling, and if the start of the raster tables line up, and the 
> ends line up, if the rid ordering looks simple/predictable, and if there is 
> no gap in the number sequence along the way, then it seems reasonable to 
> trust the rid as a proxy for the geometry.
>
> This seems to work well. Using a join on 'rid' across the 4 tables is 
> extremely fast because it is a simple integer and it is already implicitly 
> indexed. I was able to complete the entire analysis (including checking 
> st_value() across the raster) in less than an hour.  The results look as 
> expected.
>
> I would recommend using rid in this way as a proxy for geometry to anyone 
> else doing analysis between large rasters that can be naturally placed into 
> an identical coordinate system, tiling, position and so on. But if you are 
> doing this, remember to run tests B0-B4 to gain some confidence that your 
> rids really are a safe proxy.
>
> select ...
>   from a join b using (rid) join c using (rid) join d using (rid) ... (join 
> x,y sequence columns)
>   where ... (st_value stuff on pixels)
>
> Hope someone finds this interesting.
>
> Graeme
>
> _______________________________________________
> postgis-users mailing list
> [email protected]
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
[email protected]
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users

Reply via email to