Graeme, robe2 and I were discussing this thread and we were wondering if using the ~= operator would work for your problem.
http://www.postgis.net/docs/manual-2.0/ST_Geometry_Same.html -bborie On Mon, Jun 17, 2013 at 7:22 AM, Graeme B. Bell <[email protected]> wrote: > > Hello bborie (and everyone else). > > To recap. I'm trying to do some work with 4 gigantic raster tables which have > been set up in the same way (e.g. georeferenced/tiled identically). I want to > compare equivalent tiles in each raster with one another. Each raster has 7 > million rows now but I'll be using 28 million row versions shortly. > > I wanted to quickly and reliably match up corresponding raster tiles between > the raster tables, but I wasn't confident that the geometry description for > each rid key would always be *exactly* alike between different imports of > rasters via raster2pgsql. > > bborie proposed: > >> A.rast::geometry = B.rast::geometry will check for spatially equal >> tiles. This check will be faster on out-db vs in-db as there's less data >> for postgres to load regardless of your SQL. >> >> If you're asking for a canned SQL query... >> >> SELECT >> SOMEWORK >> FROM A >> JOIN B >> ON A.rast::geometry = B.rast::geometry >> >> -bborie > > which is an elegant approach, but unfortunately does not seem practical. The > problem is that the geometry has to be generated by the cast and it's not an > indexed column. This makes it a very slow process to build the join. > > I tested, and regardless of the type of join I tried, the cost of the join > grows as the square of the number of rows. > > n=100: 0.3s > n=1000: 26s > n=2000: 105s > > At that rate, n=28 million is going to be a big problem. A possible > workaround might be to precalculate and explicitly store some geometry for > every raster tile, index it, and then do the comparison. That would be a bit > slow and space-consuming, but is viable. > > Preferably it would be nicer to join on the RID column, which is a simple int > and is already indexed. But how can I be confident that the rid columns and > geometries will line up between the raster rows of each table? > > > B0. Common sense - "they ought to", in principle; there is no reason why they > shouldn't other than algorithm instability. > > > B1. Pick out some rows and check visually in QGIS how the numbering works, > and that the geometry appears to match. > >> select a.rid, a.rast::geometry into temp.test2 from temp.a as a order by rid >> desc limit 1000; > > > B2. Check that the highest rid value matches the number of rows in the table > (e.g. 'there is no room for gaps in the rid sequences') > >> select rid from temp.b order by rid desc limit 10; >> select rid from temp.b order by rid asc limit 10; > ... > > B3. Select out the start and the end of the raster tables and ensure that > when joined on RID, the geometries match. > >> select a.rid as arid, b.rid as brid from temp.a join temp.b using (rid) >> where st_equals(a.rast::geometry,b.rast::geometry) order by rid desc limit >> 10; > >> select a.rid as arid, b.rid as brid from temp.a join temp.b using (rid) >> where st_equals(a.rast::geometry,b.rast::geometry) order by rid asc limit 10; > > B4. You could also sample 1000's of random rids within the tables and check > they line up between rasters, to gain confidence that the ordering is > maintained throughout the data. > > > in short - if the rasters were given in the same coordinate > system/scale/tiling, and if the start of the raster tables line up, and the > ends line up, if the rid ordering looks simple/predictable, and if there is > no gap in the number sequence along the way, then it seems reasonable to > trust the rid as a proxy for the geometry. > > This seems to work well. Using a join on 'rid' across the 4 tables is > extremely fast because it is a simple integer and it is already implicitly > indexed. I was able to complete the entire analysis (including checking > st_value() across the raster) in less than an hour. The results look as > expected. > > I would recommend using rid in this way as a proxy for geometry to anyone > else doing analysis between large rasters that can be naturally placed into > an identical coordinate system, tiling, position and so on. But if you are > doing this, remember to run tests B0-B4 to gain some confidence that your > rids really are a safe proxy. > > select ... > from a join b using (rid) join c using (rid) join d using (rid) ... (join > x,y sequence columns) > where ... (st_value stuff on pixels) > > Hope someone finds this interesting. > > Graeme > > _______________________________________________ > postgis-users mailing list > [email protected] > http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users _______________________________________________ postgis-users mailing list [email protected] http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
