Have you considered storing your rasters outside the database with raster2pgsql's -R flag? I usually recommend this if the rasters are readonly. This also minimizes the amount of information stored within the database to just the metadata of the out-db rasters.
-bborie On Wed, Jun 12, 2013 at 5:23 AM, Graeme B. Bell <[email protected]> wrote: > Hello everyone, > > > Short version: > > I have some identically specified/tiled rasters that differ only in terms of > the data values in the pixels and table name*. > > I want to find a quick, easy and 100% trustworthy test of tile position > equivalence between the tiles of several raster tables, so that I can match > up the pixels in corresponding tiles for mathematical operations. > > Is there a quick and easy way to do this? > > * in theory. > > > Painfully long version: > > I have some big rasters, around 18 to 72 giga-pixels each. The rasters have a > single band containing 8-bit values. They are specified identically in terms > of tiling, georeferencing, skew, scale, pixel size etc. The only difference > is in the contents of the pixels. > > I've been processing these with GDAL and Python I'm getting very nice speeds, > e.g. <60 minutes to perform calculations using several rasters together. > > I've imported my rasters (A,B,C,D) into PG raster using 100x100 tiles, and > I'm trying some analysis in postgis. This is deliberate; I could do the > analysis in gdalcalc very easily, but I want to make it possible for database > users to do this kind of work. > > I hit a stumbling block when it comes to carrying out aligned calculations > between pixels. I'm basically carrying out an operation over every pixel in D > and comparing it with the corresponding pixels in A, B, and C. > > Logically, what I want to say is: > > - for each 'rast' tile in raster table D > - for each pixel of that tile (x=1..100, y=1..100) > - get the corresponding pixel value from the corresponding tile in > rasters A,B and C and do some work > > The problem arises around the word 'corresponding' and the tradeoff between > accuracy, speed and query elegance/debuggability. > > To my mind, if the rasters are specified identically and the import routines > are stable (e.g. always producing the same metadata for the same raster > tiles), then it should be possible to pick out the identical tiles between > rasters. > > > Possible approaches: > > 1. Bounding box intersection. If you try bounding box intersection to > identify corresponding tiles, it's suitably fast: > >> where D.rast && A.rast and D.rast && B.rast and D.rast && C.rast > > Unfortunately, you find that the bounding boxes of the neighbouring tiles in > all directions (8-neighbours) are also matched for && intersection. > > The consquence is that you get 256 rows where you want to have 1. I've > checked this is what is happening by looking at the returned rids. I expect > this is happening because of the BBox of a tile is slightly larger than the > tile, by definition. > > > 2. Make a multi-band raster. One solution would be to merge the rasters > outside postgis into a multi-band aligned raster and import that (or do the > same inside postgis raster using e.g. addband). > > I don't want to do that because a) the data is big, copies are bad and b) > the data is logically distinct, so I want to store and address it using > logically distinct raster tables. > > > 3. Address raster pixels indirectly using geography. I could iterate across > the rasters using geographical points on the landscape that I've calculated > to correspond to the center of pixels on the rasters. However, for this > problem, please assume that I want to index the polygon tiles/pixels > *directly* between corresponding tiles, rather than *indirectly*. I suspect > it would be rather slow to do things this way, too. > > > 4. Identify exactly identical bounding boxes. Checking for equality between > st_envelopes of tiles works: > >> where st_envelope(D.rast) = st_envelope(C.rast) and st_envelope(D.rast) = >> st_envelope(B.rast) and st_envelope(D.rast) = st_envelope(A.rast) > > but it is very slow - many many times slower than the && method. Also, I > expect the st_envelope generation algorithm is stable and will always make > exactly the same output for a given tile across each raster, but I can't be > 100% certain without taking a close look at the source code of postgis. > > > 5. I can use matching rids (since the raster tiles get numbered identically > in practice from identically specified source tiles), > >> where D.rid=A.rid and D.rid=B.rid and D.rid=C.rid > > This is ideal - super fast, quick to write - but it seems there might be a > theoretical risk that the rids might not be numbered in the order I expect > when imported through raster2pgsql. How real is that risk? > > > 6. I could compare ST_metadata equality between tiles. This also has the > advantage that if the tiles aren't set up identically, it will fail horribly. > It seems there is a lot of data being compared (skew etc) which I know for > certain matches between the tiles. > >> where st_metadata(D.rast) = st_metadata(C.rast) and st_metadata(D.rast) = >> st_metadata(B.rast) and st_metadata(D.rast) = st_metadata(A.rast) > > This is very slow. > > > 7. Knowing the tiles are aligned and equivalent, I could compare only the > distinct components of the tile metadata, e.g. the topleft pixels > (ST_Raster2WorldCoordX, ST_Raster2WorldCoordY). > >> st_raster2worldcoordx(D.rast,1) = st_raster2worldcoordx(C.rast,1) and >> st_raster2worldcoordx(D.rast,1) = st_raster2worldcoordx(B.rast,1) and >> st_raster2worldcoordx(D.rast,1) = st_raster2worldcoordx(A.rast,1) and > st_raster2worldcoordy(D.rast,1) = st_raster2worldcoordy(C.rast,1) and > st_raster2worldcoordy(D.rast,1) = st_raster2worldcoordy(B.rast,1) and > st_raster2worldcoordy(D.rast,1) = st_raster2worldcoordy(A.rast,1) > > This is fast and close to ideal, but it involves quite a bit of text compared > to the cute && syntax. I don't mind typing, but it's harder to spot errors. > > > 8. I could precalculate a key column over my raster tiles that allows me > identify identically positioned tiles quickly - e.g. the same purpose as > 'rid', but without the worry that the sequence could be out of order > somewhere because of unknown gremlins in the serial generation process of > postgres/raster2pgsql. > > > 9. So.. is there another way of detecting tile equivalence that is: > > - 'short' e.g. visibly concise, like A.rast && D.rast > - computationally fast > - robust, i.e. not likely to generate 1 missing tile among millions. > > > I suppose what I am getting towards is: > >> where D.rast ~~ A.rast and D.rast ~~ B.rast and D.rast ~~ C.rast > > where ~~ is some computationally fast and trustworthy operator to pick out > tiles in matching positions in equivalent rasters. > > > > Graeme. > > _______________________________________________ > postgis-users mailing list > [email protected] > http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users _______________________________________________ postgis-users mailing list [email protected] http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
