Have you considered storing your rasters outside the database with
raster2pgsql's -R flag? I usually recommend this if the rasters are
readonly. This also minimizes the amount of information stored within
the database to just the metadata of the out-db rasters.

-bborie

On Wed, Jun 12, 2013 at 5:23 AM, Graeme B. Bell <[email protected]> wrote:
> Hello everyone,
>
>
> Short version:
>
> I have some identically specified/tiled rasters that differ only in terms of 
> the data values in the pixels and table name*.
>
> I want to find a quick, easy and 100% trustworthy test of tile position 
> equivalence between the tiles of several raster tables, so that I can match 
> up the pixels in corresponding tiles for mathematical operations.
>
> Is there a quick and easy way to do this?
>
> * in theory.
>
>
> Painfully long version:
>
> I have some big rasters, around 18 to 72 giga-pixels each. The rasters have a 
> single band containing 8-bit values. They are specified identically in terms 
> of tiling, georeferencing, skew, scale, pixel size etc. The only difference 
> is in the contents of the pixels.
>
> I've been processing these with GDAL and Python I'm getting very nice speeds, 
> e.g.  <60 minutes to perform calculations using several rasters together.
>
> I've imported my rasters (A,B,C,D) into PG raster using 100x100 tiles, and 
> I'm trying some analysis in postgis. This is deliberate; I could  do the 
> analysis in gdalcalc very easily, but I want to make it possible for database 
> users to do this kind of work.
>
> I hit a stumbling block when it comes to carrying out aligned calculations 
> between pixels. I'm basically carrying out an operation over every pixel in D 
> and comparing it with the corresponding pixels in A, B, and C.
>
> Logically, what I want to say is:
>
> - for each 'rast' tile in raster table D
>   - for each pixel of that tile (x=1..100, y=1..100)
>     - get the corresponding pixel value from the corresponding tile in 
> rasters A,B and C and do some work
>
> The problem arises around the word 'corresponding' and the tradeoff between 
> accuracy, speed and query elegance/debuggability.
>
> To my mind, if the rasters are specified identically and the import routines 
> are stable (e.g. always producing the same metadata for the same raster 
> tiles), then it should be possible to pick out the identical tiles between 
> rasters.
>
>
> Possible approaches:
>
> 1. Bounding box intersection.  If you try bounding box intersection to 
> identify corresponding tiles, it's suitably fast:
>
>>    where D.rast && A.rast and D.rast && B.rast and D.rast && C.rast
>
> Unfortunately, you find that the bounding boxes of the neighbouring tiles in 
> all directions (8-neighbours) are also matched for && intersection.
>
> The consquence is that you get 256 rows where you want to have 1. I've 
> checked this is what is happening by looking at the returned rids. I expect 
> this is happening because of the BBox of a tile is slightly larger than the 
> tile, by definition.
>
>
> 2. Make a multi-band raster.  One solution would be to merge the rasters 
> outside postgis into a multi-band aligned raster and import that (or do the 
> same inside postgis raster using e.g. addband).
>
> I don't want to do that because a) the data is big, copies are bad  and  b) 
> the data is logically distinct, so I want to store and address it using 
> logically distinct raster tables.
>
>
> 3.  Address raster pixels indirectly using geography.  I could iterate across 
> the rasters using geographical points on the landscape that I've calculated 
> to correspond to the center of pixels on the rasters. However, for this 
> problem, please assume that I want to index the polygon tiles/pixels 
> *directly* between corresponding tiles, rather than *indirectly*. I suspect 
> it would be rather slow to do things this way, too.
>
>
> 4. Identify exactly identical bounding boxes. Checking for equality between 
> st_envelopes of tiles works:
>
>>    where st_envelope(D.rast) = st_envelope(C.rast) and st_envelope(D.rast) = 
>> st_envelope(B.rast) and st_envelope(D.rast) = st_envelope(A.rast)
>
> but it is very slow - many many times slower than the && method. Also, I 
> expect the st_envelope generation algorithm is stable and will always make 
> exactly the same output for a given tile across each raster, but I can't be 
> 100% certain without taking a close look at the source code of postgis.
>
>
> 5. I can use matching rids (since the raster tiles get numbered identically 
> in practice from identically specified source tiles),
>
>> where D.rid=A.rid and D.rid=B.rid and D.rid=C.rid
>
> This is ideal - super fast, quick to write - but it seems there might be a 
> theoretical risk that the rids might not be numbered in the order I expect 
> when imported through raster2pgsql. How real is that risk?
>
>
> 6. I could compare ST_metadata equality between tiles. This also has the 
> advantage that if the tiles aren't set up identically, it will fail horribly. 
> It seems there is a lot of data being compared (skew etc) which I know for 
> certain matches between the tiles.
>
>> where st_metadata(D.rast) = st_metadata(C.rast) and st_metadata(D.rast) = 
>> st_metadata(B.rast) and st_metadata(D.rast) = st_metadata(A.rast)
>
> This is very slow.
>
>
> 7. Knowing the tiles are aligned and equivalent, I could compare only the 
> distinct components of the tile metadata, e.g. the topleft pixels 
> (ST_Raster2WorldCoordX, ST_Raster2WorldCoordY).
>
>> st_raster2worldcoordx(D.rast,1) = st_raster2worldcoordx(C.rast,1) and 
>> st_raster2worldcoordx(D.rast,1) = st_raster2worldcoordx(B.rast,1) and 
>> st_raster2worldcoordx(D.rast,1) = st_raster2worldcoordx(A.rast,1) and
> st_raster2worldcoordy(D.rast,1) = st_raster2worldcoordy(C.rast,1) and 
> st_raster2worldcoordy(D.rast,1) = st_raster2worldcoordy(B.rast,1) and 
> st_raster2worldcoordy(D.rast,1) = st_raster2worldcoordy(A.rast,1)
>
> This is fast and close to ideal, but it involves quite a bit of text compared 
> to the cute && syntax. I don't mind typing, but it's harder to spot errors.
>
>
> 8. I could precalculate a key column over my raster tiles that allows me 
> identify identically positioned tiles quickly - e.g.  the same purpose as 
> 'rid', but without the worry that the sequence could be out of order 
> somewhere because of unknown gremlins in the serial generation process of 
> postgres/raster2pgsql.
>
>
> 9. So.. is there another way of detecting tile equivalence that is:
>
> - 'short' e.g. visibly concise, like   A.rast && D.rast
> - computationally fast
> - robust, i.e. not likely to generate 1 missing tile among millions.
>
>
> I suppose what I am getting towards is:
>
>>  where D.rast ~~ A.rast and D.rast ~~ B.rast and D.rast ~~ C.rast
>
> where ~~ is some computationally fast and trustworthy operator to pick out 
> tiles in matching positions in equivalent rasters.
>
>
>
> Graeme.
>
> _______________________________________________
> postgis-users mailing list
> [email protected]
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
[email protected]
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users

Reply via email to