Hello everyone,

Short version:

I have some identically specified/tiled rasters that differ only in terms of 
the data values in the pixels and table name*. 

I want to find a quick, easy and 100% trustworthy test of tile position 
equivalence between the tiles of several raster tables, so that I can match up 
the pixels in corresponding tiles for mathematical operations. 

Is there a quick and easy way to do this?

* in theory. 


Painfully long version: 

I have some big rasters, around 18 to 72 giga-pixels each. The rasters have a 
single band containing 8-bit values. They are specified identically in terms of 
tiling, georeferencing, skew, scale, pixel size etc. The only difference is in 
the contents of the pixels. 

I've been processing these with GDAL and Python I'm getting very nice speeds, 
e.g.  <60 minutes to perform calculations using several rasters together.

I've imported my rasters (A,B,C,D) into PG raster using 100x100 tiles, and I'm 
trying some analysis in postgis. This is deliberate; I could  do the analysis 
in gdalcalc very easily, but I want to make it possible for database users to 
do this kind of work. 

I hit a stumbling block when it comes to carrying out aligned calculations 
between pixels. I'm basically carrying out an operation over every pixel in D 
and comparing it with the corresponding pixels in A, B, and C. 

Logically, what I want to say is:

- for each 'rast' tile in raster table D
  - for each pixel of that tile (x=1..100, y=1..100)
    - get the corresponding pixel value from the corresponding tile in rasters 
A,B and C and do some work

The problem arises around the word 'corresponding' and the tradeoff between 
accuracy, speed and query elegance/debuggability. 

To my mind, if the rasters are specified identically and the import routines 
are stable (e.g. always producing the same metadata for the same raster tiles), 
then it should be possible to pick out the identical tiles between rasters.  


Possible approaches: 

1. Bounding box intersection.  If you try bounding box intersection to identify 
corresponding tiles, it's suitably fast:

>    where D.rast && A.rast and D.rast && B.rast and D.rast && C.rast

Unfortunately, you find that the bounding boxes of the neighbouring tiles in 
all directions (8-neighbours) are also matched for && intersection.

The consquence is that you get 256 rows where you want to have 1. I've checked 
this is what is happening by looking at the returned rids. I expect this is 
happening because of the BBox of a tile is slightly larger than the tile, by 
definition. 


2. Make a multi-band raster.  One solution would be to merge the rasters 
outside postgis into a multi-band aligned raster and import that (or do the 
same inside postgis raster using e.g. addband). 

I don't want to do that because a) the data is big, copies are bad  and  b) the 
data is logically distinct, so I want to store and address it using logically 
distinct raster tables. 


3.  Address raster pixels indirectly using geography.  I could iterate across 
the rasters using geographical points on the landscape that I've calculated to 
correspond to the center of pixels on the rasters. However, for this problem, 
please assume that I want to index the polygon tiles/pixels *directly* between 
corresponding tiles, rather than *indirectly*. I suspect it would be rather 
slow to do things this way, too. 


4. Identify exactly identical bounding boxes. Checking for equality between 
st_envelopes of tiles works: 

>    where st_envelope(D.rast) = st_envelope(C.rast) and st_envelope(D.rast) = 
> st_envelope(B.rast) and st_envelope(D.rast) = st_envelope(A.rast) 

but it is very slow - many many times slower than the && method. Also, I expect 
the st_envelope generation algorithm is stable and will always make exactly the 
same output for a given tile across each raster, but I can't be 100% certain 
without taking a close look at the source code of postgis. 


5. I can use matching rids (since the raster tiles get numbered identically in 
practice from identically specified source tiles), 

> where D.rid=A.rid and D.rid=B.rid and D.rid=C.rid

This is ideal - super fast, quick to write - but it seems there might be a 
theoretical risk that the rids might not be numbered in the order I expect when 
imported through raster2pgsql. How real is that risk?


6. I could compare ST_metadata equality between tiles. This also has the 
advantage that if the tiles aren't set up identically, it will fail horribly. 
It seems there is a lot of data being compared (skew etc) which I know for 
certain matches between the tiles. 

> where st_metadata(D.rast) = st_metadata(C.rast) and st_metadata(D.rast) = 
> st_metadata(B.rast) and st_metadata(D.rast) = st_metadata(A.rast)

This is very slow. 


7. Knowing the tiles are aligned and equivalent, I could compare only the 
distinct components of the tile metadata, e.g. the topleft pixels 
(ST_Raster2WorldCoordX, ST_Raster2WorldCoordY).

> st_raster2worldcoordx(D.rast,1) = st_raster2worldcoordx(C.rast,1) and 
> st_raster2worldcoordx(D.rast,1) = st_raster2worldcoordx(B.rast,1) and 
> st_raster2worldcoordx(D.rast,1) = st_raster2worldcoordx(A.rast,1) and
st_raster2worldcoordy(D.rast,1) = st_raster2worldcoordy(C.rast,1) and 
st_raster2worldcoordy(D.rast,1) = st_raster2worldcoordy(B.rast,1) and 
st_raster2worldcoordy(D.rast,1) = st_raster2worldcoordy(A.rast,1)

This is fast and close to ideal, but it involves quite a bit of text compared 
to the cute && syntax. I don't mind typing, but it's harder to spot errors.


8. I could precalculate a key column over my raster tiles that allows me 
identify identically positioned tiles quickly - e.g.  the same purpose as 
'rid', but without the worry that the sequence could be out of order somewhere 
because of unknown gremlins in the serial generation process of 
postgres/raster2pgsql. 


9. So.. is there another way of detecting tile equivalence that is:

- 'short' e.g. visibly concise, like   A.rast && D.rast
- computationally fast
- robust, i.e. not likely to generate 1 missing tile among millions.


I suppose what I am getting towards is:

>  where D.rast ~~ A.rast and D.rast ~~ B.rast and D.rast ~~ C.rast

where ~~ is some computationally fast and trustworthy operator to pick out 
tiles in matching positions in equivalent rasters.



Graeme. 

_______________________________________________
postgis-users mailing list
[email protected]
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users

Reply via email to