Hello everyone,
Short version:
I have some identically specified/tiled rasters that differ only in terms of
the data values in the pixels and table name*.
I want to find a quick, easy and 100% trustworthy test of tile position
equivalence between the tiles of several raster tables, so that I can match up
the pixels in corresponding tiles for mathematical operations.
Is there a quick and easy way to do this?
* in theory.
Painfully long version:
I have some big rasters, around 18 to 72 giga-pixels each. The rasters have a
single band containing 8-bit values. They are specified identically in terms of
tiling, georeferencing, skew, scale, pixel size etc. The only difference is in
the contents of the pixels.
I've been processing these with GDAL and Python I'm getting very nice speeds,
e.g. <60 minutes to perform calculations using several rasters together.
I've imported my rasters (A,B,C,D) into PG raster using 100x100 tiles, and I'm
trying some analysis in postgis. This is deliberate; I could do the analysis
in gdalcalc very easily, but I want to make it possible for database users to
do this kind of work.
I hit a stumbling block when it comes to carrying out aligned calculations
between pixels. I'm basically carrying out an operation over every pixel in D
and comparing it with the corresponding pixels in A, B, and C.
Logically, what I want to say is:
- for each 'rast' tile in raster table D
- for each pixel of that tile (x=1..100, y=1..100)
- get the corresponding pixel value from the corresponding tile in rasters
A,B and C and do some work
The problem arises around the word 'corresponding' and the tradeoff between
accuracy, speed and query elegance/debuggability.
To my mind, if the rasters are specified identically and the import routines
are stable (e.g. always producing the same metadata for the same raster tiles),
then it should be possible to pick out the identical tiles between rasters.
Possible approaches:
1. Bounding box intersection. If you try bounding box intersection to identify
corresponding tiles, it's suitably fast:
> where D.rast && A.rast and D.rast && B.rast and D.rast && C.rast
Unfortunately, you find that the bounding boxes of the neighbouring tiles in
all directions (8-neighbours) are also matched for && intersection.
The consquence is that you get 256 rows where you want to have 1. I've checked
this is what is happening by looking at the returned rids. I expect this is
happening because of the BBox of a tile is slightly larger than the tile, by
definition.
2. Make a multi-band raster. One solution would be to merge the rasters
outside postgis into a multi-band aligned raster and import that (or do the
same inside postgis raster using e.g. addband).
I don't want to do that because a) the data is big, copies are bad and b) the
data is logically distinct, so I want to store and address it using logically
distinct raster tables.
3. Address raster pixels indirectly using geography. I could iterate across
the rasters using geographical points on the landscape that I've calculated to
correspond to the center of pixels on the rasters. However, for this problem,
please assume that I want to index the polygon tiles/pixels *directly* between
corresponding tiles, rather than *indirectly*. I suspect it would be rather
slow to do things this way, too.
4. Identify exactly identical bounding boxes. Checking for equality between
st_envelopes of tiles works:
> where st_envelope(D.rast) = st_envelope(C.rast) and st_envelope(D.rast) =
> st_envelope(B.rast) and st_envelope(D.rast) = st_envelope(A.rast)
but it is very slow - many many times slower than the && method. Also, I expect
the st_envelope generation algorithm is stable and will always make exactly the
same output for a given tile across each raster, but I can't be 100% certain
without taking a close look at the source code of postgis.
5. I can use matching rids (since the raster tiles get numbered identically in
practice from identically specified source tiles),
> where D.rid=A.rid and D.rid=B.rid and D.rid=C.rid
This is ideal - super fast, quick to write - but it seems there might be a
theoretical risk that the rids might not be numbered in the order I expect when
imported through raster2pgsql. How real is that risk?
6. I could compare ST_metadata equality between tiles. This also has the
advantage that if the tiles aren't set up identically, it will fail horribly.
It seems there is a lot of data being compared (skew etc) which I know for
certain matches between the tiles.
> where st_metadata(D.rast) = st_metadata(C.rast) and st_metadata(D.rast) =
> st_metadata(B.rast) and st_metadata(D.rast) = st_metadata(A.rast)
This is very slow.
7. Knowing the tiles are aligned and equivalent, I could compare only the
distinct components of the tile metadata, e.g. the topleft pixels
(ST_Raster2WorldCoordX, ST_Raster2WorldCoordY).
> st_raster2worldcoordx(D.rast,1) = st_raster2worldcoordx(C.rast,1) and
> st_raster2worldcoordx(D.rast,1) = st_raster2worldcoordx(B.rast,1) and
> st_raster2worldcoordx(D.rast,1) = st_raster2worldcoordx(A.rast,1) and
st_raster2worldcoordy(D.rast,1) = st_raster2worldcoordy(C.rast,1) and
st_raster2worldcoordy(D.rast,1) = st_raster2worldcoordy(B.rast,1) and
st_raster2worldcoordy(D.rast,1) = st_raster2worldcoordy(A.rast,1)
This is fast and close to ideal, but it involves quite a bit of text compared
to the cute && syntax. I don't mind typing, but it's harder to spot errors.
8. I could precalculate a key column over my raster tiles that allows me
identify identically positioned tiles quickly - e.g. the same purpose as
'rid', but without the worry that the sequence could be out of order somewhere
because of unknown gremlins in the serial generation process of
postgres/raster2pgsql.
9. So.. is there another way of detecting tile equivalence that is:
- 'short' e.g. visibly concise, like A.rast && D.rast
- computationally fast
- robust, i.e. not likely to generate 1 missing tile among millions.
I suppose what I am getting towards is:
> where D.rast ~~ A.rast and D.rast ~~ B.rast and D.rast ~~ C.rast
where ~~ is some computationally fast and trustworthy operator to pick out
tiles in matching positions in equivalent rasters.
Graeme.
_______________________________________________
postgis-users mailing list
[email protected]
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users