Re: [postgis-users] Help with SQL query?

Darrel Maddy Fri, 27 Nov 2015 09:43:33 -0800

Dear Regina,

Many thanks for this suggestion.


I ran the query in this form and for one raster it takes 3372s (~55 mins).  I 
guess that is what I had anticipated from the single tile  exercise I ran with 
the alternate algorithm.

This still seems a little too long however and so I have started to explore 
ways to improve upon this by pre-processing some of the data.  The cells of 
interest represent the ‘main channels’ in a drainage network.  Although in 
future this network may change position from one output timestep to another 
(they are actually 1000l iterations of the model apart) in this particular 
variant the position of these cells is static.  With that in mind I decided to 
create a binary raster where the 1’s represent the channel cells (I had to do 
this in arcgis as QGIS does not appear to have a Con function in the raster 
calculator!)  and then exported this as a point layer. I then deleted the 
points coded zero and saved the shp file in QGIS.  I imported the ntwork shp 
file into postgis (there are about 15500 15 points) and I am now running:

CREATE TABLE mymodel.networkdep AS
SELECT filename, gid, ST_Value(rast, geom) val
FROM mymodel.deposition, mymodel.network
WHERE ST_Intersects(rast, geom)
ORDER BY gid, rid;

I will sum by raster (i.e. filename) using the new table but will settle for 
just having the relevant data for now. This took  2057s(~35 minutes!) to 
complete!

If this is the best way to do it I will explore the OGR library and try and 
hardcode the network point output directly into the model or, more 
realistically, write a short routine to extract this automatically from the 
flow accumulation output rasters without recourse to a GIS.

I am learning a lot through this exercise, so thanks once again to all of you 
who have made suggestions, they are very much appreciated.

Best wishes

Darrel








From: postgis-users [mailto:[email protected]] On Behalf Of 
Paragon Corporation
Sent: 26 November 2015 05:08
To: 'PostGIS Users Discussion' <[email protected]>
Subject: Re: [postgis-users] Help with SQL query?

That timing seems much slower than I recall.

FWIW expression based mapalgebra as I recall is slower than using the call back 
function approach.  So you could try wrapping your CASE in a call back function.

However I think something else might be going on here and postgres might be 
repeating work.  I forgot under what conditions it decides to reevaluate a 
function call, I just remember being really surprised by it.

To avoid that, you can try using a CTE, also you don't need that ST_Union call 
which for larger number of rasters is expensive, and you might even generate a 
raster that is too big to compute.

I'm also guessing your rasts are all tiled the same, so you really don't need 
ST_Intersects, just use the same box operator

So try this:

WITH  foo AS (
  SELECT ST_SummaryStats( ST_MapAlgebra(deposition.rast, concentrated.rast, 
'CASE WHEN [rast2] > 0.0 THEN [rast1] ELSE NULL END' ) ) As st
            FROM mymodel.deposition INNER JOIN mymodel.concentrated ON ( 
deposition.rast  ~=  concentrated.rast )
            WHERE deposition.rid=1

)
SELECT SUM( (st).sum )
FROM foo;


Hope that helps,
Regina
http://www.postgis.us
http://postgis.net


From: postgis-users [mailto:[email protected]] On Behalf Of 
Darrel Maddy
Sent: Wednesday, November 25, 2015 5:06 PM
To: PostGIS Users Discussion 
<[email protected]<mailto:[email protected]>>; Brent 
Wood <[email protected]<mailto:[email protected]>>
Subject: Re: [postgis-users] Help with SQL query?

Dear Brent,

I must confess that my attempts to do this are so far proving very unsuccessful

If  I run the following query:

SELECT  (ST_SummaryStats(ST_Union(rast))).sum AS sum
FROM  (SELECT ST_MapAlgebra(deposition.rast, concentrated.rast, 'CASE WHEN 
[rast2] > 0.0 THEN [rast1] ELSE NULL END' ) As rast
            FROM mymodel.deposition, mymodel.concentrated
            WHERE ST_Intersects(deposition.rast, concentrated.rast) AND 
deposition.rid=1 ) foo ;

It takes around 30 seconds to complete as I assume it is only looking at one 
tile(they are 256x256 pixels) i.e. rid 1. It is not easy to check the sum – for 
that I need one complete raster.

For the record this was marginally faster than
SELECT (ST_SummaryStats(ST_Union(rast))).sum AS sum
FROM (SELECT ST_MapAlgebra(deposition.rast, concentrated.rast, 'CASE WHEN 
[rast2] > 0.0 THEN [rast1] ELSE NULL END' ) As rast
            FROM mymodel.deposition, mymodel.concentrated
            WHERE mymodel.deposition.filename='10_depo.tif' AND 
ST_UpperleftX(mymodel.deposition.rast) = 
ST_UpperleftX(mymodel.concentrated.rast) AND
                         ST_UpperleftY(mymodel.deposition.rast) = 
ST_UpperleftY(mymodel.deposition.rast) ) foo ;
Even after I built indexes for the clauses after the WHERE.

Now there are 144 tiles in each of the rasters I want to perform this operation 
on.  Logic would therefore suggest this should take ~4500s

However when I perform the following query

SELECT  (ST_SummaryStats(ST_Union(rast))).sum AS sum
FROM  (SELECT ST_MapAlgebra(deposition.rast, concentrated.rast, 'CASE WHEN 
[rast2] > 0.0 THEN [rast1] ELSE NULL END' ) As rast
            FROM mymodel.deposition, mymodel.concentrated
            WHERE ST_Intersects(deposition.rast, concentrated.rast) AND 
deposition.filename='10_depo.tif' ) foo ;

The query is still running after 18000s!  I must therefore assume I have done 
something wrong but as you may have guessed the answer eludes me.

Any further suggestions would be welcome but I will continue to try and find a 
solution as I have 135 rasters to perform this operations on now and 
potentially many thousands more in the future.

Darrel

.

I





From: postgis-users [mailto:[email protected]] On Behalf Of 
Darrel Maddy
Sent: 24 November 2015 19:52
To: Brent Wood <[email protected]<mailto:[email protected]>>; 
[email protected]<mailto:[email protected]>
Subject: Re: [postgis-users] Help with SQL query?

Dear Brent,

Many thanks. The data are tiled (256x256) hence the large number of rows from 
the original 135 tifs. I did not build any indexes however, so I will do some 
reading and see how best to approach that (the threads you listed look useful 
so thanks for that).

I will run some additional mini queries limited to just one comparison and 
check using QGIS as you suggest – I probably should have done that first!

My workstation has 64GB Ram and I would be surprised if it was significantly 
caching to disk. I also have a hexacore intel extreme processor so I would not 
expect this to be hardware limited. I must confess I expected it to finish 
within a couple of hours.

Anyhow very many thanks. I will continue to explore and report back hopefully 
with positive news.

Darrel


From: Brent Wood [mailto:[email protected]]
Sent: 24 November, 2015 7:36 PM
To: Darrel Maddy 
<[email protected]<mailto:[email protected]>>; 
[email protected]<mailto:[email protected]>
Subject: Re: [postgis-users] Help with SQL query?

Indexing can improve performance by 100s of x, without them things can be slow. 
Also, did you tile the images when you imported them? If not, then each 
iteration is working through all the pixels in the image, rather than a small 
subset. Essentially with tiles, you have a deep (long) table rather than a wide 
one. RDBMSs work better with lots of small records than a few wide ones, 
especially when indexes are used.

This might help:
http://gis.stackexchange.com/questions/43053/how-to-speed-up-queries-for-raster-databases

and see the raster tutorial they mention for the SRTM data, as to how that is 
loaded into Postgis:
https://trac.osgeo.org/postgis/wiki/WKTRasterTutorial01

To test the logic (the syntax is correct or it wouldn't be working) you could 
add to the "where" clause an extra filter so that only a small subset of the 
entire dataset is included (like just one QGIS operation) then compare this 
with the QGIS result.

That would be much faster that testing on the entire dataset. Once you know it 
is correct for the test case(s), then you can run it on the complete set.

Note that some queries can build up large in-memory objects, so make sure your 
system is not swapping to disk, as that will also slow things down (hugely).

Cheers

Brent

_______________________________________________
postgis-users mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/postgis-users

Re: [postgis-users] Help with SQL query?

Reply via email to