Re: [GENERAL] Fwd: Raster performance

2015-05-29 Thread David Haynes II
:

 On Thu, 28 May 2015 10:06:24 -0500
 David Haynes II dahay...@umn.edu wrote:
 
  The query run times are significantly slower on outdb as that using indb
  here are the run times on 2 queries.
 
  ST_Count/ST_Clip(select single band here)/Inner Join/ST_Transform (US
  Counties)
  OutDB: 873.564s (14 minutes 33s) InDB:  127.36s (2 minutes 7s)
 
  ST_Count(select single band here)/ST_Clip(on all bands)/Inner
  Join/ST_Transform (US Counties)
  OutDB: 9537.371s (2 hours 38minutes)   InDB:  310s (5 minutes 10 seconds)
 
  In the query planner it shows a large change in the number of columns
  (width) that are picked up in the CTE_rast_select.
  These extra columns slow down the ability to process the data.

 As I mentioned before, that's not how that works. Those are estimates. If
 those rows are actually different between the two tables, then your data
 is different between the two databases and you can't expect the performance
 to be the same.

 Additionally, the part you snip out below isn't the part that's different.
 In particular, the difference is coming from the fact that one of the
 plans uses modis and the other uses modis_noout.

 Does modis exist in indb? Does modis_noout exist on outdb? What is the
 difference between these two tables? Because _that_ is where the time
 difference is most likely happening (based on the explain output).

 Additionally, run EXPLAIN ANALYZE on these queries to get the actual
 times in addition to the estimates. Furthermore, given that a lot of
 the confusion in this question is due to a lack of iformation, it would
 be a good idea to include the table definitions.

  OUT DB
  CTE rast_select
   -  Nested Loop  (cost=0.28..76131.41 rows=62033 *width=1086)*
 -  CTE Scan on poly s  (cost=0.00..62.18 rows=3109
 width=250)
 
 
  In DB
  Nested Loop  (cost=0.28..51767.41 rows=62033 *width=272*)
 -  CTE Scan on poly s  (cost=0.00..62.18 rows=3109
 width=250)
 
  On Wed, May 27, 2015 at 4:31 PM, PT wmo...@potentialtech.com wrote:
 
   On Tue, 26 May 2015 12:52:24 -0500
   David Haynes II dahay...@umn.edu wrote:
  
Hello,
   
I have a question about the query optimizer and its performance on
   spatial
datasets, specifically rasters. My use case is rather unique, the
application that I am developing allows users to request
 summarizations
   of
various geographic boundaries around the world. Therefore our raster
datasets are global. We are in the process of conducting some
 benchmarks
for our system and we noticed something unexpected.
   
The query is the same except the first is run on a raster (46gigs)
 in out
of database (outdb) and the second is the same raster (46gigs)
 stored in
database (indb). The raster is multibanded (13), with each band
representing one entire MODIS global scene. A single year of MODIS is
approximately 3.6 gigs.
   
The outdb is being out performed by indb, because the query optimizer
   gets
smarter. But what is also interesting is all the extra pieces that
 are
brought in with outdb.
   
with poly as
( SELECT gid, label as name, ST_Transform(geom, 6842) as geom  FROM
us_counties )
, rast_select as
( SELECT r.rid as id, s.name , ST_CLIP(r.rast, s.geom) as rast from
rasters.glc2000 as r inner join poly as s on ST_Intersects(r.rast,
   s.geom) )
select r.id, r.name, ST_Count(r.rast, 1, True)
   
   
   QUERY PLAN With Outdb
   
  
 --
Sort   (cost=93911.29..93926.80 rows=6204 width=254)
  Sort Key: r.id, r.name
  CTE poly
-  Seq Scan on us_counties  (cost=0.00..112.86 rows=3109
   width=62247)
  CTE rast_select
-  Nested Loop  (cost=0.28..76131.41 rows=62033 width=1086)
  -  CTE Scan on poly s  (cost=0.00..62.18 rows=3109
 width=250)
  -  Index Scan using modis_rast_gist on modis r_1
 (cost=0.28..24.40 rows=2 width=836)
Index Cond: ((rast)::geometry  s.geom)
Filter: _st_intersects(s.geom, rast, NULL::integer)
  -  HashAggregate  (cost=17214.16..17276.20 rows=6204 width=254)
-  CTE Scan on rast_select r  (cost=0.00..1240.66 rows=62033
width=254)
   
QUERY PLAN With Indb
   
   
  
 -
Sort   (cost=69547.29..69562.80 rows=6204 width=254)
  Sort Key: r.id, r.name
  CTE poly
-  Seq Scan on us_counties  (cost=0.00..112.86 rows=3109
   width=62247)
  CTE rast_select
-  Nested Loop  (cost=0.28..51767.41 rows=62033 width=272)
  -  CTE Scan on poly s  (cost=0.00..62.18 rows=3109
 width=250)
  -  Index Scan using modis_noout_rast_gist on modis_noout
 r_1
 (cost=0.28..16.56 rows=2 width=22

Re: [GENERAL] Fwd: Raster performance

2015-05-28 Thread David Haynes II
Sorry,

The query run times are significantly slower on outdb as that using indb
here are the run times on 2 queries.

ST_Count/ST_Clip(select single band here)/Inner Join/ST_Transform (US
Counties)
OutDB: 873.564s (14 minutes 33s) InDB:  127.36s (2 minutes 7s)

ST_Count(select single band here)/ST_Clip(on all bands)/Inner
Join/ST_Transform (US Counties)
OutDB: 9537.371s (2 hours 38minutes)   InDB:  310s (5 minutes 10 seconds)

In the query planner it shows a large change in the number of columns
(width) that are picked up in the CTE_rast_select.
These extra columns slow down the ability to process the data.

OUT DB
CTE rast_select
 -  Nested Loop  (cost=0.28..76131.41 rows=62033 *width=1086)*
   -  CTE Scan on poly s  (cost=0.00..62.18 rows=3109 width=250)


In DB
Nested Loop  (cost=0.28..51767.41 rows=62033 *width=272*)
   -  CTE Scan on poly s  (cost=0.00..62.18 rows=3109 width=250)

On Wed, May 27, 2015 at 4:31 PM, PT wmo...@potentialtech.com wrote:

 On Tue, 26 May 2015 12:52:24 -0500
 David Haynes II dahay...@umn.edu wrote:

  Hello,
 
  I have a question about the query optimizer and its performance on
 spatial
  datasets, specifically rasters. My use case is rather unique, the
  application that I am developing allows users to request summarizations
 of
  various geographic boundaries around the world. Therefore our raster
  datasets are global. We are in the process of conducting some benchmarks
  for our system and we noticed something unexpected.
 
  The query is the same except the first is run on a raster (46gigs) in out
  of database (outdb) and the second is the same raster (46gigs) stored in
  database (indb). The raster is multibanded (13), with each band
  representing one entire MODIS global scene. A single year of MODIS is
  approximately 3.6 gigs.
 
  The outdb is being out performed by indb, because the query optimizer
 gets
  smarter. But what is also interesting is all the extra pieces that are
  brought in with outdb.
 
  with poly as
  ( SELECT gid, label as name, ST_Transform(geom, 6842) as geom  FROM
  us_counties )
  , rast_select as
  ( SELECT r.rid as id, s.name , ST_CLIP(r.rast, s.geom) as rast from
  rasters.glc2000 as r inner join poly as s on ST_Intersects(r.rast,
 s.geom) )
  select r.id, r.name, ST_Count(r.rast, 1, True)
 
 
 QUERY PLAN With Outdb
 
 --
  Sort   (cost=93911.29..93926.80 rows=6204 width=254)
Sort Key: r.id, r.name
CTE poly
  -  Seq Scan on us_counties  (cost=0.00..112.86 rows=3109
 width=62247)
CTE rast_select
  -  Nested Loop  (cost=0.28..76131.41 rows=62033 width=1086)
-  CTE Scan on poly s  (cost=0.00..62.18 rows=3109 width=250)
-  Index Scan using modis_rast_gist on modis r_1
   (cost=0.28..24.40 rows=2 width=836)
  Index Cond: ((rast)::geometry  s.geom)
  Filter: _st_intersects(s.geom, rast, NULL::integer)
-  HashAggregate  (cost=17214.16..17276.20 rows=6204 width=254)
  -  CTE Scan on rast_select r  (cost=0.00..1240.66 rows=62033
  width=254)
 
  QUERY PLAN With Indb
 
 
 -
  Sort   (cost=69547.29..69562.80 rows=6204 width=254)
Sort Key: r.id, r.name
CTE poly
  -  Seq Scan on us_counties  (cost=0.00..112.86 rows=3109
 width=62247)
CTE rast_select
  -  Nested Loop  (cost=0.28..51767.41 rows=62033 width=272)
-  CTE Scan on poly s  (cost=0.00..62.18 rows=3109 width=250)
-  Index Scan using modis_noout_rast_gist on modis_noout r_1
   (cost=0.28..16.56 rows=2 width=22)
  Index Cond: ((rast)::geometry  s.geom)
  Filter: _st_intersects(s.geom, rast, NULL::integer)
-  HashAggregate  (cost=17214.16..17276.20 rows=6204 width=254)
  -  CTE Scan on rast_select r  (cost=0.00..1240.66 rows=62033
  width=254)

 I could be missing something here, but I don't see how long the queries
 actually take to run. Have you actually run the queries and timed them?
 Keep in mind that analyze does not actually run the query, it only plans
 it, so the actual run time is unknown if all you do is analyze.

 The query plans appear to be equal, assuming there are slight variances
 in the names of tables from one DB to another (and I assume that your
 description of indb and outdb reflects the fact that there are (for
 reasons unknown) two copies of the data).

 The only purpose to those estimates is to choose a good plan. If the
 plan is bad for one database and both databases have the same data, then
 the plan will be bad for both.

 Since there have been no other responses, I'm guessing that others are
 confused by your question as well. Can you describe the actual problem
 that you're seeing?

 --
 Bill Moran wmo

[GENERAL] Fwd: Raster performance

2015-05-26 Thread David Haynes II
Hello,

I have a question about the query optimizer and its performance on spatial
datasets, specifically rasters. My use case is rather unique, the
application that I am developing allows users to request summarizations of
various geographic boundaries around the world. Therefore our raster
datasets are global. We are in the process of conducting some benchmarks
for our system and we noticed something unexpected.

The query is the same except the first is run on a raster (46gigs) in out
of database (outdb) and the second is the same raster (46gigs) stored in
database (indb). The raster is multibanded (13), with each band
representing one entire MODIS global scene. A single year of MODIS is
approximately 3.6 gigs.

The outdb is being out performed by indb, because the query optimizer gets
smarter. But what is also interesting is all the extra pieces that are
brought in with outdb.

with poly as
( SELECT gid, label as name, ST_Transform(geom, 6842) as geom  FROM
us_counties )
, rast_select as
( SELECT r.rid as id, s.name , ST_CLIP(r.rast, s.geom) as rast from
rasters.glc2000 as r inner join poly as s on ST_Intersects(r.rast, s.geom) )
select r.id, r.name, ST_Count(r.rast, 1, True)


   QUERY PLAN With Outdb
--
Sort   (cost=93911.29..93926.80 rows=6204 width=254)
  Sort Key: r.id, r.name
  CTE poly
-  Seq Scan on us_counties  (cost=0.00..112.86 rows=3109 width=62247)
  CTE rast_select
-  Nested Loop  (cost=0.28..76131.41 rows=62033 width=1086)
  -  CTE Scan on poly s  (cost=0.00..62.18 rows=3109 width=250)
  -  Index Scan using modis_rast_gist on modis r_1
 (cost=0.28..24.40 rows=2 width=836)
Index Cond: ((rast)::geometry  s.geom)
Filter: _st_intersects(s.geom, rast, NULL::integer)
  -  HashAggregate  (cost=17214.16..17276.20 rows=6204 width=254)
-  CTE Scan on rast_select r  (cost=0.00..1240.66 rows=62033
width=254)

QUERY PLAN With Indb

-
Sort   (cost=69547.29..69562.80 rows=6204 width=254)
  Sort Key: r.id, r.name
  CTE poly
-  Seq Scan on us_counties  (cost=0.00..112.86 rows=3109 width=62247)
  CTE rast_select
-  Nested Loop  (cost=0.28..51767.41 rows=62033 width=272)
  -  CTE Scan on poly s  (cost=0.00..62.18 rows=3109 width=250)
  -  Index Scan using modis_noout_rast_gist on modis_noout r_1
 (cost=0.28..16.56 rows=2 width=22)
Index Cond: ((rast)::geometry  s.geom)
Filter: _st_intersects(s.geom, rast, NULL::integer)
  -  HashAggregate  (cost=17214.16..17276.20 rows=6204 width=254)
-  CTE Scan on rast_select r  (cost=0.00..1240.66 rows=62033
width=254)

-- 
David Haynes, Ph.D.
Research Associate Terra Populus
Minnesota Population Center
www.terrapop.org



-- 
David Haynes, Ph.D.
Research Associate Terra Populus
Minnesota Population Center
www.terrapop.org