RE: [OSGeo-Discuss] Raster data on RDBMS

2008-10-29 Thread Lucena, Ivan
Thanks Jason,

Those results that I reported attest to the performance of the GDAL tools. You could 
get completely different results if you used tools from other vendors, like XCI, 
ACXG1S and FM3 [1]. There are several ways this could have been implemented, but 
I believe we made some good choices in GDAL. 

Best regards,

Ivan

[1] - These names were mixed on purpose ;)

>  ---Original Message---
>  From: Jason Birch <[EMAIL PROTECTED]>
>  Subject: RE: [OSGeo-Discuss] Raster data on RDBMS
>  Sent: Oct 29 '08 15:09
>  
>  I find this stuff fascinating, but I believe that the Oracle EULA prohibits 
> users from disclosing the results of benchmark tests.  Be careful how you 
> represent these results.
>  
>  Jason
>  
>  -Original Message-
>  From: Lucena, Ivan
>  Subject: [OSGeo-Discuss] Raster data on RDBMS
>  
>  I would like to return to a discussion that we had months ago about raster 
> on RDBMS. But this time I would like to present some numbers.
>  
>  
___
Discuss mailing list
Discuss@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/discuss


RE: [OSGeo-Discuss] Raster data on RDBMS

2008-10-29 Thread Sylvan Ascent Inc.
I understand. It is the difference between (1) having to serve fixed-size tiles 
really fast out to GEarth/OpenLayers/etc., or (2) having to provide a random 
image, or a derivation thereof, in any size, format or projection (WCS?). In the 
second case, speeding up the backend is the only way; in the first case, a tile 
cache could be a better choice. 
 
There is another possibility: putting a self-building cache on the 
backend, between the raw datastore and the WCS server. It could even be a 
database with some PGChip scheme.
 
It may be faster from the db, as you can leverage the spatial index, and only 
pull the tiles you need, but I think Frank can answer this better.
 
I'm currently most interested in the first case of serving tiles, so if anyone 
has ideas on best practice for this, that would be great.
 
Roger Bedell, President Sylvan Ascent Inc.



From: [EMAIL PROTECTED] on behalf of G. Allegri
Sent: Wed 10/29/2008 12:50 PM
To: OSGeo Discussions
Subject: Re: [OSGeo-Discuss] Raster data on RDBMS



Thanks everyone for this interesting thread. I think the two
approaches have different goals:

 - rendering-on-demand performance comparison
 - raster serving performance comparison

Both are of interest, but they shouldn't be confused.
It might be helpful to write a wiki page (or something else) where to
gather the "best-practices" on serving (big) rasters. Well, it could
be interesting for vectors too, but it's a different story.
It's a common task for many of us, and it would be of help for both
the newbies and the more experienced users...
Ok, sorry for this OT digression :)

2008/10/29 Frank Warmerdam <[EMAIL PROTECTED]>:
> Sylvan Ascent Inc. wrote:
>>
>> I think, that since the goal of all this storage of pyramids and the like
>> is
>> just to get speed, that they aren't apples/oranges, but apples apples,
>> since
>> they are both pyramid schemes, just in different places, either in front,
>> or
>> in back of the server.
>
> Roger,
>
> My point is that a tile caching approach is really comparing tile caching
> performance to rendering-on-demand performance while I think the original
> point was that rendering-from-database and rendering-from-filesystem could
> have similar performance for input raster data.
>
> Your comparison is also of interest but I don't think it is fair to compare
> rendering from Oracle through MapServer (or GeoServer) to satisfying
> map requests directly from a tile cache.
>
> Best regards,
> --
> ---+--
> I set the clouds in motion - turn up   | Frank Warmerdam,
> [EMAIL PROTECTED]
> light and sound - activate the windows | http://pobox.com/~warmerdam
> and watch the world go round - Rush| Geospatial Programmer for Rent
>
> ___
> Discuss mailing list
> Discuss@lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/discuss
>


Re: [OSGeo-Discuss] Raster data on RDBMS

2008-10-29 Thread Lucena, Ivan
Frank,

>  My point is that a tile caching approach is really comparing tile caching
>  performance to rendering-on-demand performance while I think the original
>  point was that rendering-from-database and rendering-from-filesystem could
>  have similar performance for input raster data.

D'accord.

>  Your comparison is also of interest but I don't think it is fair to compare
>  rendering from Oracle through MapServer (or GeoServer) to satisfying
>  map requests directly from a tile cache.

I shouldn't have mentioned MapServer. Web serving wasn't my point to begin 
with. I can gdal_translate from the georaster driver to VRT and open the 
result in OpenEV. That gives me a good idea of how the driver performs in a 
rendering environment. I would need to change OpenEV to do it directly. But 
by doing that I would be testing GDAL, not Oracle. To see Oracle GeoRaster in 
action I can use some free viewer (free as in gratis). That is not my point.

The real point is: should we discard RDBMS for raster storage just because we 
are sure that there will be overhead and that our direct fopen() and fread() 
will always be faster? Myth or fact? Those tests have shown otherwise, so the 
question is: what is going on?

I messed around with some free open-source RDBMS a long time ago (last 
century, actually), checking out how to create type extensions. But I would 
not imagine getting into the core of how things work just for the fun of it. 
So the only thing I can do is check the results from the outside, and 
Oracle+GDAL/GeoRaster is the environment that lets me do that, because they 
let you use the software without a license as long as it is not in 
production mode. 

I should test MySQL+TerraLib or PostGIS+GDAL/PGCHIP also. Maybe.

Best regards,

Ivan



Re: [OSGeo-Discuss] Raster data on RDBMS

2008-10-29 Thread G. Allegri
Thanks everyone for this interesting thread. I think the two
approaches have different goals:

 - rendering-on-demand performance comparison
 - raster serving performance comparison

Both are of interest, but they shouldn't be confused.
It might be helpful to write a wiki page (or something else) to
gather the "best practices" for serving (big) rasters. Well, it could
be interesting for vectors too, but it's a different story.
It's a common task for many of us, and it would be of help for both
the newbies and the more experienced users...
Ok, sorry for this OT digression :)

2008/10/29 Frank Warmerdam <[EMAIL PROTECTED]>:
> Sylvan Ascent Inc. wrote:
>>
>> I think, that since the goal of all this storage of pyramids and the like
>> is
>> just to get speed, that they aren't apples/oranges, but apples apples,
>> since
>> they are both pyramid schemes, just in different places, either in front,
>> or
>> in back of the server.
>
> Roger,
>
> My point is that a tile caching approach is really comparing tile caching
> performance to rendering-on-demand performance while I think the original
> point was that rendering-from-database and rendering-from-filesystem could
> have similar performance for input raster data.
>
> Your comparison is also of interest but I don't think it is fair to compare
> rendering from Oracle through MapServer (or GeoServer) to satisfying
> map requests directly from a tile cache.
>
> Best regards,
> --
> ---+--
> I set the clouds in motion - turn up   | Frank Warmerdam,
> [EMAIL PROTECTED]
> light and sound - activate the windows | http://pobox.com/~warmerdam
> and watch the world go round - Rush| Geospatial Programmer for Rent
>
> ___
> Discuss mailing list
> Discuss@lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/discuss
>


Re: [OSGeo-Discuss] Raster data on RDBMS

2008-10-29 Thread Frank Warmerdam

Sylvan Ascent Inc. wrote:

I think, that since the goal of all this storage of pyramids and the like is
just to get speed, that they aren't apples/oranges, but apples apples, since
they are both pyramid schemes, just in different places, either in front, or
in back of the server.


Roger,

My point is that a tile caching approach is really comparing tile caching
performance to rendering-on-demand performance while I think the original
point was that rendering-from-database and rendering-from-filesystem could
have similar performance for input raster data.

Your comparison is also of interest but I don't think it is fair to compare
rendering from Oracle through MapServer (or GeoServer) to satisfying
map requests directly from a tile cache.

Best regards,
--
---+--
I set the clouds in motion - turn up   | Frank Warmerdam, [EMAIL PROTECTED]
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush| Geospatial Programmer for Rent



RE: [OSGeo-Discuss] Raster data on RDBMS

2008-10-29 Thread Sylvan Ascent Inc.
Hi Frank,
I'm not sure I completely agree with your apples/oranges point. Here's why:
 
We are in the process of putting together a big raster server, so I'm 
evaluating the best way to proceed. I'm quite familiar with putting raster 
tiles in a database, did that way back in the last century. I've decided to use 
Geoserver, since we need (want)  WFS-T and WCS, and the latest speed race 
between MapServer and GeoServer seems to be a tie more or less.
 
As I see it we have several options:
 
1) Put the big original raster images in back of GeoServer, and access them using their 
Mosaic and the GDAL-based ImageIO-ext.
Advantage - easy, but kind of slow.
 
2) Take the originals and build a file-based pyramid.
Advantage - faster, but a lot of work, plus duplication, and tricky to keep 
updated as new data comes in.
 
3) Take the originals and build a PostGIS-based pyramid.
Likely about the same as 2 in speed, work and duplication.
 
4) Do 1, but put a pyramiding tile server in front. It builds the pyramid of 2 
and 3 over time, is likely the fastest if you hit the cache, and is no 
harder to do than #1.
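 
For options 2 and 3, the pyramid levels themselves are easy to work out. The following is only a minimal sketch, assuming 256-pixel tiles and a 14336-pixel-wide source like the test images discussed in this thread. The function name overview_levels is hypothetical; it lists power-of-two reduction factors until the reduced image fits in a single tile.

```shell
# Print power-of-two overview factors for an image `size` pixels wide,
# stopping once the reduced image fits inside one block of `block` pixels.
overview_levels() (
  size=$1 block=$2 factor=2 levels=""
  reduced=$size
  while [ "$reduced" -gt "$block" ]; do
    levels="$levels $factor"
    # Width of the overview at this factor, rounded up.
    reduced=$(( (size + factor - 1) / factor ))
    factor=$(( factor * 2 ))
  done
  # Unquoted on purpose so the leading space is dropped.
  echo $levels
)

overview_levels 14336 256
```

The result can be passed straight to gdaladdo, for example `gdaladdo -r average big.tif $(overview_levels 14336 256)`, where big.tif is a placeholder file name.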
 
I think that, since the goal of all this storage of pyramids and the like is 
just to get speed, they aren't apples/oranges but apples/apples, since 
they are both pyramid schemes, just in different places: either in front of or 
in back of the server. 
 
Roger Bedell, President Sylvan Ascent Inc.
800-362-8971
+34 626 855 662
[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]> 
www.sylvanascent.com <http://www.sylvanascent.com/> 
www.topodepot.com <http://www.topodepot.com/> 



From: [EMAIL PROTECTED] on behalf of Frank Warmerdam
Sent: Wed 10/29/2008 10:43 AM
To: OSGeo Discussions
Subject: Re: [OSGeo-Discuss] Raster data on RDBMS



Sylvan Ascent Inc. wrote:
> Mike and Ivan,
>
> I'd like to see them also compared to a caching solution, like GeoWebCache,
> or TileCache. These effectively create a file-based "database" of little
> bitty tiles at certain resolutions, kind of like a tile pyrimid that is
> created gradually over time as the image data is accessed.
>
> One would think the file-based cache system would be faster than a similar
> database solution, with the database solution giving no real benefits that I
> can see.

Roger,

While it might be educational to compare to a tilecache solution it is really
comparing apples and oranges.

Best regards,
--
---+--
I set the clouds in motion - turn up   | Frank Warmerdam, [EMAIL PROTECTED]
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush| Geospatial Programmer for Rent



RE: [OSGeo-Discuss] Raster data on RDBMS

2008-10-29 Thread Jason Birch
I find this stuff fascinating, but I believe that the Oracle EULA prohibits 
users from disclosing the results of benchmark tests.  Be careful how you 
represent these results.

Jason

-Original Message-
From: Lucena, Ivan
Subject: [OSGeo-Discuss] Raster data on RDBMS

I would like to return to a discussion that we had months ago about raster on 
RDBMS. But this time I would like to present some numbers.



Re: [OSGeo-Discuss] Raster data on RDBMS

2008-10-29 Thread Frank Warmerdam

Sylvan Ascent Inc. wrote:

Mike and Ivan,

I'd like to see them also compared to a caching solution, like GeoWebCache,
or TileCache. These effectively create a file-based "database" of little
bitty tiles at certain resolutions, kind of like a tile pyrimid that is
created gradually over time as the image data is accessed.

One would think the file-based cache system would be faster than a similar
database solution, with the database solution giving no real benefits that I
can see.


Roger,

While it might be educational to compare to a tilecache solution it is really
comparing apples and oranges.

Best regards,
--
---+--
I set the clouds in motion - turn up   | Frank Warmerdam, [EMAIL PROTECTED]
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush| Geospatial Programmer for Rent



RE: [OSGeo-Discuss] Raster data on RDBMS

2008-10-29 Thread Sylvan Ascent Inc.
Mike and Ivan,
 
I'd like to see them also compared to a caching solution, like GeoWebCache or 
TileCache. These effectively create a file-based "database" of little bitty 
tiles at certain resolutions, kind of like a tile pyramid that is created 
gradually over time as the image data is accessed.
 
One would think the file-based cache system would be faster than a similar 
database solution, with the database solution giving no real benefits that I 
can see.
 
>>>I basically agreed with that but
>>> after seeing how fragile and complicated even a well defined structure of
>>> folders and files could be I would vote in favor of the good and old
>>> relational model.
 
Since the cache is maintained by the software in a completely defined way, and 
never messed with by humans, I wonder what could go wrong?
 
Roger Bedell, Sylvan Ascent Inc.
800-362-8971
+34 626 855 662
[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]> 



From: [EMAIL PROTECTED] on behalf of Smith, Michael ERDC-CRREL-NH
Sent: Wed 10/29/2008 10:13 AM
To: Lucena, Ivan; OSGeo Discussions; Paul Ramsey
Subject: Re: [OSGeo-Discuss] Raster data on RDBMS



Ivan,

Those numbers look impressive. We are just starting to set up some new
hardware here and I plan to do some testing also. Perhaps we can collaborate
and come up with a test suite in order to track these numbers across builds.

Mike


--
Michael Smith
RSGIS Center
ERDC - CRREL
US Army Corps of Engineers




On 10/29/08  1:35 AM, "Lucena, Ivan" <[EMAIL PROTECTED]> wrote:

> Paul,
>
> Good thought.
>
> Let's see. The default blocking used by the GeoRaster driver is (256, 256, 1).
> That is good because GeoTiffs doesn't tile on band space. So I would imagine
> that if I tiled the GeoTiff this way:
>
> [EMAIL PROTECTED]:~/Data> time gdal_translate Barcelona_2007_R2C2.TIF
> Barcelona_2007_R2C2_tiled.TIF -co BLOCKXSIZE=256 -co BLOCKYSIZE=256
> Input file size is 14336, 14336
> 0...10...20...30...40...50...60...70...80...90...100 - done.
> real   0m42.991s
> user 0m20.289s
> sys   0m2.516s
>
> The comparison would be fair:
>
> [EMAIL PROTECTED]:~/Data> time gdal_translate Barcelona_2007_R2C2_tiled.TIF
> out2.tif -srcwin 0 0 2000 2000
> Input file size is 14336, 14336
> 0...10...20...30...40...50...60...70...80...90...100 - done.
> real   0m1.604s
> user 0m1.156s
> sys  0m0.444s
>
> What do you think?
>
> I would imagine that if I run gdaladdo to add Pyramids on the GeoRaster one
> application could take advantage of it by telling Oracle to cache the BLOB in
> memory. So the next time a user zoom-in the performance would be even better.
>
> I am trying to setup a mapserver experiment on that issue but for now I would
> like to keep my analysis on that very simple process of extracting a subset.
>
> Best regards,
>
> Ivan
>
>
>>  ---Original Message---
>>  From: Paul Ramsey <[EMAIL PROTECTED]>
>>  Subject: Re: [OSGeo-Discuss] Raster data on RDBMS
>>  Sent: Oct 29 '08 05:00
>> 
>>  The data is chunked in Oracle into tiles, so unless you tile the TIFF
>>  as well you aren't really doing a direct comparison. Even if you end
>>  up with the same numbers for both processes, I'll still be impressed,
>>  since I assumed Oracle would have a higher overhead.
>> 
>>  P.
>> 
>>  On Tue, Oct 28, 2008 at 9:54 PM, Lucena, Ivan <[EMAIL PROTECTED]>
>> wrote:
>>> Hi There,
>>>
>>> I would like to return to a discussion that we had months ago about raster
>>> on RDBMS. But this time I would like to present some number.
>>>
>>> As long as I could recall there was basically two major arguments contrary
>>> to storing raster on RDBMS. One very pragmatical: "Why waste precious
>>> process time with the overhead of dealing with queries, tables, client-sever
>>> back and forth just to get the data from BLOB fields on a database when you
>>> can get it directly from the file system?". The other argument was
>>> semantical: "Why store raster on RDBMS if in general we are not expecting to
>>> have a transactions on that data?"
>>>
>>> I cannot argue against the second one. I basically agreed with that but
>>> after seeing how fragile and complicated even a well defined structure of
>>> folders and files could be I would vote in favor of the good and old
>>> relational model.
>>>
>>> That is my experiment. I downloaded two free data samples from Naveteq
>>> website. Two geotiff files with the same size and number of bands (14336,
>>> 14336,  3):
>>>
>>> 

Re: [OSGeo-Discuss] Raster data on RDBMS

2008-10-29 Thread Smith, Michael ERDC-CRREL-NH
Ivan,

Those numbers look impressive. We are just starting to set up some new
hardware here and I plan to do some testing also. Perhaps we can collaborate
and come up with a test suite in order to track these numbers across builds.

Mike


-- 
Michael Smith
RSGIS Center
ERDC - CRREL
US Army Corps of Engineers




On 10/29/08  1:35 AM, "Lucena, Ivan" <[EMAIL PROTECTED]> wrote:

> Paul,
> 
> Good thought. 
> 
> Let's see. The default blocking used by the GeoRaster driver is (256, 256, 1).
> That is good because GeoTiffs doesn't tile on band space. So I would imagine
> that if I tiled the GeoTiff this way:
> 
> [EMAIL PROTECTED]:~/Data> time gdal_translate Barcelona_2007_R2C2.TIF
> Barcelona_2007_R2C2_tiled.TIF -co BLOCKXSIZE=256 -co BLOCKYSIZE=256
> Input file size is 14336, 14336
> 0...10...20...30...40...50...60...70...80...90...100 - done.
> real   0m42.991s
> user 0m20.289s
> sys   0m2.516s
> 
> The comparison would be fair:
> 
> [EMAIL PROTECTED]:~/Data> time gdal_translate Barcelona_2007_R2C2_tiled.TIF
> out2.tif -srcwin 0 0 2000 2000
> Input file size is 14336, 14336
> 0...10...20...30...40...50...60...70...80...90...100 - done.
> real   0m1.604s
> user 0m1.156s
> sys  0m0.444s
> 
> What do you think?
> 
> I would imagine that if I run gdaladdo to add Pyramids on the GeoRaster one
> application could take advantage of it by telling Oracle to cache the BLOB in
> memory. So the next time a user zoom-in the performance would be even better.
> 
> I am trying to setup a mapserver experiment on that issue but for now I would
> like to keep my analysis on that very simple process of extracting a subset.
> 
> Best regards,
> 
> Ivan
> 
> 
>>  ---Original Message---
>>  From: Paul Ramsey <[EMAIL PROTECTED]>
>>  Subject: Re: [OSGeo-Discuss] Raster data on RDBMS
>>  Sent: Oct 29 '08 05:00
>>  
>>  The data is chunked in Oracle into tiles, so unless you tile the TIFF
>>  as well you aren't really doing a direct comparison. Even if you end
>>  up with the same numbers for both processes, I'll still be impressed,
>>  since I assumed Oracle would have a higher overhead.
>>  
>>  P.
>>  
>>  On Tue, Oct 28, 2008 at 9:54 PM, Lucena, Ivan <[EMAIL PROTECTED]>
>> wrote:
>>> Hi There,
>>> 
>>> I would like to return to a discussion that we had months ago about raster
>>> on RDBMS. But this time I would like to present some number.
>>> 
>>> As long as I could recall there was basically two major arguments contrary
>>> to storing raster on RDBMS. One very pragmatical: "Why waste precious
>>> process time with the overhead of dealing with queries, tables, client-sever
>>> back and forth just to get the data from BLOB fields on a database when you
>>> can get it directly from the file system?". The other argument was
>>> semantical: "Why store raster on RDBMS if in general we are not expecting to
>>> have a transactions on that data?"
>>> 
>>> I cannot argue against the second one. I basically agreed with that but
>>> after seeing how fragile and complicated even a well defined structure of
>>> folders and files could be I would vote in favor of the good and old
>>> relational model.
>>> 
>>> That is my experiment. I downloaded two free data samples from Naveteq
>>> website. Two geotiff files with the same size and number of bands (14336,
>>> 14336,  3):
>>> 
>>> [EMAIL PROTECTED]:~/Data> du -k Barcelona_2007_R2C2.TIF
>>> 602828  Barcelona_2007_R2C2.TIF
>>> [EMAIL PROTECTED]:~/Data> du -k San_Francisco_2006_R1C2.TIF
>>> 602828  San_Francisco_2006_R1C2.TIF
>>> 
>>> Then I loaded those images to Oracle Spatial GeoRaster using GDAL. The
>>> loading process is comparable than some commercial ETL products on the
>>> market. It took about 2 minutes to load each image.
>>> 
>>> [EMAIL PROTECTED]:~/Data> time gdal_translate -of georaster
>>> Barcelona_2007_R2C2.TIF georaster:scott,tiger,orcl,RDT_2$,2
>>> Input file size is 14336, 14336
>>> 0...10...20...30...40...50...60...70...80...90...100 - done.
>>> Ouput dataset: (georaster:scott,tiger,orcl,RDT_2$,2) on GDAL_IMPORT,RASTER
>>> real  1m54.973s
>>> user 0m4.368s
>>> sys   0m1.936s
>>> 
>>> If you are a Oracle GeoRaster users you might be excited about those number
>>> already but those are not the numbers I want to show. What I would like to
>>> do is to compare the time that it takes to extract subse

Re: [OSGeo-Discuss] Raster data on RDBMS

2008-10-28 Thread Lucena, Ivan
Paul,

Good thought. 

Let's see. The default blocking used by the GeoRaster driver is (256, 256, 1). 
That is good because GeoTIFFs don't tile in band space. So I would imagine 
that if I tiled the GeoTIFF this way:

[EMAIL PROTECTED]:~/Data> time gdal_translate Barcelona_2007_R2C2.TIF 
Barcelona_2007_R2C2_tiled.TIF -co BLOCKXSIZE=256 -co BLOCKYSIZE=256
Input file size is 14336, 14336
0...10...20...30...40...50...60...70...80...90...100 - done.
real  0m42.991s
user 0m20.289s
sys   0m2.516s

The comparison would be fair:

[EMAIL PROTECTED]:~/Data> time gdal_translate Barcelona_2007_R2C2_tiled.TIF 
out2.tif -srcwin 0 0 2000 2000
Input file size is 14336, 14336
0...10...20...30...40...50...60...70...80...90...100 - done.
real  0m1.604s
user 0m1.156s
sys  0m0.444s

What do you think?
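
One thing worth checking in the tiling step above: as far as I know, GDAL's GTiff driver only honours BLOCKXSIZE when -co TILED=YES is also given; without it the output stays striped and BLOCKYSIZE sets the strip height. If that applies here, the "tiled" file may really hold 14336x256 strips, which would be consistent with the subset read not getting any faster. gdalinfo reports the actual Block= size of each band. The sketch below counts how many blocks the -srcwin window touches under each layout; the function names are hypothetical and the strip layout is an assumption, not something the thread confirms.

```shell
# Blocks a window touches along one axis: offset, window size, block size.
blocks_along() (
  first=$(( $1 / $3 ))
  last=$(( ($1 + $2 - 1) / $3 ))
  echo $(( last - first + 1 ))
)

# Blocks touched by a window: xoff yoff xsize ysize block_width block_height.
blocks_touched() (
  nx=$(blocks_along "$1" "$3" "$5")
  ny=$(blocks_along "$2" "$4" "$6")
  echo $(( nx * ny ))
)

# -srcwin 0 0 2000 2000 against true 256x256 tiles, per band:
blocks_touched 0 0 2000 2000 256 256
# The same window against full-width strips 256 rows high (assumed layout):
blocks_touched 0 0 2000 2000 14336 256
```

Per band that is 64 tiles of 256x256 pixels (4,194,304 pixels) against 8 strips of 14336x256 pixels (29,360,128 pixels), so the striped layout would read seven times as many pixels for the same window.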

I would imagine that if I ran gdaladdo to add pyramids to the GeoRaster, an 
application could take advantage of them by telling Oracle to cache the BLOB in 
memory. So the next time a user zooms in, the performance would be even better. 

I am trying to set up a MapServer experiment on that issue, but for now I would 
like to keep my analysis to that very simple process of extracting a subset. 

Best regards,

Ivan


>  ---Original Message---
>  From: Paul Ramsey <[EMAIL PROTECTED]>
>  Subject: Re: [OSGeo-Discuss] Raster data on RDBMS
>  Sent: Oct 29 '08 05:00
>  
>  The data is chunked in Oracle into tiles, so unless you tile the TIFF
>  as well you aren't really doing a direct comparison. Even if you end
>  up with the same numbers for both processes, I'll still be impressed,
>  since I assumed Oracle would have a higher overhead.
>  
>  P.
>  
>  On Tue, Oct 28, 2008 at 9:54 PM, Lucena, Ivan <[EMAIL PROTECTED]> wrote:
>  > Hi There,
>  >
>  > I would like to return to a discussion that we had months ago about raster 
> on RDBMS. But this time I would like to present some number.
>  >
>  > As long as I could recall there was basically two major arguments contrary 
> to storing raster on RDBMS. One very pragmatical: "Why waste precious process 
> time with the overhead of dealing with queries, tables, client-sever back and 
> forth just to get the data from BLOB fields on a database when you can get it 
> directly from the file system?". The other argument was semantical: "Why 
> store raster on RDBMS if in general we are not expecting to have a 
> transactions on that data?"
>  >
>  > I cannot argue against the second one. I basically agreed with that but 
> after seeing how fragile and complicated even a well defined structure of 
> folders and files could be I would vote in favor of the good and old 
> relational model.
>  >
>  > That is my experiment. I downloaded two free data samples from Naveteq 
> website. Two geotiff files with the same size and number of bands (14336, 
> 14336,  3):
>  >
>  > [EMAIL PROTECTED]:~/Data> du -k Barcelona_2007_R2C2.TIF
>  > 602828  Barcelona_2007_R2C2.TIF
>  > [EMAIL PROTECTED]:~/Data> du -k San_Francisco_2006_R1C2.TIF
>  > 602828  San_Francisco_2006_R1C2.TIF
>  >
>  > Then I loaded those images to Oracle Spatial GeoRaster using GDAL. The 
> loading process is comparable than some commercial ETL products on the 
> market. It took about 2 minutes to load each image.
>  >
>  > [EMAIL PROTECTED]:~/Data> time gdal_translate -of georaster 
> Barcelona_2007_R2C2.TIF georaster:scott,tiger,orcl,RDT_2$,2
>  > Input file size is 14336, 14336
>  > 0...10...20...30...40...50...60...70...80...90...100 - done.
>  > Ouput dataset: (georaster:scott,tiger,orcl,RDT_2$,2) on GDAL_IMPORT,RASTER
>  > real  1m54.973s
>  > user 0m4.368s
>  > sys   0m1.936s
>  >
>  > If you are a Oracle GeoRaster users you might be excited about those 
> number already but those are not the numbers I want to show. What I would 
> like to do is to compare the time that it takes to extract subset from the 
> original geotiff and compare with the time to extract the same subset from 
> the RDBMS. He are the numbers:
>  >
>  > [EMAIL PROTECTED]:~/Data> time gdal_translate 
> georaster:scott,tiger,orcl,RDT_2$,2 out.tif -srcwin 0 0 2000 2000
>  > Input file size is 14336, 14336
>  > 0...10...20...30...40...50...60...70...80...90...100 - done.
>  > real  0m0.720s
>  > user 0m0.408s
>  > sys   0m0.108s
>  >
>  > [EMAIL PROTECTED]:~/Data> time gdal_translate Barcelona_2007_R2C2.TIF 
> out2.tif -srcwin 0 0 2000 2000
>  > Input file size is 14336, 14336
>  > 0...10...20...30...40...50...60...70...80...90...100 - done.
>  > real  0m1.177s
>  > user 0m0.976s
>  > sys   0m0.188s
>  >

Re: [OSGeo-Discuss] Raster data on RDBMS

2008-10-28 Thread Paul Ramsey
The data is chunked in Oracle into tiles, so unless you tile the TIFF
as well you aren't really doing a direct comparison. Even if you end
up with the same numbers for both processes, I'll still be impressed,
since I assumed Oracle would have a higher overhead.

P.

On Tue, Oct 28, 2008 at 9:54 PM, Lucena, Ivan <[EMAIL PROTECTED]> wrote:
> Hi There,
>
> I would like to return to a discussion that we had months ago about raster on 
> RDBMS. But this time I would like to present some number.
>
> As long as I could recall there was basically two major arguments contrary to 
> storing raster on RDBMS. One very pragmatical: "Why waste precious process 
> time with the overhead of dealing with queries, tables, client-sever back and 
> forth just to get the data from BLOB fields on a database when you can get it 
> directly from the file system?". The other argument was semantical: "Why 
> store raster on RDBMS if in general we are not expecting to have a 
> transactions on that data?"
>
> I cannot argue against the second one. I basically agreed with that but after 
> seeing how fragile and complicated even a well defined structure of folders 
> and files could be I would vote in favor of the good and old relational model.
>
> That is my experiment. I downloaded two free data samples from Naveteq 
> website. Two geotiff files with the same size and number of bands (14336, 
> 14336,  3):
>
> [EMAIL PROTECTED]:~/Data> du -k Barcelona_2007_R2C2.TIF
> 602828  Barcelona_2007_R2C2.TIF
> [EMAIL PROTECTED]:~/Data> du -k San_Francisco_2006_R1C2.TIF
> 602828  San_Francisco_2006_R1C2.TIF
>
> Then I loaded those images to Oracle Spatial GeoRaster using GDAL. The 
> loading process is comparable than some commercial ETL products on the 
> market. It took about 2 minutes to load each image.
>
> [EMAIL PROTECTED]:~/Data> time gdal_translate -of georaster 
> Barcelona_2007_R2C2.TIF georaster:scott,tiger,orcl,RDT_2$,2
> Input file size is 14336, 14336
> 0...10...20...30...40...50...60...70...80...90...100 - done.
> Ouput dataset: (georaster:scott,tiger,orcl,RDT_2$,2) on GDAL_IMPORT,RASTER
> real  1m54.973s
> user 0m4.368s
> sys   0m1.936s
>
> If you are a Oracle GeoRaster users you might be excited about those number 
> already but those are not the numbers I want to show. What I would like to do 
> is to compare the time that it takes to extract subset from the original 
> geotiff and compare with the time to extract the same subset from the RDBMS. 
> He are the numbers:
>
> [EMAIL PROTECTED]:~/Data> time gdal_translate 
> georaster:scott,tiger,orcl,RDT_2$,2 out.tif -srcwin 0 0 2000 2000
> Input file size is 14336, 14336
> 0...10...20...30...40...50...60...70...80...90...100 - done.
> real  0m0.720s
> user 0m0.408s
> sys   0m0.108s
>
> [EMAIL PROTECTED]:~/Data> time gdal_translate Barcelona_2007_R2C2.TIF 
> out2.tif -srcwin 0 0 2000 2000
> Input file size is 14336, 14336
> 0...10...20...30...40...50...60...70...80...90...100 - done.
> real  0m1.177s
> user 0m0.976s
> sys   0m0.188s
>
> And I also checked the integrity of the results to see if I get the same 
> result:
>
> [EMAIL PROTECTED]:~/Data> gdalinfo -checksum out.tif
> ...
> Band 1 Block=2000x1 Type=Byte, ColorInterp=Red
>  Checksum=58248
> Band 2 Block=2000x1 Type=Byte, ColorInterp=Green
>  Checksum=21226
> Band 3 Block=2000x1 Type=Byte, ColorInterp=Blue
>  Checksum=8002
>
> [EMAIL PROTECTED]:~/Data> gdalinfo -checksum out2.tif
> ...
> Band 1 Block=2000x1 Type=Byte, ColorInterp=Red
>  Checksum=58248
> Band 2 Block=2000x1 Type=Byte, ColorInterp=Green
>  Checksum=21226
> Band 3 Block=2000x1 Type=Byte, ColorInterp=Blue
>  Checksum=8002
>
> What are others test would be interesting to perform?
>
> Best regards,
>
> Ivan
>


[OSGeo-Discuss] Raster data on RDBMS

2008-10-28 Thread Lucena, Ivan
Hi There,

I would like to return to a discussion that we had months ago about raster on 
RDBMS. But this time I would like to present some numbers.

As far as I can recall, there were basically two major arguments against 
storing raster on RDBMS. One very pragmatic: "Why waste precious processing time 
on the overhead of dealing with queries, tables and client-server back and forth 
just to get the data from BLOB fields in a database when you can get it 
directly from the file system?" The other argument was semantic: "Why store 
raster on RDBMS if in general we are not expecting to have transactions on 
that data?"

I cannot argue against the second one. I basically agree with it, but after 
seeing how fragile and complicated even a well-defined structure of folders and 
files can be, I would vote in favor of the good old relational model.

This is my experiment. I downloaded two free data samples from the Navteq website: 
two GeoTIFF files with the same size and number of bands (14336, 14336, 3):

[EMAIL PROTECTED]:~/Data> du -k Barcelona_2007_R2C2.TIF
602828  Barcelona_2007_R2C2.TIF
[EMAIL PROTECTED]:~/Data> du -k San_Francisco_2006_R1C2.TIF
602828  San_Francisco_2006_R1C2.TIF

Then I loaded those images into Oracle Spatial GeoRaster using GDAL. The loading 
process is comparable to some commercial ETL products on the market. It took 
about 2 minutes to load each image.

[EMAIL PROTECTED]:~/Data> time gdal_translate -of georaster 
Barcelona_2007_R2C2.TIF georaster:scott,tiger,orcl,RDT_2$,2
Input file size is 14336, 14336
0...10...20...30...40...50...60...70...80...90...100 - done.
Ouput dataset: (georaster:scott,tiger,orcl,RDT_2$,2) on GDAL_IMPORT,RASTER
real  1m54.973s
user 0m4.368s
sys   0m1.936s

If you are an Oracle GeoRaster user you might be excited about those numbers 
already, but those are not the numbers I want to show. What I would like to do 
is to compare the time that it takes to extract a subset from the original 
GeoTIFF with the time to extract the same subset from the RDBMS. Here 
are the numbers:

[EMAIL PROTECTED]:~/Data> time gdal_translate 
georaster:scott,tiger,orcl,RDT_2$,2 out.tif -srcwin 0 0 2000 2000
Input file size is 14336, 14336
0...10...20...30...40...50...60...70...80...90...100 - done.
real  0m0.720s
user 0m0.408s
sys   0m0.108s

[EMAIL PROTECTED]:~/Data> time gdal_translate Barcelona_2007_R2C2.TIF out2.tif 
-srcwin 0 0 2000 2000
Input file size is 14336, 14336
0...10...20...30...40...50...60...70...80...90...100 - done.
real  0m1.177s
user 0m0.976s
sys   0m0.188s
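
To put the two extraction timings side by side, here is a small sketch. It assumes the format printed by bash's time built-in, as in the transcripts above, and the function names are hypothetical. Each figure comes from a single run, so the ratio is only indicative.

```shell
# Convert a value printed by bash's `time` (e.g. 1m54.973s) into seconds.
real_seconds() {
  echo "$1" | LC_ALL=C awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Ratio of two `time` values: how many times longer the first one took.
time_ratio() {
  LC_ALL=C awk -v a="$(real_seconds "$1")" -v b="$(real_seconds "$2")" \
    'BEGIN { printf "%.2f\n", a / b }'
}

# GeoTIFF read (real 0m1.177s) versus GeoRaster read (real 0m0.720s).
time_ratio 0m1.177s 0m0.720s
```

For these two runs the file-based read took roughly 1.6 times as long as the read from GeoRaster.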

I also checked the integrity of the outputs to confirm that both give the same result:

[EMAIL PROTECTED]:~/Data> gdalinfo -checksum out.tif
...
Band 1 Block=2000x1 Type=Byte, ColorInterp=Red
  Checksum=58248
Band 2 Block=2000x1 Type=Byte, ColorInterp=Green
  Checksum=21226
Band 3 Block=2000x1 Type=Byte, ColorInterp=Blue
  Checksum=8002

[EMAIL PROTECTED]:~/Data> gdalinfo -checksum out2.tif
...
Band 1 Block=2000x1 Type=Byte, ColorInterp=Red
  Checksum=58248
Band 2 Block=2000x1 Type=Byte, ColorInterp=Green
  Checksum=21226
Band 3 Block=2000x1 Type=Byte, ColorInterp=Blue
  Checksum=8002
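
The comparison above was done by eye; it could be scripted along these lines. This is a sketch only: the function names are hypothetical, and it assumes gdalinfo -checksum prints one "Checksum=N" line per band, as in the output shown.

```shell
# Print one checksum per band from `gdalinfo -checksum` output on stdin.
band_checksums() {
  sed -n 's/^[[:space:]]*Checksum=\([0-9][0-9]*\).*$/\1/p'
}

# Succeed only if both rasters report the same, non-empty list of checksums.
same_checksums() {
  a=$(gdalinfo -checksum "$1" | band_checksums)
  b=$(gdalinfo -checksum "$2" | band_checksums)
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage, with the two subsets extracted above:
# same_checksums out.tif out2.tif && echo "subsets match"
```

Matching checksums show that the pixel values agree band by band; they say nothing about georeferencing or metadata, which would need a separate check.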

What other tests would be interesting to perform?

Best regards,

Ivan











