RE: [OSGeo-Discuss] Raster data on RDBMS
Thanks Jason,

Those results that I reported attest to the performance of the GDAL tools. You could get completely different results if you used tools from other vendors, like XCI, ACXG1S and FM3 [1].

There are several options for how that could be implemented, but I believe we made some good choices in GDAL.

Best regards,

Ivan

[1] - These names were mixed on purpose ;)

> ---Original Message---
> From: Jason Birch <[EMAIL PROTECTED]>
> Subject: RE: [OSGeo-Discuss] Raster data on RDBMS
> Sent: Oct 29 '08 15:09
>
> I find this stuff fascinating, but I believe that the Oracle EULA prohibits
> users from disclosing the results of benchmark tests. Be careful how you
> represent these results.
>
> Jason

___
Discuss mailing list
Discuss@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/discuss
RE: [OSGeo-Discuss] Raster data on RDBMS
I understand. It is the difference between (1) having to serve fixed-size tiles really fast out to GEarth/OpenLayers/etc., and (2) having to provide a random image, or a derivation thereof, in any size, format, or projection (WCS?). In the second case, speeding up the backend is the only way; in the first case, a tile cache could be a better choice.

There is another possibility: putting a self-building cache on the backend, between the raw datastore and the WCS server. It could even be a database with some PGChip scheme.

It may be faster from the db, as you can leverage the spatial index and only pull the tiles you need, but I think Frank can answer this better.

I'm currently most interested in the first case of serving tiles, so if anyone has ideas on best practice for this, that would be great.

Roger Bedell, President
Sylvan Ascent Inc.

From: [EMAIL PROTECTED] on behalf of G. Allegri
Sent: Wed 10/29/2008 12:50 PM
To: OSGeo Discussions
Subject: Re: [OSGeo-Discuss] Raster data on RDBMS

Thanks everyone for this interesting thread.
I think the two approaches have different goals:

- rendering-on-demand performance comparison
- raster serving performance comparison

Both are of interest, but they shouldn't be confused.

It might be helpful to write a wiki page (or something else) to gather the "best practices" on serving (big) rasters. Well, it could be interesting for vectors too, but that's a different story. It's a common task for many of us, and it would help both newbies and more experienced users...

Ok, sorry for this OT digression :)
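Roger's "self-building cache" between the raw datastore and the server can be pictured as a read-through cache: a tile is rendered the first time it is asked for and reused afterwards. The following is only a minimal in-memory sketch of that idea, not how GeoWebCache, TileCache or PGChip work. The class name, the (zoom, column, row) key layout and the fake_render stand-in are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Tuple

# Hypothetical tile key: (zoom level, column, row).
TileKey = Tuple[int, int, int]


class ReadThroughTileCache:
    """Builds itself on demand: render on the first request, reuse afterwards."""

    def __init__(self, render: Callable[[TileKey], bytes], max_tiles: int = 1024):
        self._render = render        # slow path, e.g. a read from the raw datastore
        self._max_tiles = max_tiles
        self._tiles: "OrderedDict[TileKey, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: TileKey) -> bytes:
        if key in self._tiles:
            self.hits += 1
            self._tiles.move_to_end(key)       # mark as most recently used
            return self._tiles[key]
        self.misses += 1
        tile = self._render(key)
        self._tiles[key] = tile
        if len(self._tiles) > self._max_tiles:
            self._tiles.popitem(last=False)    # evict the least recently used tile
        return tile


def fake_render(key: TileKey) -> bytes:
    """Stand-in for the real backend; returns a label instead of pixels."""
    return ("tile-%d-%d-%d" % key).encode("ascii")


if __name__ == "__main__":
    cache = ReadThroughTileCache(fake_render, max_tiles=2)
    cache.get((0, 0, 0))
    cache.get((0, 0, 0))
    print(cache.hits, cache.misses)
```

Whether the cached tiles live in memory, in files or in database rows only changes where the dictionary is kept; the hit and miss logic stays the same.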
Re: [OSGeo-Discuss] Raster data on RDBMS
Frank,

> My point is that a tile caching approach is really comparing tile caching
> performance to rendering-on-demand performance while I think the original
> point was that rendering-from-database and rendering-from-filesystem could
> have similar performance for input raster data.

Agreed.

> Your comparison is also of interest but I don't think it is fair to compare
> rendering from Oracle through MapServer (or GeoServer) to satisfying
> map requests directly from a tile cache.

I shouldn't have mentioned MapServer. Web serving wasn't my point to begin with. I can gdal_translate from the georaster driver to VRT and open the result in OpenEV. That gives me a good idea of how the driver performs in a rendering environment. I would need to change OpenEV to do it directly. But by doing that I would be testing GDAL, not Oracle. To see Oracle GeoRaster in action I can use some free viewer (free as in gratis). That is not my point.

The real point is: should we discard RDBMS for raster storage just because we are sure that there will be overhead and that our direct fopen() and fread() will always be faster? Myth or fact? Those tests have proven otherwise, so the question is: what is going on?

I messed around with some free and open source RDBMS a long time ago (last century, actually), checking out how to create type extensions. But I would not imagine getting into the core of how things work just for the fun of it. So the only thing I can do is check the results from the outside, and Oracle+GDAL/GeoRaster is the environment that lets me do that, because they let you use the software without a license as long as it is not in production mode. I should test MySQL+TerraLib or PostGIS+GDAL/PGCHIP also. Maybe.

Best regards,

Ivan
Re: [OSGeo-Discuss] Raster data on RDBMS
Thanks everyone for this interesting thread.
I think the two approaches have different goals:

- rendering-on-demand performance comparison
- raster serving performance comparison

Both are of interest, but they shouldn't be confused.

It might be helpful to write a wiki page (or something else) to gather the "best practices" on serving (big) rasters. Well, it could be interesting for vectors too, but that's a different story. It's a common task for many of us, and it would help both newbies and more experienced users...

Ok, sorry for this OT digression :)

2008/10/29 Frank Warmerdam <[EMAIL PROTECTED]>:
> Roger,
>
> My point is that a tile caching approach is really comparing tile caching
> performance to rendering-on-demand performance while I think the original
> point was that rendering-from-database and rendering-from-filesystem could
> have similar performance for input raster data.
>
> Your comparison is also of interest but I don't think it is fair to compare
> rendering from Oracle through MapServer (or GeoServer) to satisfying
> map requests directly from a tile cache.
>
> Best regards,
Re: [OSGeo-Discuss] Raster data on RDBMS
Sylvan Ascent Inc. wrote:
> I think, that since the goal of all this storage of pyramids and the like is
> just to get speed, that they aren't apples/oranges, but apples apples, since
> they are both pyramid schemes, just in different places, either in front, or
> in back of the server.

Roger,

My point is that a tile caching approach is really comparing tile caching performance to rendering-on-demand performance while I think the original point was that rendering-from-database and rendering-from-filesystem could have similar performance for input raster data.

Your comparison is also of interest but I don't think it is fair to compare rendering from Oracle through MapServer (or GeoServer) to satisfying map requests directly from a tile cache.

Best regards,
--
---+--
I set the clouds in motion - turn up   | Frank Warmerdam, [EMAIL PROTECTED]
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Programmer for Rent
RE: [OSGeo-Discuss] Raster data on RDBMS
Hi Frank,

I'm not sure I completely agree about your apples/oranges. Here's why:

We are in the process of putting together a big raster server, so I'm evaluating the best way to proceed. I'm quite familiar with putting raster tiles in a database; I did that way back in the last century. I've decided to use GeoServer, since we need (want) WFS-T and WCS, and the latest speed race between MapServer and GeoServer seems to be more or less a tie.

As I see it we have several options:

1) Put the big original raster images in back of GeoServer, accessed using their Mosaic and the GDAL-based ImageIO-ext. Advantage: easy, but kind of slow.

2) Take the originals and build a file-based pyramid. Advantage: faster, but a lot of work, plus duplication, and tricky to keep updated as new data comes in.

3) Take the originals and build a PostGIS-based pyramid. Likely about the same as 2 in speed, work and duplication.

4) Do 1, but put a pyramiding tile server in front. It builds the pyramid of 2 and 3 over time, is likely the fastest if you hit the cache, and is no harder to do than 1.

I think, that since the goal of all this storage of pyramids and the like is just to get speed, that they aren't apples/oranges, but apples apples, since they are both pyramid schemes, just in different places, either in front, or in back of the server.

Roger Bedell, President
Sylvan Ascent Inc.
800-362-8971
+34 626 855 662
[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
www.sylvanascent.com <http://www.sylvanascent.com/>
www.topodepot.com <http://www.topodepot.com/>

From: [EMAIL PROTECTED] on behalf of Frank Warmerdam
Sent: Wed 10/29/2008 10:43 AM
To: OSGeo Discussions
Subject: Re: [OSGeo-Discuss] Raster data on RDBMS

Roger,

While it might be educational to compare to a tilecache solution it is really comparing apples and oranges.

Best regards,
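To put a rough number on the "duplication" that options 2 and 3 imply, here is a small sketch that counts the tiles in a full pyramid. It assumes 256x256 tiles and a halving of resolution at every level; neither is stated in Roger's message. The 14336-pixel image size is borrowed from Ivan's sample images elsewhere in this thread, and the function name is made up for the illustration.

```python
import math
from typing import List, Tuple


def pyramid_levels(width: int, height: int, tile: int = 256) -> List[Tuple[int, int, int]]:
    """Return (width, height, tile_count) per level, halving until one tile is enough."""
    levels = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile)
        rows = math.ceil(h / tile)
        levels.append((w, h, cols * rows))
        if cols == 1 and rows == 1:
            break
        # Next (coarser) level: half the resolution, rounded up.
        w = max(1, math.ceil(w / 2))
        h = max(1, math.ceil(h / 2))
    return levels


if __name__ == "__main__":
    levels = pyramid_levels(14336, 14336)
    base_tiles = levels[0][2]
    total_tiles = sum(count for _, _, count in levels)
    for w, h, count in levels:
        print(f"{w:>6} x {h:<6} {count:>5} tiles")
    print(f"extra tiles over the base level: {total_tiles - base_tiles}")
```

Under those assumptions the reduced levels add roughly a third more tiles on top of the base level, per band, whether they are stored as files, as database rows, or in a cache that fills itself over time.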
RE: [OSGeo-Discuss] Raster data on RDBMS
I find this stuff fascinating, but I believe that the Oracle EULA prohibits users from disclosing the results of benchmark tests. Be careful how you represent these results.

Jason

-Original Message-
From: Lucena, Ivan
Subject: [OSGeo-Discuss] Raster data on RDBMS

I would like to return to a discussion that we had months ago about raster on RDBMS. But this time I would like to present some number.
Re: [OSGeo-Discuss] Raster data on RDBMS
Sylvan Ascent Inc. wrote:
> Mike and Ivan,
>
> I'd like to see them also compared to a caching solution, like GeoWebCache,
> or TileCache. These effectively create a file-based "database" of little
> bitty tiles at certain resolutions, kind of like a tile pyramid that is
> created gradually over time as the image data is accessed.
>
> One would think the file-based cache system would be faster than a similar
> database solution, with the database solution giving no real benefits that I
> can see.

Roger,

While it might be educational to compare to a tilecache solution it is really comparing apples and oranges.

Best regards,
--
---+--
I set the clouds in motion - turn up   | Frank Warmerdam, [EMAIL PROTECTED]
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Programmer for Rent
RE: [OSGeo-Discuss] Raster data on RDBMS
Mike and Ivan,

I'd like to see them also compared to a caching solution, like GeoWebCache or TileCache. These effectively create a file-based "database" of little bitty tiles at certain resolutions, kind of like a tile pyramid that is created gradually over time as the image data is accessed.

One would think the file-based cache system would be faster than a similar database solution, with the database solution giving no real benefits that I can see.

>>> I basically agreed with that but
>>> after seeing how fragile and complicated even a well defined structure of
>>> folders and files could be I would vote in favor of the good and old
>>> relational model.

Since the cache is maintained by the software in a completely defined way, and never messed with by humans, I wonder what could go wrong?

Roger Bedell, Sylvan Ascent Inc.
800-362-8971
+34 626 855 662
[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>

From: [EMAIL PROTECTED] on behalf of Smith, Michael ERDC-CRREL-NH
Sent: Wed 10/29/2008 10:13 AM
To: Lucena, Ivan; OSGeo Discussions; Paul Ramsey
Subject: Re: [OSGeo-Discuss] Raster data on RDBMS

Ivan,

Those numbers look impressive. We are just starting to set up some new hardware here and I plan to do some testing also. Perhaps we can collaborate and come up with a test suite in order to track these numbers across builds.

Mike
Re: [OSGeo-Discuss] Raster data on RDBMS
Ivan,

Those numbers look impressive. We are just starting to set up some new hardware here and I plan to do some testing also. Perhaps we can collaborate and come up with a test suite in order to track these numbers across builds.

Mike

--
Michael Smith
RSGIS Center
ERDC - CRREL
US Army Corps of Engineers

On 10/29/08 1:35 AM, "Lucena, Ivan" <[EMAIL PROTECTED]> wrote:

> Paul,
>
> Good thought.
>
> Let's see. The default blocking used by the GeoRaster driver is (256, 256, 1).
> That is good because GeoTIFFs don't tile in band space. So I would imagine
> that if I tiled the GeoTIFF this way:
>
> [EMAIL PROTECTED]:~/Data> time gdal_translate Barcelona_2007_R2C2.TIF Barcelona_2007_R2C2_tiled.TIF -co BLOCKXSIZE=256 -co BLOCKYSIZE=256
> Input file size is 14336, 14336
> 0...10...20...30...40...50...60...70...80...90...100 - done.
> real 0m42.991s
> user 0m20.289s
> sys 0m2.516s
>
> The comparison would be fair:
>
> [EMAIL PROTECTED]:~/Data> time gdal_translate Barcelona_2007_R2C2_tiled.TIF out2.tif -srcwin 0 0 2000 2000
> Input file size is 14336, 14336
> 0...10...20...30...40...50...60...70...80...90...100 - done.
> real 0m1.604s
> user 0m1.156s
> sys 0m0.444s
>
> What do you think?
>
> I would imagine that if I run gdaladdo to add pyramids to the GeoRaster, an
> application could take advantage of them by telling Oracle to cache the BLOB in
> memory. So the next time a user zooms in, the performance would be even better.
>
> I am trying to set up a MapServer experiment on that issue, but for now I would
> like to keep my analysis to that very simple process of extracting a subset.
>
> Best regards,
>
> Ivan
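Mike's idea of a test suite that tracks these numbers across builds could start from something as small as the helper below, which runs a command several times and reports the wall-clock spread. This is only a sketch: the function name and the choice of reporting the minimum and median are my assumptions, and the gdal_translate arguments in the comment are simply the ones Ivan used in this thread.

```python
import statistics
import subprocess
import sys
import time
from typing import Dict, List


def time_command(argv: List[str], repeats: int = 3) -> Dict[str, float]:
    """Run argv `repeats` times and summarise the wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True makes a failing command raise instead of being timed silently.
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return {
        "runs": repeats,
        "min": min(samples),
        "median": statistics.median(samples),
    }


if __name__ == "__main__":
    # Harmless stand-in command. On a machine with GDAL installed, the list could be
    # ["gdal_translate", "Barcelona_2007_R2C2.TIF", "out2.tif",
    #  "-srcwin", "0", "0", "2000", "2000"]
    print(time_command([sys.executable, "-c", "pass"]))
```

Repeating each run matters because the first read of a file or table is often slower than later ones once the operating system or the database has cached the data, and a single `time` figure cannot tell the two apart.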
Re: [OSGeo-Discuss] Raster data on RDBMS
Paul,

Good thought.

Let's see. The default blocking used by the GeoRaster driver is (256, 256, 1). That is good because GeoTIFFs don't tile in band space. So I would imagine that if I tiled the GeoTIFF this way:

[EMAIL PROTECTED]:~/Data> time gdal_translate Barcelona_2007_R2C2.TIF Barcelona_2007_R2C2_tiled.TIF -co BLOCKXSIZE=256 -co BLOCKYSIZE=256
Input file size is 14336, 14336
0...10...20...30...40...50...60...70...80...90...100 - done.
real 0m42.991s
user 0m20.289s
sys 0m2.516s

The comparison would be fair:

[EMAIL PROTECTED]:~/Data> time gdal_translate Barcelona_2007_R2C2_tiled.TIF out2.tif -srcwin 0 0 2000 2000
Input file size is 14336, 14336
0...10...20...30...40...50...60...70...80...90...100 - done.
real 0m1.604s
user 0m1.156s
sys 0m0.444s

What do you think?

I would imagine that if I run gdaladdo to add pyramids to the GeoRaster, an application could take advantage of them by telling Oracle to cache the BLOB in memory. So the next time a user zooms in, the performance would be even better.

I am trying to set up a MapServer experiment on that issue, but for now I would like to keep my analysis to that very simple process of extracting a subset.

Best regards,

Ivan

> ---Original Message---
> From: Paul Ramsey <[EMAIL PROTECTED]>
> Subject: Re: [OSGeo-Discuss] Raster data on RDBMS
> Sent: Oct 29 '08 05:00
>
> The data is chunked in Oracle into tiles, so unless you tile the TIFF
> as well you aren't really doing a direct comparison. Even if you end
> up with the same numbers for both processes, I'll still be impressed,
> since I assumed Oracle would have a higher overhead.
>
> P.
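For anyone who wants to line up the three subset-extraction timings reported in this thread, the sketch below converts the "real" stamps printed by `time` into seconds and expresses each one relative to a baseline. The labels and helper names are mine. One caveat, stated as my understanding rather than something shown in the thread: the GTiff driver writes tiles only when TILED=YES is also passed, so the third label describes the options that were given, not a verified tile layout.

```python
import re
from typing import Dict


def real_seconds(stamp: str) -> float:
    """Convert a `time` stamp such as '1m54.973s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# 'real' times copied from the transcripts in this thread (single runs each).
REPORTED = {
    "GeoRaster via GDAL": "0m0.720s",
    "GeoTIFF as downloaded": "0m1.177s",
    "GeoTIFF after BLOCKXSIZE/BLOCKYSIZE=256": "0m1.604s",
}


def relative_to(baseline: str, reported: Dict[str, str] = REPORTED) -> Dict[str, float]:
    """Express every reported time as a multiple of the baseline's time."""
    base = real_seconds(reported[baseline])
    return {label: real_seconds(stamp) / base for label, stamp in reported.items()}


if __name__ == "__main__":
    for label, ratio in relative_to("GeoRaster via GDAL").items():
        print(f"{label}: {ratio:.2f}x")
```

These are single measurements on one machine, so the ratios are best read as a starting point for repeated runs rather than as a ranking.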
Re: [OSGeo-Discuss] Raster data on RDBMS
The data is chunked in Oracle into tiles, so unless you tile the TIFF as well you aren't really doing a direct comparison. Even if you end up with the same numbers for both processes, I'll still be impressed, since I assumed Oracle would have a higher overhead.

P.

On Tue, Oct 28, 2008 at 9:54 PM, Lucena, Ivan <[EMAIL PROTECTED]> wrote:
> Hi There,
>
> I would like to return to a discussion that we had months ago about raster on
> RDBMS. But this time I would like to present some numbers.
>
> As far as I can recall, there were basically two major arguments against
> storing raster on RDBMS. One very pragmatic: "Why waste precious processing
> time with the overhead of dealing with queries, tables, client-server back and
> forth just to get the data from BLOB fields on a database when you can get it
> directly from the file system?". The other argument was semantic: "Why
> store raster on RDBMS if in general we are not expecting to have
> transactions on that data?"
>
> I cannot argue against the second one. I basically agreed with that, but after
> seeing how fragile and complicated even a well-defined structure of folders
> and files can be, I would vote in favor of the good old relational model.
>
> Here is my experiment. I downloaded two free data samples from the Navteq
> website: two GeoTIFF files with the same size and number of bands (14336,
> 14336, 3):
>
> [EMAIL PROTECTED]:~/Data> du -k Barcelona_2007_R2C2.TIF
> 602828 Barcelona_2007_R2C2.TIF
> [EMAIL PROTECTED]:~/Data> du -k San_Francisco_2006_R1C2.TIF
> 602828 San_Francisco_2006_R1C2.TIF
>
> Then I loaded those images into Oracle Spatial GeoRaster using GDAL. The
> loading time is comparable to that of some commercial ETL products on the
> market. It took about 2 minutes to load each image.
>
> [EMAIL PROTECTED]:~/Data> time gdal_translate -of georaster Barcelona_2007_R2C2.TIF georaster:scott,tiger,orcl,RDT_2$,2
> Input file size is 14336, 14336
> 0...10...20...30...40...50...60...70...80...90...100 - done.
> Ouput dataset: (georaster:scott,tiger,orcl,RDT_2$,2) on GDAL_IMPORT,RASTER
> real 1m54.973s
> user 0m4.368s
> sys 0m1.936s
>
> If you are an Oracle GeoRaster user you might be excited about those numbers
> already, but those are not the numbers I want to show. What I would like to do
> is compare the time that it takes to extract a subset from the original
> GeoTIFF with the time to extract the same subset from the RDBMS.
> Here are the numbers:
>
> [EMAIL PROTECTED]:~/Data> time gdal_translate georaster:scott,tiger,orcl,RDT_2$,2 out.tif -srcwin 0 0 2000 2000
> Input file size is 14336, 14336
> 0...10...20...30...40...50...60...70...80...90...100 - done.
> real 0m0.720s
> user 0m0.408s
> sys 0m0.108s
>
> [EMAIL PROTECTED]:~/Data> time gdal_translate Barcelona_2007_R2C2.TIF out2.tif -srcwin 0 0 2000 2000
> Input file size is 14336, 14336
> 0...10...20...30...40...50...60...70...80...90...100 - done.
> real 0m1.177s
> user 0m0.976s
> sys 0m0.188s
>
> And I also checked the integrity of the results to see if I get the same
> result:
>
> [EMAIL PROTECTED]:~/Data> gdalinfo -checksum out.tif
> ...
> Band 1 Block=2000x1 Type=Byte, ColorInterp=Red
>   Checksum=58248
> Band 2 Block=2000x1 Type=Byte, ColorInterp=Green
>   Checksum=21226
> Band 3 Block=2000x1 Type=Byte, ColorInterp=Blue
>   Checksum=8002
>
> [EMAIL PROTECTED]:~/Data> gdalinfo -checksum out2.tif
> ...
> Band 1 Block=2000x1 Type=Byte, ColorInterp=Red
>   Checksum=58248
> Band 2 Block=2000x1 Type=Byte, ColorInterp=Green
>   Checksum=21226
> Band 3 Block=2000x1 Type=Byte, ColorInterp=Blue
>   Checksum=8002
>
> What other tests would be interesting to perform?
>
> Best regards,
>
> Ivan
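Paul's point about chunking can be made concrete by counting how many storage blocks a read window touches under different layouts. The sketch below does only that arithmetic. The 256x256 case matches the default GeoRaster blocking mentioned elsewhere in the thread; the one-row strip case is purely an assumption for illustration, since the thread does not show how the downloaded GeoTIFF is laid out internally. Function names are hypothetical.

```python
def blocks_touched(xoff: int, yoff: int, xsize: int, ysize: int,
                   block_w: int, block_h: int) -> int:
    """Number of blocks (per band) intersected by a read window."""
    cols = (xoff + xsize - 1) // block_w - xoff // block_w + 1
    rows = (yoff + ysize - 1) // block_h - yoff // block_h + 1
    return cols * rows


def pixels_fetched(xoff: int, yoff: int, xsize: int, ysize: int,
                   block_w: int, block_h: int) -> int:
    """Pixels (per band) that must be read if whole blocks are fetched."""
    return blocks_touched(xoff, yoff, xsize, ysize, block_w, block_h) * block_w * block_h


if __name__ == "__main__":
    window = (0, 0, 2000, 2000)           # the -srcwin used in the thread
    layouts = {
        "256x256 tiles": (256, 256),
        "one-row strips (assumed)": (14336, 1),
    }
    for name, (bw, bh) in layouts.items():
        print(name,
              blocks_touched(*window, bw, bh), "blocks,",
              pixels_fetched(*window, bw, bh), "pixels per band")
```

The count says nothing about elapsed time by itself, because sequential strip reads and scattered tile reads cost differently on disk and in a database, which is part of why measured numbers are more useful here than intuition.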
[OSGeo-Discuss] Raster data on RDBMS
Hi There,

I would like to return to a discussion that we had months ago about raster on RDBMS. But this time I would like to present some numbers.

As far as I can recall, there were basically two major arguments against storing raster on RDBMS. One very pragmatic: "Why waste precious processing time with the overhead of dealing with queries, tables, client-server back and forth just to get the data from BLOB fields on a database when you can get it directly from the file system?". The other argument was semantic: "Why store raster on RDBMS if in general we are not expecting to have transactions on that data?"

I cannot argue against the second one. I basically agreed with that, but after seeing how fragile and complicated even a well-defined structure of folders and files can be, I would vote in favor of the good old relational model.

Here is my experiment. I downloaded two free data samples from the Navteq website: two GeoTIFF files with the same size and number of bands (14336, 14336, 3):

[EMAIL PROTECTED]:~/Data> du -k Barcelona_2007_R2C2.TIF
602828 Barcelona_2007_R2C2.TIF
[EMAIL PROTECTED]:~/Data> du -k San_Francisco_2006_R1C2.TIF
602828 San_Francisco_2006_R1C2.TIF

Then I loaded those images into Oracle Spatial GeoRaster using GDAL. The loading time is comparable to that of some commercial ETL products on the market. It took about 2 minutes to load each image.

[EMAIL PROTECTED]:~/Data> time gdal_translate -of georaster Barcelona_2007_R2C2.TIF georaster:scott,tiger,orcl,RDT_2$,2
Input file size is 14336, 14336
0...10...20...30...40...50...60...70...80...90...100 - done.
Ouput dataset: (georaster:scott,tiger,orcl,RDT_2$,2) on GDAL_IMPORT,RASTER
real 1m54.973s
user 0m4.368s
sys 0m1.936s

If you are an Oracle GeoRaster user you might be excited about those numbers already, but those are not the numbers I want to show. What I would like to do is compare the time that it takes to extract a subset from the original GeoTIFF with the time to extract the same subset from the RDBMS. Here are the numbers:

[EMAIL PROTECTED]:~/Data> time gdal_translate georaster:scott,tiger,orcl,RDT_2$,2 out.tif -srcwin 0 0 2000 2000
Input file size is 14336, 14336
0...10...20...30...40...50...60...70...80...90...100 - done.
real 0m0.720s
user 0m0.408s
sys 0m0.108s

[EMAIL PROTECTED]:~/Data> time gdal_translate Barcelona_2007_R2C2.TIF out2.tif -srcwin 0 0 2000 2000
Input file size is 14336, 14336
0...10...20...30...40...50...60...70...80...90...100 - done.
real 0m1.177s
user 0m0.976s
sys 0m0.188s

And I also checked the integrity of the results to see if I get the same result:

[EMAIL PROTECTED]:~/Data> gdalinfo -checksum out.tif
...
Band 1 Block=2000x1 Type=Byte, ColorInterp=Red
  Checksum=58248
Band 2 Block=2000x1 Type=Byte, ColorInterp=Green
  Checksum=21226
Band 3 Block=2000x1 Type=Byte, ColorInterp=Blue
  Checksum=8002

[EMAIL PROTECTED]:~/Data> gdalinfo -checksum out2.tif
...
Band 1 Block=2000x1 Type=Byte, ColorInterp=Red
  Checksum=58248
Band 2 Block=2000x1 Type=Byte, ColorInterp=Green
  Checksum=21226
Band 3 Block=2000x1 Type=Byte, ColorInterp=Blue
  Checksum=8002

What other tests would be interesting to perform?

Best regards,

Ivan
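The integrity check at the end of Ivan's message, comparing the per-band checksums of the two outputs by eye, is easy to automate. The sketch below parses the text printed by `gdalinfo -checksum` and compares the bands. The sample text is copied from the message; the function name and the regular expression are my own and assume the "Band N ... Checksum=V" layout shown there.

```python
import re
from typing import Dict


def band_checksums(gdalinfo_text: str) -> Dict[int, int]:
    """Map band number to checksum from `gdalinfo -checksum` output."""
    pairs = re.findall(r"Band (\d+)\b.*?Checksum=(\d+)", gdalinfo_text, flags=re.DOTALL)
    return {int(band): int(checksum) for band, checksum in pairs}


# Output for out.tif and out2.tif as quoted in the message (identical in both cases).
FROM_GEORASTER = """\
Band 1 Block=2000x1 Type=Byte, ColorInterp=Red
  Checksum=58248
Band 2 Block=2000x1 Type=Byte, ColorInterp=Green
  Checksum=21226
Band 3 Block=2000x1 Type=Byte, ColorInterp=Blue
  Checksum=8002
"""
FROM_GEOTIFF = FROM_GEORASTER

if __name__ == "__main__":
    a = band_checksums(FROM_GEORASTER)
    b = band_checksums(FROM_GEOTIFF)
    print("identical" if a == b else "different", a)
```

Matching checksums on every band are good evidence that both paths returned the same pixels for the 2000x2000 window, which is what makes the timing comparison meaningful.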