Re: [gdal-dev] Clarification on what is not addressed by GDALFlushCache
Thanks, Even. I've flushed out rasterio's usage of FlushCache.

On Thu, Oct 3, 2019 at 11:04 AM Even Rouault wrote:

> On Thursday 3 October 2019 10:14:45 CEST Sean Gillies wrote:
> > In the comments above FlushCache() in gcore/gdaldataset.cpp it is said:
> >
> >  * Using this method does not prevent use from calling GDALClose()
> >  * to properly close a dataset and ensure that important data not addressed
> >  * by FlushCache() is written in the file.
> >
> > Does it vary by format and driver?
>
> Of course, it wouldn't be fun otherwise. For some formats, it might result
> in a completely consistent dataset, and in others, in something that can't
> be opened at all. So what it does, beyond evicting 'dirty' blocks from the
> cache, is mostly an implementation detail.
>
> > What exactly is the important data that is not addressed?
>
> In the case of GeoTIFF, FlushCache() will for example ensure that all
> tile/strip data is flushed to disk, but the TileByteCounts/TileOffsets
> index arrays are not updated, and so in a file that was just created, they
> will be at their zero default value, making the dataset appear to be empty
> to a reader that would try to open it at that point.
>
> If generating a large dataset, you can for example call FlushCache() at
> regular intervals to make sure that there is sufficient space on the
> storage device (but the global block cache will also flush when it is
> saturated). This might be a way of keeping memory usage from reaching the
> GDAL_CACHEMAX threshold. But this can also result in suboptimal behaviour
> if you call it at an inappropriate point. For example, if you write to a
> JPEG-compressed tiled TIFF and your write pattern is row by row, then
> flushing before you reach a row number that is a multiple of the tile
> height will flush partially written blocks (their top will contain real
> data, and the bottom zeroes). Those blocks will later be decompressed and
> recompressed, causing unnecessary quality loss.
>
> FlushCache() is automatically called by the dataset destructor, so my tip
> would be: "do not use FlushCache() unless you know you need it"
>
> Even
>
> --
> Spatialys - Geospatial professional services
> http://www.spatialys.com

--
Sean Gillies
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
Re: [gdal-dev] Clarification on what is not addressed by GDALFlushCache
On Thursday 3 October 2019 10:14:45 CEST Sean Gillies wrote:
> In the comments above FlushCache() in gcore/gdaldataset.cpp it is said:
>
>  * Using this method does not prevent use from calling GDALClose()
>  * to properly close a dataset and ensure that important data not addressed
>  * by FlushCache() is written in the file.
>
> Does it vary by format and driver?

Of course, it wouldn't be fun otherwise. For some formats, it might result
in a completely consistent dataset, and in others, in something that can't
be opened at all. So what it does, beyond evicting 'dirty' blocks from the
cache, is mostly an implementation detail.

> What exactly is the important data that is not addressed?

In the case of GeoTIFF, FlushCache() will for example ensure that all
tile/strip data is flushed to disk, but the TileByteCounts/TileOffsets index
arrays are not updated, and so in a file that was just created, they will be
at their zero default value, making the dataset appear to be empty to a
reader that would try to open it at that point.

If generating a large dataset, you can for example call FlushCache() at
regular intervals to make sure that there is sufficient space on the storage
device (but the global block cache will also flush when it is saturated).
This might be a way of keeping memory usage from reaching the GDAL_CACHEMAX
threshold. But this can also result in suboptimal behaviour if you call it
at an inappropriate point. For example, if you write to a JPEG-compressed
tiled TIFF and your write pattern is row by row, then flushing before you
reach a row number that is a multiple of the tile height will flush
partially written blocks (their top will contain real data, and the bottom
zeroes). Those blocks will later be decompressed and recompressed, causing
unnecessary quality loss.

FlushCache() is automatically called by the dataset destructor, so my tip
would be: "do not use FlushCache() unless you know you need it"

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com
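
The point about row-by-row writes and tile height comes down to a bit of arithmetic, which can be sketched as follows. This is only an illustration under stated assumptions: `flush_rows` and `splits_tiles` are hypothetical helpers, not part of GDAL or rasterio, and the sketch assumes that flushing only after a whole number of tile rows has been written avoids the partially written blocks described above.

```python
def flush_rows(height, tile_height, min_interval):
    """Return row counts after which a flush touches only complete tile rows.

    Hypothetical helper for illustration; not a GDAL or rasterio API.
    height:       total number of raster rows
    tile_height:  block height of the tiled TIFF
    min_interval: minimum number of rows to write between two flushes
    """
    if height <= 0 or tile_height <= 0 or min_interval <= 0:
        raise ValueError("all arguments must be positive")
    # Round the requested interval up to a whole number of tile rows.
    tiles_per_flush = -(-min_interval // tile_height)
    step = tiles_per_flush * tile_height
    # Flush points strictly inside the raster; closing the dataset
    # takes care of whatever remains after the last one.
    return list(range(step, height, step))


def splits_tiles(rows_written, tile_height, height):
    """True if flushing after rows_written rows would emit partial tiles."""
    return rows_written % tile_height != 0 and rows_written != height
```

In a row-by-row write loop, one would then flush only when the number of rows written so far appears in the list returned by `flush_rows`, and otherwise leave flushing to the block cache and to closing the dataset.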
[gdal-dev] Clarification on what is not addressed by GDALFlushCache
In the comments above FlushCache() in gcore/gdaldataset.cpp it is said:

 * Using this method does not prevent use from calling GDALClose()
 * to properly close a dataset and ensure that important data not addressed
 * by FlushCache() is written in the file.

What exactly is the important data that is not addressed? Does it vary by
format and driver?

Thanks!

--
Sean Gillies