Re: [gdal-dev] Fwd: Performance Variability with GDAL Caching and Multi-Threading for MODIS Data
Hi,

ReprojectImage is an older API that doesn't support the full set of warping
options. If you're asking about -multi and -wo NUM_THREADS, they don't get
enabled automatically by GDAL_NUM_THREADS (at least in gdalwarp, IIRC).
Switching to gdal.Warp might be worthwhile.

Note that pymodis is quite old and could probably do with some updates.

Laurentiu

On Wed, Apr 2, 2025, at 08:34, Varisht Ghedia via gdal-dev wrote:
> Thanks for your insights regarding NUM_THREADS and CACHEMAX. Is there a
> dedicated option to enable multi-threading i.e. -m using python or does
> ALL_CPUS enable multi-threading automatically. Is there a difference between
> -m and ALL_CPUS?
>
> Thanks and Regards,
> Varisht Ghedia
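A minimal sketch of the gdal.Warp route mentioned in this reply, not pymodis's actual code. It assumes the GDAL Python bindings (osgeo) are installed; the function names are hypothetical, and the target SRS and 30 m resolution are simply taken from the modis_convert.py command line quoted in this thread. multithread=True is the Python counterpart of gdalwarp's -multi, and the NUM_THREADS warp option corresponds to -wo NUM_THREADS.

```python
def build_warp_kwargs(dst_srs="EPSG:32618", res=30.0, threads="ALL_CPUS"):
    """Keyword arguments for gdal.Warp with both threading knobs set explicitly."""
    return {
        "dstSRS": dst_srs,
        "xRes": res,
        "yRes": res,
        # Pipeline I/O and warping, like `gdalwarp -multi`.
        "multithread": True,
        # Parallelize the warping computation, like `gdalwarp -wo NUM_THREADS=...`.
        "warpOptions": [f"NUM_THREADS={threads}"],
    }


def warp_to_file(src, dst, **overrides):
    """Warp `src` (a path or subdataset name) into `dst` (hypothetical helper)."""
    # Imported lazily so build_warp_kwargs() can be used without GDAL installed.
    from osgeo import gdal

    gdal.UseExceptions()
    kwargs = build_warp_kwargs()
    kwargs.update(overrides)
    ds = gdal.Warp(dst, src, **kwargs)
    ds = None  # drop the reference so the output is flushed and closed


if __name__ == "__main__":
    print(build_warp_kwargs())
```

The source passed to warp_to_file would be the HDF subdataset that pymodis currently hands to ReprojectImage; its exact name is not given in this thread, so it is left as a parameter.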
Re: [gdal-dev] Fwd: Performance Variability with GDAL Caching and Multi-Threading for MODIS Data
Hi Laurentiu,

I am using the pymodis library
(https://github.com/lucadelu/pyModis/tree/master) to extract the LST and QC
bands from a MODIS (Aqua / Terra) MOD11A1 product. Upon checking the code, it
looks like the library internally makes the following GDAL calls for the tasks
I execute:

    gdal.AutoCreateWarpedVRT
    gdal.ReprojectImage

I execute the script like this:

    modis_convert.py -s "( 1 0 0 0 0 0 0 0 0 0 0 0 )" -g 30 -o 2025-03-14 -e 32618 MOD11A1.A2025073.h10v10.061.2025074095514.hdf

Here:

    -s : Select the bands to extract (LST in this case)
    -g : Spatial resolution of the output file (30m)
    -o : Prefix of the output file
    -e : EPSG code for the output (EPSG:32618)
    MOD11A1.A2025073.h10v10.061.2025074095514.hdf : MODIS Terra product

To test the effects of cache and multi-threading, I set the config options at
the start of the program like this:

    gdal.SetConfigOption("GDAL_NUM_THREADS", "ALL_CPUS")
    gdal.SetConfigOption("GDAL_CACHEMAX", "2G")

RAM usage is not much of a concern: for now I process a single product at a
time, so I can allocate a higher amount if that speeds things up.

Thanks for your insights regarding NUM_THREADS and CACHEMAX. Is there a
dedicated option to enable multi-threading (i.e. -m) from Python, or does
ALL_CPUS enable multi-threading automatically? Is there a difference between
-m and ALL_CPUS?

Thanks and Regards,
Varisht Ghedia
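Since GDAL_CACHEMAX is, per the documentation quoted in this thread, "consulted the first time the cache size is requested", one way to make the cache size independent of when the option is set is to call gdal.SetCacheMax() with an explicit byte count. The following is only a sketch under that assumption: parse_cache_size and configure_gdal are hypothetical helpers, the parser deliberately accepts only sizes with a K, M or G suffix, and whether this changes the timings reported here is untested.

```python
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_cache_size(text):
    """Convert a size such as '512M' or '2G' (optionally '2GB') to bytes."""
    value = text.strip().upper()
    if value.endswith("B"):
        value = value[:-1]
    if not value or value[-1] not in _UNITS:
        raise ValueError(f"expected a size with a K, M or G suffix, got {text!r}")
    return int(float(value[:-1]) * _UNITS[value[-1]])


def configure_gdal(cache="2G", threads="ALL_CPUS"):
    """Apply the cache size and thread setting, then report the effective cache."""
    # Imported lazily so parse_cache_size() can be used without GDAL installed.
    from osgeo import gdal

    gdal.SetCacheMax(parse_cache_size(cache))  # explicit size in bytes
    gdal.SetConfigOption("GDAL_NUM_THREADS", threads)
    return gdal.GetCacheMax()


if __name__ == "__main__":
    for size in ("512M", "2G"):
        print(size, "=", parse_cache_size(size), "bytes")
```

Calling configure_gdal() once at program start and printing its return value shows the cache size GDAL is actually using for a given run.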
Re: [gdal-dev] Fwd: Performance Variability with GDAL Caching and Multi-Threading for MODIS Data
Hi,

Since it's not exactly clear from your description, what operations are you
running, just the equivalent of gdal.Translate()? gdal.Warp()? GDAL can use
threading in a couple of places:

• to compress the output before writing it, e.g. the NUM_THREADS creation
  option of GTiff
• to decompress the input when reading a region larger than one block or
  strip, e.g. the NUM_THREADS open option of GTiff
• for pipelining the I/O and warping in gdalwarp (-multi)
• to parallelize warping itself in gdalwarp (-wo NUM_THREADS)

And of course, there might be others I'm not aware of.

I'm not sure about the effects you see when setting the cache, but note that
the default cache GDAL_CACHEMAX is "5% of the usable physical RAM, [...]
consulted the first time the cache size is requested". To disable the cache
you can use GDAL_CACHEMAX=0, which can reduce the memory usage and speed up
the program in very specific cases (e.g. when processing one block at a time
without reading parts of the input twice), but becomes a lot less useful when
you do any kind of warping or resampling.

Laurentiu

On Tue, Apr 1, 2025, at 10:19, Varisht Ghedia via gdal-dev wrote:
> Dear GDAL Developers,
>
> I am working on optimizing the processing times for MODIS datasets (LST_1Km
> and QC Day tile) using `pymodis` with some modifications. Specifically, I
> have added flags for:
>
> • Running on all available CPU cores (`ALL_CORES`)
> • Adjusting GDAL cache size (`GDAL_CACHEMAX`)
>
> However, I am observing unexpected performance variations. In some cases,
> increasing the cache size degrades performance instead of improving it. Below
> are my test results for two different datasets from the same tile.
>
> Tile used: MOD11A1.A2025073.h10v10.061.2025074095514.hdf
>
> EPSG:32618, Resampled to 30m
>
> *QC_tile.tif*
>
> ALL_CORES + 2G
> real    0m24.199s
> user    0m53.352s
> sys     0m9.998s
>
> STANDARD RUN (No Cache, No Multi-Threading)
> real    0m32.133s
> user    0m30.581s
> sys     0m2.299s
>
> ALL_CORES + 512M
> real    0m13.830s
> user    0m51.083s
> sys     0m1.911s
>
> With 512M cache, performance improves significantly, but with larger caches
> (1G, 2G, 4G), execution time increases.
>
> *LST_Day_1km.tif*
>
> ALL_CORES + 512M
> real    0m42.863s
> user    0m44.105s
> sys     0m3.583s
>
> STANDARD RUN (No Cache, No Multi-Threading)
> real    0m45.121s
> user    0m26.477s
> sys     0m3.712s
>
> ALL_CORES + 2G
> real    0m37.548s
> user    0m48.302s
> sys     0m8.113s
>
> ALL_CORES + 4G
> real    0m51.845s
> user    0m48.213s
> sys     0m7.988s
>
> For this dataset, using a 2G cache improves performance, but increasing it to
> 4G makes processing slower.
>
> *Questions:*
>
> 1. How does GDAL’s caching mechanism impact performance in these scenarios?
>
> 2. Why does increasing cache size sometimes degrade performance?
>
> 3. Is there a recommended way to tune cache settings for MODIS HDF
>    processing, considering that some layers (like QC) behave differently from
>    others (like LST_1Km)?
>
> Any insights into how GDAL handles multi-threading and caching internally
> would be greatly appreciated.
>
> Thanks in advance for your help!
>
> Best regards,
>
> Varisht Ghedia

___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
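One way to read the `time` figures in the original question is to compare CPU time (user + sys) with wall-clock time (real): a ratio near 1 suggests essentially single-threaded work, while a larger ratio means several cores were busy at once. The sketch below only does that arithmetic on the numbers reported in this thread; the helper name and the run labels are illustrative.

```python
def cpu_to_wall_ratio(real_s, user_s, sys_s):
    """Return (user + sys) / real for one `time` measurement, all in seconds."""
    return (user_s + sys_s) / real_s


# (real, user, sys) in seconds, copied from the timings in the original question.
RUNS = {
    "QC  ALL_CORES + 2G": (24.199, 53.352, 9.998),
    "QC  standard": (32.133, 30.581, 2.299),
    "QC  ALL_CORES + 512M": (13.830, 51.083, 1.911),
    "LST ALL_CORES + 512M": (42.863, 44.105, 3.583),
    "LST standard": (45.121, 26.477, 3.712),
    "LST ALL_CORES + 2G": (37.548, 48.302, 8.113),
    "LST ALL_CORES + 4G": (51.845, 48.213, 7.988),
}

if __name__ == "__main__":
    for label, (real_s, user_s, sys_s) in RUNS.items():
        ratio = cpu_to_wall_ratio(real_s, user_s, sys_s)
        print(f"{label:22s} real={real_s:7.3f}s  sys={sys_s:6.3f}s  cpu/wall={ratio:.2f}")
```

For the QC tile this gives roughly 1.0 for the standard run and roughly 3.8 for ALL_CORES + 512M, so the threaded runs do use more than one core. The same table also makes it easy to see that sys time is several times higher in the 2G and 4G runs than in the 512M ones, which may be a useful starting point for profiling.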