Re: [gdal-dev] Notice: issue about multi-threaded GTiff compression+decompression
> In short: multithreading is hard

So true! With the introduction of tsan things are a little less bad, but tsan is still a tool with plenty of false positives and false negatives. And that assumes that a particular issue is covered by the tests being run under tsan.

___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
Re: [gdal-dev] Notice: issue about multi-threaded GTiff compression+decompression
Thanks for the announcement, Even!

I wonder if we should track such issues in a list? Or maybe give them a unique GitHub label?

We don't plan to release a 3.6.5, correct? I'm going to make a Rasterio post release that patches 3.6.4 by tomorrow: https://github.com/rasterio/rasterio/issues/2943.

On Mon, Oct 16, 2023 at 9:26 AM Even Rouault via gdal-dev <gdal-dev@lists.osgeo.org> wrote:

> 3.7 branch (should apply to 3.6 as well):
> b5858ed5bc5004c97f7cd6000674015bdc70b33b

--
Sean Gillies
Re: [gdal-dev] Notice: issue about multi-threaded GTiff compression+decompression
On 16/10/2023 at 17:14, Kurt Schwehr wrote:

> Thanks for the heads up!
>
> Can you share the SHAs of the fix and cause?

master: d083af1ec9c38e79bfb0532885570de6e5b8a410

3.7 branch (should apply to 3.6 as well): b5858ed5bc5004c97f7cd6000674015bdc70b33b

cause: a worker thread of the multithreaded decoding could, when the block cache is full, cause a "dirty" (i.e. modified but not yet serialized to disk) block to be written. This resulted either in a deadlock between the lock of the multithreaded decoder and the lock of the job queue mechanism, or in other decoding threads seeing their file handle being moved by another thread's seek(). In short: multithreading is hard.

Even

--
http://www.spatialys.com
My software is free, but my time generally not.
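To make the second failure mode above concrete, here is a minimal Python sketch of why a shared file offset is fragile while a positional read is not. It is an illustration only, not GDAL code: the block size, the file contents, and the helper names (`read_block_seek`, `read_block_pread`, `demo`) are hypothetical, and the "other thread" is simulated by a callback so the outcome is deterministic. Python's `os.pread` is only available on Unix-like systems.

```python
import os
import tempfile

BLOCK = 4  # hypothetical block size in bytes


def read_block_seek(f, index, interloper=None):
    """Read block `index` with seek() + read() on a handle whose offset is shared."""
    f.seek(index * BLOCK)
    if interloper is not None:
        interloper()  # stands in for another thread moving the shared offset
    return f.read(BLOCK)


def read_block_pread(fd, index, interloper=None):
    """Read block `index` with a positional read that ignores the shared offset."""
    if interloper is not None:
        interloper()
    return os.pread(fd, BLOCK, index * BLOCK)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "blocks.bin")
        with open(path, "wb") as out:
            out.write(b"AAAABBBBCCCC")  # three 4-byte "blocks"
        # buffering=0 so seek()/read() act directly on the OS-level file offset
        with open(path, "rb", buffering=0) as f:

            def other_thread():
                f.seek(0)  # simulated concurrent seek on the same handle

            clobbered = read_block_seek(f, 2, other_thread)
            safe = read_block_pread(f.fileno(), 2, other_thread)
    return clobbered, safe
```

Calling `demo()` asks for the third block twice: the seek-based read is redirected by the simulated concurrent seek and returns the first block instead, whereas the positional read still returns the third block.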
Re: [gdal-dev] Notice: issue about multi-threaded GTiff compression+decompression
Thanks for the heads up!

Can you share the SHAs of the fix and cause?

On Mon, Oct 16, 2023, 7:44 AM Even Rouault via gdal-dev <gdal-dev@lists.osgeo.org> wrote:

> The fix is in https://github.com/OSGeo/gdal/pull/8561
Re: [gdal-dev] Notice: issue about multi-threaded GTiff compression+decompression
Thanks, Even, for the notification.

I see the fix will be included in the next releases, 3.7.3 and 3.8.0, both planned for November 1st (in just a couple of weeks): https://github.com/OSGeo/gdal/milestones

On Mon, 16 Oct 2023 at 16:42, Even Rouault via gdal-dev <gdal-dev@lists.osgeo.org> wrote:

> The fix is in https://github.com/OSGeo/gdal/pull/8561
[gdal-dev] Notice: issue about multi-threaded GTiff compression+decompression
Hi,

For GDAL 3.6.0 to 3.7.2, use of multi-threaded GTiff compression+decompression, in particular within the same file, as for example in a "gdalwarp -co COMPRESS=... -co NUM_THREADS=..." workflow can lead to deadlocks (process stalled forever) and/or data corruption (sometimes without errors at generation). Cf https://github.com/OSGeo/gdal/issues/8470 for a reproducer. The fix is in https://github.com/OSGeo/gdal/pull/8561

The issue is particularly visible on Windows, or more generally any operating system (or file system where the output file is located) which has no VSIVirtualHandle::PRead() implementation, but it can also be occasionally reproduced on Linux (at least as a deadlock).

Even

--
http://www.spatialys.com
My software is free, but my time generally not.
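For anyone who wants to flag affected installations in a script, the version range stated above (3.6.0 to 3.7.2, inclusive) can be checked with a few lines of Python. This is only a sketch: `parse_version` and `is_affected` are hypothetical helper names, and the check looks at the version number alone, so a build that carries a backported patch would still be reported as affected.

```python
# Affected range as stated in the announcement: GDAL 3.6.0 to 3.7.2 inclusive.
AFFECTED_FIRST = (3, 6, 0)
AFFECTED_LAST = (3, 7, 2)


def parse_version(text):
    """Turn a string such as '3.6.4' or '3.7.2dev' into a tuple such as (3, 6, 4)."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # ignore suffixes such as 'dev' or 'rc1'
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)  # treat '3.7' as 3.7.0
    return tuple(parts)


def is_affected(version_text):
    """Return True if the version number falls inside the announced range."""
    return AFFECTED_FIRST <= parse_version(version_text) <= AFFECTED_LAST
```

With the GDAL Python bindings installed, the version string could presumably be taken from `gdal.VersionInfo("RELEASE_NAME")` and passed to `is_affected`; that call is left out of the sketch so it runs without GDAL.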