[issue32453] shutil.rmtree can have O(n^2) performance on large dirs

2021-10-30 Thread Niklas Hambüchen
Niklas Hambüchen added the comment: A small update / summary so far: >From here this developed into coreutils discussion: #29921 O(n^2) performance of rm -r https://debbugs.gnu.org/cgi/bugreport.cgi?bug=29921 and finally a `linux-fsdevel` discussion: O(n^2) deletion performance

[issue32453] shutil.rmtree can have O(n^2) performance on large dirs

2018-06-12 Thread Giampaolo Rodola'
Change by Giampaolo Rodola' : -- nosy: +giampaolo.rodola ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue32453] shutil.rmtree can have O(n^2) performance on large dirs

2018-01-02 Thread Antoine Pitrou
Antoine Pitrou added the comment: Yes, so `rm -rf` is quadratic on my SSD too... -- ___ Python tracker ___

[issue32453] shutil.rmtree can have O(n^2) performance on large dirs

2018-01-02 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Thanks for your investigations Niklas. I ran my benchmark on a spinning disk, but significant nonlinearity is observed only for the size of directory around 100 and more. And yes, sorting by inode number helps. $ for j in 1

[issue32453] shutil.rmtree can have O(n^2) performance on large dirs

2018-01-01 Thread Niklas Hambüchen
Niklas Hambüchen added the comment: A better location to view the whole coreutils thread is: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=29921 -- ___ Python tracker

[issue32453] shutil.rmtree can have O(n^2) performance on large dirs

2017-12-31 Thread Niklas Hambüchen
Niklas Hambüchen added the comment: OK, my coreutils email is at http://lists.gnu.org/archive/html/bug-coreutils/2017-12/msg00054.html -- ___ Python tracker

[issue32453] shutil.rmtree can have O(n^2) performance on large dirs

2017-12-31 Thread Niklas Hambüchen
Niklas Hambüchen added the comment: > Did you try to sync and flush caches before running `rm -r`? Yes, it doesn't make a difference for me, I still see the same O(n²) behaviour in `rm -r`. I've sent an email "O(n^2) performance of rm -r" to bug-coreut...@gnu.org just now,

[issue32453] shutil.rmtree can have O(n^2) performance on large dirs

2017-12-31 Thread Antoine Pitrou
Antoine Pitrou added the comment: Did you try to sync and flush caches before running `rm -r`? -- ___ Python tracker ___

[issue32453] shutil.rmtree can have O(n^2) performance on large dirs

2017-12-31 Thread Niklas Hambüchen
Niklas Hambüchen added the comment: It turns out I was wrong when saying that there's some cache we're hitting. In fact, `rm -r` is still precisely O(n^2), even with the coreutils patch I linked. Quick overview table of the benchmark: nfiles real user sys

[issue32453] shutil.rmtree can have O(n^2) performance on large dirs

2017-12-31 Thread Niklas Hambüchen
Niklas Hambüchen added the comment: Serhiy, did you run your benchmark on an SSD or a spinning disk? The coreutils bug mentions that the problem is seek times. My tests on a spinning disk with 400k files suggest that indeed rmtree() is ~30x slower than `rm -r`: # time

[issue32453] shutil.rmtree can have O(n^2) performance on large dirs

2017-12-30 Thread Serhiy Storchaka
Change by Serhiy Storchaka : Added file: https://bugs.python.org/file47355/bench_rmtree.py ___ Python tracker ___

[issue32453] shutil.rmtree can have O(n^2) performance on large dirs

2017-12-30 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I have tested shutil.rmtree() with a large number of files using modified benchmark from issue28564. For 40 files it takes less than 5 seconds. From the comment to the coreutils benchmark

[issue32453] shutil.rmtree can have O(n^2) performance on large dirs

2017-12-30 Thread Antoine Pitrou
Antoine Pitrou added the comment: I don't think filesystem-specific optimizations belong in the Python stdlib. Python is compatible with multiple operating systems, including Windows, macOS, Android, many POSIX variants. It would be much better if this were fixed in the

[issue32453] shutil.rmtree can have O(n^2) performance on large dirs

2017-12-29 Thread Niklas Hambüchen
New submission from Niklas Hambüchen : See http://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=24412edeaf556a for the explanation and equivalent fix in coreutils. The gist ist that deleting entries in inode order can improve deletion performance dramatically. To obtain