bug#7182: sort -R slow
Davide Brini wrote:
> On Sat, 9 Oct 2010 14:52:41 +0200 Ole Tange ta...@gnu.org wrote:
>
>> I recently needed to randomize some lines. So I tried using 'sort -R'.
>> I was astonished how slow that was. So I tested how slow the competing
>> strategies are. GNU sort is two orders of magnitude slower than unsort
>> and more than one order of magnitude slower than perl:
>>
>> $ time unsort file
>> real    0m1.388s
>> $ unsort --version
>> unsort 1.1.2
>> $ time perl -e 'print sort { rand() <=> rand() } <>' file
>> real    0m6.621s
>> $ time sort -R file
>> real    4m8.403s
>> $ sort --version
>> sort (GNU coreutils) 8.5
>>
>> What is even scarier: sort without -R is faster than sort -R:
>>
>> $ time sort file
>> real    0m53.553s
>>
>> I would expect sort -R to be faster than sort and faster than Perl if
>> not as fast as unsort.
>
> On my system, locale settings seem to impact the runtime significantly:
>
> $ wc -l bigfile
> 100 bigfile
> $ time LC_ALL=en_US.utf8 sort -R bigfile > /dev/null
> real    1m29.302s
> user    1m21.009s
> sys     0m0.155s
> $ time LC_ALL=C sort -R bigfile > /dev/null
> real    0m38.881s
> user    0m35.276s
> sys     0m0.118s
>
> However, shuf is much faster, and seems mostly unaffected by the locale
> used:
>
> $ time shuf bigfile > /dev/null
> real    0m1.044s
> user    0m0.833s
> sys     0m0.042s

Thanks for the report.

I think the performance of sort -R will often be worse than that of shuf
(by design, since it accesses each byte of each line once more, to
compute the hash), except when the input size is larger than available
memory.

The info documentation for sort -R does refer to shuf. Any suggestions
for improvements are welcome.

I'm closing this. You're welcome to reopen or file a new report.
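To illustrate the design difference described in the reply above, here is a minimal Python sketch contrasting the two approaches. It is not the coreutils implementation: the function names, the use of SHA-256, and the 16-byte salt are assumptions made for illustration. The only behavior taken from the documentation is that sort -R orders lines by a randomly chosen hash of their keys (so identical lines end up adjacent), whereas shuf produces a random permutation without hashing line contents.

```python
import hashlib
import os
import random


def sort_by_random_hash(lines, salt=None):
    """Order lines by a salted hash of their content (the 'sort -R' idea).

    Hypothetical helper: SHA-256 and the salt size are illustrative choices.
    """
    if salt is None:
        salt = os.urandom(16)  # a fresh salt gives a different order per run

    def hash_key(line):
        # Every byte of the line is read once more here to compute the key.
        return hashlib.sha256(salt + line.encode("utf-8")).digest()

    # Equal lines get equal keys, so duplicates sort next to each other.
    return sorted(lines, key=hash_key)


def shuffle_like_shuf(lines, rng=None):
    """Return a random permutation of lines without inspecting their content."""
    rng = rng or random.Random()
    shuffled = list(lines)
    rng.shuffle(shuffled)  # Fisher-Yates shuffle, linear in the number of lines
    return shuffled
```

The hash-based variant does extra work proportional to the total input size on top of an O(n log n) sort, which is consistent with the timings reported in the thread, though the sketch says nothing about the locale effect.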
bug#7182: sort -R slow
I recently needed to randomize some lines. So I tried using 'sort -R'.
I was astonished how slow that was. So I tested how slow the competing
strategies are. GNU sort is two orders of magnitude slower than unsort
and more than one order of magnitude slower than perl:

$ time unsort file
real    0m1.388s
$ unsort --version
unsort 1.1.2
$ time perl -e 'print sort { rand() <=> rand() } <>' file
real    0m6.621s
$ time sort -R file
real    4m8.403s
$ sort --version
sort (GNU coreutils) 8.5

What is even scarier: sort without -R is faster than sort -R:

$ time sort file
real    0m53.553s

I would expect sort -R to be faster than sort and faster than Perl if
not as fast as unsort.

/Ole
bug#7182: sort -R slow
Ole Tange writes:
> I recently needed to randomize some lines. So I tried using 'sort -R'.
> I was astonished how slow that was. So I tested how slow the competing
> strategies are. GNU sort is two orders of magnitude slower than unsort
> and more than one order of magnitude slower than perl:

Never heard of unsort. Why didn't you try shuf(1)?

Also, your perl is not valid:

> $ time perl -e 'print sort { rand() <=> rand() } <>' file
> real    0m6.621s

That comparison function is not consistent (unless very lucky).

> I would expect sort -R to be faster than sort and faster than Perl if
> not as fast as unsort.

How big is your test file? I expect sort(1) to be optimized for big
jobs. I bet it would win the contest if you are shuffling a file that's
bigger than available RAM.
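The point about the inconsistent comparison function can be sketched in Python. The first function below mirrors the Perl one-liner: every comparison draws two fresh random numbers, so the same pair of lines may compare differently each time and the sort has no well-defined order to converge on. The second assigns one random key per line and sorts on that, which gives the sort a consistent order. Both function names are hypothetical, and this is an illustrative sketch rather than what unsort, shuf, or Perl do internally.

```python
import functools
import random


def shuffle_inconsistent(lines, rng):
    """Sort with a comparator that ignores its arguments (the one-liner's approach)."""

    def random_cmp(_a, _b):
        # Two fresh draws per comparison: the answer for a given pair
        # is not stable, so the comparator is inconsistent.
        x, y = rng.random(), rng.random()
        return (x > y) - (x < y)

    return sorted(lines, key=functools.cmp_to_key(random_cmp))


def shuffle_consistent(lines, rng):
    """Attach one random key per line, sort by that key, then drop the key."""
    keyed = [(rng.random(), line) for line in lines]
    keyed.sort(key=lambda pair: pair[0])  # the key is fixed for each line
    return [line for _key, line in keyed]
```

Both functions return a permutation of their input in Python, but only the second sorts by a well-defined order. Tallying the outputs for a three-line input over many runs is one way to check whether the first is noticeably non-uniform.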
bug#7182: sort -R slow
On Sat, 9 Oct 2010 14:52:41 +0200 Ole Tange ta...@gnu.org wrote:

> I recently needed to randomize some lines. So I tried using 'sort -R'.
> I was astonished how slow that was. So I tested how slow the competing
> strategies are. GNU sort is two orders of magnitude slower than unsort
> and more than one order of magnitude slower than perl:
>
> $ time unsort file
> real    0m1.388s
> $ unsort --version
> unsort 1.1.2
> $ time perl -e 'print sort { rand() <=> rand() } <>' file
> real    0m6.621s
> $ time sort -R file
> real    4m8.403s
> $ sort --version
> sort (GNU coreutils) 8.5
>
> What is even scarier: sort without -R is faster than sort -R:
>
> $ time sort file
> real    0m53.553s
>
> I would expect sort -R to be faster than sort and faster than Perl if
> not as fast as unsort.

On my system, locale settings seem to impact the runtime significantly:

$ wc -l bigfile
100 bigfile
$ time LC_ALL=en_US.utf8 sort -R bigfile > /dev/null
real    1m29.302s
user    1m21.009s
sys     0m0.155s
$ time LC_ALL=C sort -R bigfile > /dev/null
real    0m38.881s
user    0m35.276s
sys     0m0.118s

However, shuf is much faster, and seems mostly unaffected by the locale
used:

$ time shuf bigfile > /dev/null
real    0m1.044s
user    0m0.833s
sys     0m0.042s

--
D.