Re: Sort: buffer-size bug and nmerge performance issue

2007-03-07 Thread Paul Eggert
William Herrin [EMAIL PROTECTED] writes:

> 1. sort running on Linux kernel 2.6 x86_64 will accept but mishandle a
> --buffer-size argument larger than 2048M. It will silently pick some size
> noticeably smaller than 2048M. This is on a 64-bit machine with no ulimits
> on the amount of memory a process can consume.

First, you need to compile coreutils in 64-bit mode to be able to use
64-bit sizes.  (The default is to compile in 32-bit mode.)  I assume
you've done that?  If not, please try that.

Second, 'sort' is at the mercy of your malloc implementation.  That
is, 'sort' tries to malloc 2048M, and if that fails, it tries a
smaller allocation.  The idea is that it's better to get the sort done
than to complain about running out of memory.  Perhaps your C library
doesn't let 'malloc' succeed with large values?  You can test this by
writing a little test program.

> Has anyone done any testing to see if 16 is an optimal number for the merge
> size?

It depends on the platform.  It might well make sense to revisit that
number, or even make it a user option with a more-reasonable default.

Have you tried using compression on temporaries (the new
--compress-program option in coreutils 6.8)?  That probably affects
the optimal number.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Sort: buffer-size bug and nmerge performance issue

2007-03-06 Thread William Herrin

Hi folks:

One bug and one performance issue for the sort command in Coreutils 6.7.
This also applies to Coreutils 5.94. Results were observed with a freshly
compiled copy of sort on an x86_64 machine running Red Hat Enterprise Linux
AS release 4, kernel 2.6.9.

1. sort running on Linux kernel 2.6 x86_64 will accept but mishandle a
--buffer-size argument larger than 2048M. It will silently pick some size
noticeably smaller than 2048M. This is on a 64-bit machine with no ulimits
on the amount of memory a process can consume.

2. In sort, NMERGE is set to 16. When sort runs out of space in its buffer, it
stores sorted pieces in /tmp and then performs a merge on the pieces. If
there are more than 16, it has to repeat this process, merging sets of 16
into larger files, then merging those larger files again, etc.

Has anyone done any testing to see if 16 is an optimal number for the merge
size? My own ad-hoc tests suggest that a setting closer to 128 or 256 allows
sort to finish much sooner and consume fewer resources on the machine.

Regards,
Bill Herrin


--
William D. Herrin  [EMAIL PROTECTED]   [EMAIL PROTECTED]
3005 Crane Dr.Web:  http://bill.herrin.us/
Falls Church, VA 22042-3004