Re: feature request: gzip/bzip support for sort

2007-01-25 Thread Jim Meyering
Paul Eggert [EMAIL PROTECTED] wrote: Jim Meyering [EMAIL PROTECTED] writes: I'm probably going to change the documentation so that people will be less likely to depend on being able to run a separate program. To be precise, I'd like to document that the only valid values of

Re: feature request: gzip/bzip support for sort

2007-01-25 Thread Jim Meyering
Dan Hipschman [EMAIL PROTECTED] wrote: On Wed, Jan 24, 2007 at 08:08:18AM +0100, Jim Meyering wrote: I've checked in your changes, then changed NEWS a little: Great! Thanks :-) Additionally, I'm probably going to change the documentation so that people will be less likely to depend on

Re: feature request: gzip/bzip support for sort

2007-01-24 Thread Eric Blake
According to Jim Meyering on 1/24/2007 12:08 AM: Additionally, I'm probably going to change the documentation so that people will be less likely to depend on being able to run a separate program. To be precise, I'd like to document that the only

Re: feature request: gzip/bzip support for sort

2007-01-24 Thread Paul Eggert
Jim Meyering [EMAIL PROTECTED] writes: I'm probably going to change the documentation so that people will be less likely to depend on being able to run a separate program. To be precise, I'd like to document that the only valid values of GNUSORT_COMPRESSOR are the empty string, gzip and

Re: feature request: gzip/bzip support for sort

2007-01-24 Thread Craig Macdonald
Firstly, I wanted to say that I am excited by the extremely fast progress that has been made in sort for compressed temporary files. Many thanks to Dan and others for the implementation. (I've failed to accomplish the bootstrap of the CVS sources - are there bootstrapped snapshots available

Re: feature request: gzip/bzip support for sort

2007-01-23 Thread Jim Meyering
Dan Hipschman [EMAIL PROTECTED] wrote: On Sun, Jan 21, 2007 at 07:14:03PM +0100, Jim Meyering wrote: Not to look the gift horse in the mouth, but it'd be nice if you wrote ChangeLog entries, too. And even (gasp! :-) a test case or two. Of course, we'd expect such a test case (probably named

Re: feature request: gzip/bzip support for sort

2007-01-22 Thread Dan Hipschman
On Sun, Jan 21, 2007 at 07:14:03PM +0100, Jim Meyering wrote: Not to look the gift horse in the mouth, but it'd be nice if you wrote ChangeLog entries, too. And even (gasp! :-) a test case or two. Of course, we'd expect such a test case (probably named tests/misc/sort-compress, and based on

Re: feature request: gzip/bzip support for sort

2007-01-21 Thread Jim Meyering
Dan Hipschman [EMAIL PROTECTED] wrote: I think this patch addresses everything Paul mentioned in his critique of my last attempt. I did look at gnulib pipe module, but there were some problems with using it out of the box. First, it takes a Hi Dan, Thanks for doing all that. Not to look the

Re: feature request: gzip/bzip support for sort

2007-01-21 Thread Jim Meyering
James Youngman [EMAIL PROTECTED] wrote: It might be worth ensuring that we don't pass an invalid signal mask to sigprocmask(SIG_SETMASK,...) if the previous call to sigprocmask(SIG_BLOCK,...) had failed. Offhand I can't think of a way for sigprocmask() to fail unless the first argument is

Re: feature request: gzip/bzip support for sort

2007-01-21 Thread Dan Hipschman
On Sun, Jan 21, 2007 at 07:14:03PM +0100, Jim Meyering wrote: Not to look the gift horse in the mouth, but it'd be nice if you wrote ChangeLog entries, too. And even (gasp! :-) a test case or two. Of course, we'd expect such a test case (probably named tests/misc/sort-compress, and based on

Re: feature request: gzip/bzip support for sort

2007-01-21 Thread Jim Meyering
Dan Hipschman [EMAIL PROTECTED] wrote: On Sun, Jan 21, 2007 at 07:14:03PM +0100, Jim Meyering wrote: Not to look the gift horse in the mouth, but it'd be nice if you wrote ChangeLog entries, too. And even (gasp! :-) a test case or two. Of course, we'd expect such a test case (probably

Re: feature request: gzip/bzip support for sort

2007-01-21 Thread Dan Hipschman
On Sun, Jan 21, 2007 at 10:41:11PM +0100, Jim Meyering wrote: This is a good argument for using libz by default, not a separate gzip program. Why incur the overhead of an exec when we don't need to? Now, I'm convinced that sort should provide built-in support for both gzip and bzip2. How to

Re: feature request: gzip/bzip support for sort

2007-01-21 Thread Dan Hipschman
I have a good feeling about this one :-) I think I've addressed everything except adding libz. I just don't think I have the time to do that right now, and I think it can wait. I can look into writing some test cases in a bit. 2007-01-21 Jim Meyering [EMAIL PROTECTED] * src/sort.c

Re: feature request: gzip/bzip support for sort

2007-01-21 Thread Bauke Jan Douma
Dan Hipschman wrote on 22-01-07 05:55: sort can now compresses Small typo here. bjd

Re: feature request: gzip/bzip support for sort

2007-01-19 Thread Dan Hipschman
Hi, I think this patch addresses everything Paul mentioned in his critique of my last attempt. I did look at gnulib pipe module, but there were some problems with using it out of the box. First, it takes a filename as the stdout of the child process, when for temp files it's better to pass a

Re: feature request: gzip/bzip support for sort

2007-01-18 Thread Jim Meyering
Jim Meyering [EMAIL PROTECTED] wrote: Dan Hipschman [EMAIL PROTECTED] wrote: On Sat, Jan 13, 2007 at 10:07:59PM -0800, Paul Eggert wrote: 3. I can see where the user might be able to specify a better algorithm, for a particular data set. For that, how about if we have a

Re: feature request: gzip/bzip support for sort

2007-01-18 Thread Philip Rowlands
On Thu, 18 Jan 2007, Jim Meyering wrote: I've done some more timings, but with two more sizes of input. Here's the summary, comparing straight sort with sort --comp=gzip: 2.7GB: 6.6% speed-up 10.0GB: 17.8% speed-up It would be interesting to see the individual stats returned by wait4(2)

Re: feature request: gzip/bzip support for sort

2007-01-18 Thread Jim Meyering
Philip Rowlands [EMAIL PROTECTED] wrote: On Thu, 18 Jan 2007, Jim Meyering wrote: I've done some more timings, but with two more sizes of input. Here's the summary, comparing straight sort with sort --comp=gzip: 2.7GB: 6.6% speed-up 10.0GB: 17.8% speed-up It would be interesting to

Re: feature request: gzip/bzip support for sort

2007-01-18 Thread Philip Rowlands
On Thu, 18 Jan 2007, Jim Meyering wrote: I had to use seq -f %.0f to get this filesize. Odd. Here's what those generate for me: $ seq 999 k $ wc -c k 7888 $ tail -1 k 999 What happens differently for you? $ seq 990 999 9.9e+06 9.9e+06 9.9e+06

Re: feature request: gzip/bzip support for sort

2007-01-18 Thread Dan Hipschman
On Thu, Jan 18, 2007 at 05:47:53PM -0800, Dan Hipschman wrote: That's a thought, although libz only works with gzip (as you said), and it will add more complexity (like my original patch using LZO and this new one combined). I don't think we'll have 40 instances of gzip -d running. We should

Re: feature request: gzip/bzip support for sort

2007-01-16 Thread Jim Meyering
Dan Hipschman [EMAIL PROTECTED] wrote: Here's the patch for comments. Thanks, I tried it and did some timings. Bottom line: with a 4+GB file, dual-processor, I see a 19% speed-up, but I think most of the savings is in reduced I/O. virtually no

Re: feature request: gzip/bzip support for sort

2007-01-16 Thread Paul Eggert
Jim Meyering [EMAIL PROTECTED] writes: So, with just one trial each, I see a 19% speed-up. Yaayyy! That's good news. Thanks for timing it. I read your email just after talking with Dan (in person) about how we'd time it. I just bought 1 TB worth of disk for my home computer and hadn't

Re: feature request: gzip/bzip support for sort

2007-01-16 Thread Bauke Jan Douma
Paul Eggert wrote on 16-01-07 18:35: Jim Meyering [EMAIL PROTECTED] writes: So, with just one trial each, I see a 19% speed-up. Yaayyy! That's good news. Thanks for timing it. I read your email just after talking with Dan (in person) about how we'd time it. I just bought 1 TB worth of

Re: feature request: gzip/bzip support for sort

2007-01-16 Thread Dan Hipschman
On Tue, Jan 16, 2007 at 01:20:16PM +0100, Jim Meyering wrote: I tried it and did some timings. Bottom line: with a 4+GB file, dual-processor, I see a 19% speed-up, but I think most of the savings is in reduced I/O. Thanks very much for doing this. The performance gain is good news indeed.

Re: feature request: gzip/bzip support for sort

2007-01-16 Thread James Youngman
On 1/16/07, Dan Hipschman [EMAIL PROTECTED] wrote: On Tue, Jan 16, 2007 at 01:20:16PM +0100, Jim Meyering wrote: I tried it and did some timings. Bottom line: with a 4+GB file, dual-processor, I see a 19% speed-up, but I think most of the savings is in reduced I/O. Thanks very much for doing

Re: feature request: gzip/bzip support for sort

2007-01-16 Thread Dan Hipschman
On Tue, Jan 16, 2007 at 09:35:52AM -0800, Paul Eggert wrote: At this point to my mind the only question is how to put this change in, and whether to make it the default (with gzip, say). Clearly it can lead to a big performance improvement with large sorts on modern machines. I think it

Re: feature request: gzip/bzip support for sort

2007-01-15 Thread Dan Hipschman
On Sat, Jan 13, 2007 at 10:07:59PM -0800, Paul Eggert wrote: 3. I can see where the user might be able to specify a better algorithm, for a particular data set. For that, how about if we have a --compress-program=PROGRAM option, which lets the user plug in any program that works as a

Re: feature request: gzip/bzip support for sort

2007-01-15 Thread Dan Hipschman
On Sat, Jan 13, 2007 at 10:07:59PM -0800, Paul Eggert wrote: 3. I can see where the user might be able to specify a better algorithm, for a particular data set. For that, how about if we have a --compress-program=PROGRAM option, which lets the user plug in any program that works as a

Re: feature request: gzip/bzip support for sort

2007-01-15 Thread Paul Eggert
Thanks very much for looking into that. Some comments: Dan Hipschman [EMAIL PROTECTED] writes: + /* This is so the child process won't delete our temp files + if it receives a signal before exec-ing. */ + sigprocmask (SIG_BLOCK, caught_signals, oldset); + saved_temphead = temphead;

Re: feature request: gzip/bzip support for sort

2007-01-14 Thread Dan Hipschman
OK, here's my current patch and theory on why it's still not fast enough. Here's profiling info on the cvs version of sort, up to write_bytes:
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
38.09      7.85     7.85

Re: feature request: gzip/bzip support for sort

2007-01-14 Thread Paul Eggert
Dan Hipschman [EMAIL PROTECTED] writes:
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
47.91      4.92     4.92     9269     0.00     0.00  find_temp
Yes, thanks, that explains the results all right without the -S flag,

feature request: gzip/bzip support for sort

2007-01-13 Thread Craig Macdonald
On some occasions, I need to sort extremely large files that compress well using programs such as gzip or bzip. I can emulate sorting a gzipped file while keeping the input compressed using shell pipes, e.g. zcat in.gz | sort | gzip > out.gz However, if there is not enough
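
The pipe-based workaround described in this message can be sketched roughly as follows. This is only an illustration: the file names, the tiny sample input, and the use of `gzip -dc` in place of `zcat` are assumptions made here, not part of the original post.

```shell
# Work in a scratch directory (hypothetical paths, for illustration only).
tmpdir=$(mktemp -d)

# Build a small gzip-compressed sample input.
printf 'pear\napple\nmango\n' | gzip > "$tmpdir/in.gz"

# Input and output stay compressed on disk; only the pipes carry plain text.
# Note that any temporary files sort itself creates are still uncompressed,
# which is the limitation this thread is about.
gzip -dc "$tmpdir/in.gz" | sort | gzip > "$tmpdir/out.gz"

# Decompress the result to inspect it.
result=$(gzip -dc "$tmpdir/out.gz")
printf '%s\n' "$result"
```

GNU sort releases made after this thread provide a `--compress-program=PROG` option for compressing the temporary files themselves. Whether a given installation supports it depends on the coreutils version.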

Re: feature request: gzip/bzip support for sort

2007-01-13 Thread Jim Meyering
Craig Macdonald [EMAIL PROTECTED] wrote: On some occasions, I have the need to sort extremely large files, but which compress well using programs such as gzip or bzip. ... This task has been on the TODO list for some time: sort: Compress temporary files when doing large external sort/merges.

Re: feature request: gzip/bzip support for sort

2007-01-13 Thread Dan Hipschman
On Sat, Jan 13, 2007 at 10:36:05PM +0100, Jim Meyering wrote: Craig Macdonald [EMAIL PROTECTED] wrote: On some occasions, I have the need to sort extremely large files, but which compress well using programs such as gzip or bzip. ... This task has been on the TODO list for some time:

Re: feature request: gzip/bzip support for sort

2007-01-13 Thread Paul Eggert
Thanks. I like the idea of compression, but before we get into the details of your patch, what do you mean by there not being a performance improvement with this patch? What's the holdup on performance? It seems to me that compression ought to be a real win. If it's not a win, we shouldn't