Paul Eggert [EMAIL PROTECTED] wrote:
Jim Meyering [EMAIL PROTECTED] writes:
I'm probably going to change the documentation so that
people will be less likely to depend on being able to run
a separate program. To be precise, I'd like to document
that the only valid values of
Dan Hipschman [EMAIL PROTECTED] wrote:
On Wed, Jan 24, 2007 at 08:08:18AM +0100, Jim Meyering wrote:
I've checked in your changes, then changed NEWS a little:
Great! Thanks :-)
Additionally, I'm probably going to change the documentation so that
people will be less likely to depend on
According to Jim Meyering on 1/24/2007 12:08 AM:
Additionally, I'm probably going to change the documentation so that
people will be less likely to depend on being able to run a separate
program. To be precise, I'd like to document that the only
Jim Meyering [EMAIL PROTECTED] writes:
I'm probably going to change the documentation so that
people will be less likely to depend on being able to run
a separate program. To be precise, I'd like to document
that the only valid values of GNUSORT_COMPRESSOR are the
empty string, gzip and
Firstly, I wanted to say that I am excited by the extremely fast progress
that has been made in sort for compressed temporary files.
Many thanks to Dan and others for the implementation.
(I haven't managed to bootstrap the CVS sources; are there bootstrapped
snapshots available
Dan Hipschman [EMAIL PROTECTED] wrote:
On Sun, Jan 21, 2007 at 07:14:03PM +0100, Jim Meyering wrote:
Not to look the gift horse in the mouth, but it'd be nice
if you wrote ChangeLog entries, too. And even (gasp! :-)
a test case or two. Of course, we'd expect such a test case
(probably named
On Sun, Jan 21, 2007 at 07:14:03PM +0100, Jim Meyering wrote:
Not to look the gift horse in the mouth, but it'd be nice
if you wrote ChangeLog entries, too. And even (gasp! :-)
a test case or two. Of course, we'd expect such a test case
(probably named tests/misc/sort-compress, and based on
Dan Hipschman [EMAIL PROTECTED] wrote:
I think this patch addresses everything Paul mentioned in his critique
of my last attempt. I did look at the gnulib pipe module, but there were
some problems with using it out of the box. First, it takes a
Hi Dan,
Thanks for doing all that.
Not to look the
James Youngman [EMAIL PROTECTED] wrote:
It might be worth ensuring that we don't pass an invalid signal mask
to sigprocmask(SIG_SETMASK,...) if the previous call to
sigprocmask(SIG_BLOCK,...) had failed. Offhand I can't think of a way
for sigprocmask() to fail unless the first argument is
On Sun, Jan 21, 2007 at 10:41:11PM +0100, Jim Meyering wrote:
This is a good argument for using libz by default, not a separate
gzip program. Why incur the overhead of an exec when we don't need to?
Now, I'm convinced that sort should provide built-in support for both
gzip and bzip2. How to
I have a good feeling about this one :-) I think I've addressed
everything except adding libz. I just don't think I have the time to do
that right now, and I think it can wait. I can look into writing some
test cases in a bit.
2007-01-21 Jim Meyering [EMAIL PROTECTED]
* src/sort.c
Dan Hipschman wrote on 22-01-07 05:55:
sort can now compresses
Small typo here.
bjd
___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils
Hi,
I think this patch addresses everything Paul mentioned in his critique
of my last attempt. I did look at the gnulib pipe module, but there were
some problems with using it out of the box. First, it takes a
filename as the stdout of the child process, when for temp files it's
better to pass a
Jim Meyering [EMAIL PROTECTED] wrote:
Dan Hipschman [EMAIL PROTECTED] wrote:
On Sat, Jan 13, 2007 at 10:07:59PM -0800, Paul Eggert wrote:
3. I can see where the user might be able to specify a better
algorithm, for a particular data set. For that, how about if we have
a
On Thu, 18 Jan 2007, Jim Meyering wrote:
I've done some more timings, but with two more sizes of input.
Here's the summary, comparing straight sort with sort --comp=gzip:
2.7GB: 6.6% speed-up
10.0GB: 17.8% speed-up
It would be interesting to see the individual stats returned by wait4(2)
Philip Rowlands [EMAIL PROTECTED] wrote:
On Thu, 18 Jan 2007, Jim Meyering wrote:
I've done some more timings, but with two more sizes of input.
Here's the summary, comparing straight sort with sort --comp=gzip:
2.7GB: 6.6% speed-up
10.0GB: 17.8% speed-up
It would be interesting to
On Thu, 18 Jan 2007, Jim Meyering wrote:
I had to use seq -f %.0f to get this filesize.
Odd.
Here's what those generate for me:
$ seq 999 k
$ wc -c k
7888
$ tail -1 k
999
What happens differently for you?
$ seq 990 999
9.9e+06
9.9e+06
9.9e+06
On Thu, Jan 18, 2007 at 05:47:53PM -0800, Dan Hipschman wrote:
That's a thought, although libz only works with gzip (as you said), and
it will add more complexity (like my original patch using LZO and this
new one combined). I don't think we'll have 40 instances of gzip -d
running. We should
Dan Hipschman [EMAIL PROTECTED] wrote:
Here's the patch for comments. Thanks,
I tried it and did some timings.
Bottom line: with a 4+GB file, dual-processor, I see a 19% speed-up,
but I think most of the savings is in reduced I/O.
virtually no
Jim Meyering [EMAIL PROTECTED] writes:
So, with just one trial each, I see a 19% speed-up.
Yaayyy! That's good news. Thanks for timing it. I read your email
just after talking with Dan (in person) about how we'd time it. I
just bought 1 TB worth of disk for my home computer and hadn't
Paul Eggert wrote on 16-01-07 18:35:
Jim Meyering [EMAIL PROTECTED] writes:
So, with just one trial each, I see a 19% speed-up.
Yaayyy! That's good news. Thanks for timing it. I read your email
just after talking with Dan (in person) about how we'd time it. I
just bought 1 TB worth of
On Tue, Jan 16, 2007 at 01:20:16PM +0100, Jim Meyering wrote:
I tried it and did some timings.
Bottom line: with a 4+GB file, dual-processor, I see a 19% speed-up,
but I think most of the savings is in reduced I/O.
Thanks very much for doing this. The performance gain is good news
indeed.
On 1/16/07, Dan Hipschman [EMAIL PROTECTED] wrote:
On Tue, Jan 16, 2007 at 01:20:16PM +0100, Jim Meyering wrote:
I tried it and did some timings.
Bottom line: with a 4+GB file, dual-processor, I see a 19% speed-up,
but I think most of the savings is in reduced I/O.
Thanks very much for doing
On Tue, Jan 16, 2007 at 09:35:52AM -0800, Paul Eggert wrote:
At this point to my mind the only question is how to put this change
in, and whether to make it the default (with gzip, say). Clearly it
can lead to a big performance improvement with large sorts on modern
machines.
I think it
On Sat, Jan 13, 2007 at 10:07:59PM -0800, Paul Eggert wrote:
3. I can see where the user might be able to specify a better
algorithm, for a particular data set. For that, how about if we have
a --compress-program=PROGRAM option, which lets the user plug in any
program that works as a
Thanks very much for looking into that. Some comments:
Dan Hipschman [EMAIL PROTECTED] writes:
+ /* This is so the child process won't delete our temp files
+ if it receives a signal before exec-ing. */
+ sigprocmask (SIG_BLOCK, &caught_signals, &oldset);
+ saved_temphead = temphead;
OK, here's my current patch and theory on why it's still not fast
enough. Here's profiling info on the cvs version of sort, up to
write_bytes:
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 38.09      7.85     7.85
Dan Hipschman [EMAIL PROTECTED] writes:
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 47.91      4.92     4.92     9269     0.00     0.00  find_temp
Yes, thanks, that explains the results all right
without the -S flag,
On some occasions, I have the need to sort extremely large files, but
which compress well using programs such as gzip or bzip.
I can emulate the sorting of gzipped files while keeping the input
compressed using shell pipes, e.g.
zcat in.gz | sort | gzip > out.gz
However, if there is not enough
Craig Macdonald [EMAIL PROTECTED] wrote:
On some occasions, I have the need to sort extremely large files, but
which compress well using programs such as gzip or bzip.
...
This task has been on the TODO list for some time:
sort: Compress temporary files when doing large external sort/merges.
On Sat, Jan 13, 2007 at 10:36:05PM +0100, Jim Meyering wrote:
Craig Macdonald [EMAIL PROTECTED] wrote:
On some occasions, I have the need to sort extremely large files, but
which compress well using programs such as gzip or bzip.
...
This task has been on the TODO list for some time:
Thanks. I like the idea of compression, but before we get into the
details of your patch, what do you mean by there not being a
performance improvement with this patch? What's the holdup on
performance? It seems to me that compression ought to be a real win.
If it's not a win, we shouldn't