On 12/15/2010 11:45 PM, Gabor Szabo wrote:
> My only question here is why do you use the external sort and not the
> sort() function of Perl?

GNU sort has many more capabilities than perl's sort, including:
multithreaded sorting (introduced, although a bit buggy, in version 8.6),
sorting huge files (bigger than available RAM),
many built-in sort options (version sort, human-numeric sort), etc.

> Was that only for the example replacing some other external program or
> is that the real thing. If so what is the advantage for you?

Actually, it's the real thing. It will be a wrapper script that accepts the 
same arguments as GNU sort,
but will support sorting a file that has a header line as the first line.
Very useful for our needs at the lab.

An attempt is being made to introduce this as a built-in feature in GNU sort, 
but it's not yet stable enough.
See here: http://lists.gnu.org/archive/html/coreutils/2010-11/msg00078.html

My script will be available soon on GitHub, if anyone is interested. There's
nothing ground-breaking in it, just a simple wrapper.

> ps. I would be also interested in the thought process. How did the
> scalar() suggestion lead you to the solution? If you still remember
> it.

I can only guess Tzadik was thinking about reading <STDIN> in scalar vs. list
context (in list context perl will slurp all input), and that the suggestion to
use scalar(<STDIN>) was meant to force perl to read only one line.
That didn't work, and running 'strace' was the quickest way to verify what each
process (perl, then sort) reads from file descriptor 0. Indeed, perl's read()
consumed all the input from the file descriptor.

Then, some reading about buffering (perlio's and libc's) revealed that no kind
of buffered read will work here, because whatever perl pulls into its own buffer
is no longer available to the child sort process. The solution is to bypass all
levels of buffering by using the kernel's read(2) system call, through perl's
sysread.

Interestingly, before this was a perl script, it was a bash script.
In bash, the following just works:

=====
$ cat test.sh
#!/usr/bin/env bash
read LINE
sort
=====

The shell's built-in "read" consumes just one line from STDIN, so the rest of
the input is still there for sort to read.
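
The test script above reads the header but never prints it. A slightly fuller
sketch of the same idea follows; the function name sort_keep_header is made up
for this example, and the handling of empty input is my own assumption about
what a wrapper should do.

```shell
# Hypothetical helper: keep the first line in place, sort everything after it.
# Any arguments are handed to sort as-is.
sort_keep_header() (
    # IFS= and -r keep leading whitespace and backslashes in the header intact.
    # The second test covers a lone header line with no trailing newline.
    if IFS= read -r header || [ -n "$header" ]; then
        printf '%s\n' "$header"
    fi
    # sort continues reading STDIN from where "read" stopped.
    sort "$@"
)
```

Usage would look like: sort_keep_header -k1,1 < data.txt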


regards,
  -assaf
_______________________________________________
Perl mailing list
Perl@perl.org.il
http://mail.perl.org.il/mailman/listinfo/perl
