Re: A practical benchmark shows speed challenges for Perl 6
On Tue, Mar 29, 2016 at 10:29 PM, Timo Paulssen wrote:
> On 03/30/2016 03:45 AM, Timo Paulssen wrote:
>
> Could you try using $filename.IO.slurp.lines instead of $filename.IO.lines
> and see if that makes things any faster?
> - Timo
>
> Actually, the method on an IO::Handle is called "slurp-rest"; slurp would
> only work with a filename instead.
> - Timo

Okay, I've done a comparison of the three methods on a 1 GB file:

IO.lines
real    2m11.827s
user    2m10.036s
sys     0m1.468s

IO.split
real    1m51.504s
user    1m51.136s
sys     0m0.352s

IO.slurp-rest
real    2m9.821s
user    2m6.268s
sys     0m3.532s

and Perl 5:

real    0m4.614s
user    0m4.328s
sys     0m0.280s

Best,

-Tom
Re: A practical benchmark shows speed challenges for Perl 6
On 03/30/2016 04:11 PM, yary wrote:
> On Wed, Mar 30, 2016 at 3:20 PM, Elizabeth Mattijsen wrote:
>> Thanks for your thoughts!
>>
>> I’ve implemented $*DEFAULT-READ-ELEMS in
>> https://github.com/rakudo/rakudo/commit/5bd1e .
>>
>> Of course, all of this is provisional, and open for debate and bikeshedding.

Yary, if you feel there's a need for this functionality in Perl *5* as well,
please file a bug ticket via perlbug.

Thank you very much.

Jim Keenan
Re: A practical benchmark shows speed challenges for Perl 6
On Wed, Mar 30, 2016 at 3:20 PM, Elizabeth Mattijsen wrote:
> Thanks for your thoughts!
>
> I’ve implemented $*DEFAULT-READ-ELEMS in
> https://github.com/rakudo/rakudo/commit/5bd1e .
>
> Of course, all of this is provisional, and open for debate and bikeshedding.

Thanks! And that was fast!

Allowing DEFAULT-READ-ELEMS to be set from the environment is a good idea
that I hadn't thought of. Since it is a machine-dependent performance tweak,
letting it be set outside the code makes sense.

I had originally envisioned this as an "option" to "sub open", for
fine-grained control over which IO::Handles got what DEFAULT-READ-ELEMS, but
I'm not sure it belongs there. After all, it is a performance-related tweak,
and I like the idea of it being set primarily from the environment: setting
it in the code means you're writing something for a particular host, and
this way the spec doesn't need to change to support that.

Is there anything similar on the "write" side (output buffering) that could
use this treatment?

-y
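[Editorial note: the environment-driven default yary endorses above is a general pattern, not specific to Rakudo. A minimal sketch in Python, with hypothetical names (the variable and function are illustrations, not Rakudo's or Python's own API):

```python
import os
import tempfile

# Hypothetical names, not Rakudo's: pick up a buffer-size override from the
# environment, falling back to a built-in default -- the same pattern as
# letting $*DEFAULT-READ-ELEMS be set from outside the code.
DEFAULT_READ_ELEMS = int(os.environ.get("DEFAULT_READ_ELEMS", 64 * 1024))

def open_with_default_buffer(path, mode="rb"):
    # Every handle opened this way inherits the environment-chosen size,
    # so no per-host tuning ever has to appear in the code itself.
    return open(path, mode, buffering=DEFAULT_READ_ELEMS)

# Quick demonstration on a throwaway file.
demo = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo, "w") as f:
    f.write("hello\n")

with open_with_default_buffer(demo) as f:
    first = f.readline()

print(DEFAULT_READ_ELEMS, first)
```

Run with e.g. `DEFAULT_READ_ELEMS=8192 python script.py` to tune the buffer per host without touching the code.]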
Re: A practical benchmark shows speed challenges for Perl 6
> On 30 Mar 2016, at 16:06, yary wrote:
>
> Cross-posting to the compiler group-
>
> On Wed, Mar 30, 2016 at 8:10 AM, Elizabeth Mattijsen wrote:
>> If you know the line endings of the file, using
>> IO::Handle.split($line-ending) (note the actual character, rather than a
>> regular expression) might help. That will read in the file in chunks of 64K
>> and then lazily serve lines from that chunk.
>
> This reminds me of a pet peeve I had with p5: Inability to easily
> change the default buffer size for reading & writing.
>
> I'm the lone Perl expert at $work and at one point was trying to keep
> a file processing step in perl. These files were about 100x the size
> of the server's RAM, consisted of variable-length newline-terminated
> text, the processing was very light, there would be a few running in
> parallel. The candidate language, C#, has a text-file-reading object
> that lets you set its read-ahead buffer on creation/opening the file-
> can't remember the details. That size had a large impact on the
> performance of this task. With perl... I could not use the
> not-so-well-documented IO::Handle->setvbuf because my OS didn't
> support it. I did hack together something with sysread, but C# won in
> the end due partly to that.
>
> It seems this "hiding-of-buffer" sub-optimal situation is being
> repeated in Perl6: neither https://doc.perl6.org/routine/open nor
> http://doc.perl6.org/type/IO::Handle mention a buffer, yet IO::Handle
> reads ahead and buffers. Experience shows that being able to adjust
> this buffer can help in certain situations. Also consider that perl5
> has defaulted to 4k and 8k, whereas perl6 is apparently using 64k, as
> evidence that this buffer needs to change as system builds evolve.
>
> Please make this easily readable & settable, anywhere it's implemented!

Thanks for your thoughts!

I’ve implemented $*DEFAULT-READ-ELEMS in
https://github.com/rakudo/rakudo/commit/5bd1e .

Of course, all of this is provisional, and open for debate and bikeshedding.

Liz
Re: A practical benchmark shows speed challenges for Perl 6
On 30/03/16 13:40, Tom Browder wrote:
> On Tue, Mar 29, 2016 at 10:29 PM, Timo Paulssen wrote:
>> On 03/30/2016 03:45 AM, Timo Paulssen wrote:
>>
>> Could you try using $filename.IO.slurp.lines instead of $filename.IO.lines
>> and see if that makes things any faster?
> ...
>> Actually, the method on an IO::Handle is called "slurp-rest"; slurp would
>> only work with a filename instead.
>> - Timo
>
> Timo, I'm trying to test a situation where I could process every line
> as it is read in. The situation assumes the file is too large to
> slurp into memory, thus the read of one line at a time. So is there
> another way to do that? According to the docs "slurp-rest" gets all
> the remaining file at one read.

I was suggesting this mostly because we've recently discovered a very severe
performance problem with IO.lines. I'd like to know if that also affects
your benchmark and how big the saving might be for "moderately" sized data.

timo@schmand ~/p/e/SDL2_raw-p6 (master)> time perl6 -e 'for "heap-snapshot".IO.lines {}'
129.14user 0.87system 2:10.44elapsed 99%CPU (0avgtext+0avgdata 507580maxresident)k

timo@schmand ~/p/e/SDL2_raw-p6 (master)> time perl6 -e 'for "heap-snapshot".IO.slurp.lines {}'
1.92user 0.14system 0:02.07elapsed 99%CPU (0avgtext+0avgdata 537940maxresident)k

timo@schmand ~/p/e/SDL2_raw-p6 (master)> time perl6 -e 'for "heap-snapshot".IO.open.split("\n") {}'
192.04user 0.36system 3:12.70elapsed 99%CPU (0avgtext+0avgdata 1350204maxresident)k

Hope this clears up how I meant that :)

- Timo
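[Editorial note: the shape of Timo's comparison (per-line iteration vs. slurp-then-split) can be reproduced in any language; a rough Python harness, assuming only that the input is newline-terminated text. This illustrates the measurement, not Rakudo's internals, and the relative timings will of course differ by runtime:

```python
import os
import tempfile
import time

# Build a moderately sized file of newline-terminated records.
path = os.path.join(tempfile.mkdtemp(), "big.txt")
with open(path, "w") as f:
    for i in range(200_000):
        f.write(f"record {i}\n")

def count_line_by_line(p):
    # Analogue of iterating IO.lines: one line served per iteration.
    with open(p) as f:
        return sum(1 for _ in f)

def count_slurp_split(p):
    # Analogue of slurping then splitting: read everything, split once.
    # A trailing newline yields one extra empty element, hence the -1.
    with open(p) as f:
        return len(f.read().split("\n")) - 1

t0 = time.perf_counter()
a = count_line_by_line(path)
t1 = time.perf_counter()
b = count_slurp_split(path)
t2 = time.perf_counter()

print(a, b)
print(f"line-by-line: {t1 - t0:.3f}s  slurp+split: {t2 - t1:.3f}s")
```

Both counters must agree on the line count; only the timing differs.]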
Re: A practical benchmark shows speed challenges for Perl 6
Cross-posting to the compiler group-

On Wed, Mar 30, 2016 at 8:10 AM, Elizabeth Mattijsen wrote:
> If you know the line endings of the file, using
> IO::Handle.split($line-ending) (note the actual character, rather than a
> regular expression) might help. That will read in the file in chunks of 64K
> and then lazily serve lines from that chunk.

This reminds me of a pet peeve I had with p5: the inability to easily change
the default buffer size for reading & writing.

I'm the lone Perl expert at $work, and at one point I was trying to keep a
file-processing step in Perl. These files were about 100x the size of the
server's RAM, consisted of variable-length newline-terminated text, the
processing was very light, and there would be a few jobs running in
parallel. The candidate language, C#, has a text-file-reading object that
lets you set its read-ahead buffer when creating/opening the file (I can't
remember the details). That size had a large impact on the performance of
this task. With Perl... I could not use the not-so-well-documented
IO::Handle->setvbuf because my OS didn't support it. I did hack together
something with sysread, but C# won in the end, due partly to that.

It seems this "hiding of the buffer" sub-optimal situation is being repeated
in Perl 6: neither https://doc.perl6.org/routine/open nor
http://doc.perl6.org/type/IO::Handle mention a buffer, yet IO::Handle reads
ahead and buffers. Experience shows that being able to adjust this buffer
can help in certain situations. Also consider that perl5 has defaulted to 4k
and 8k, whereas perl6 is apparently using 64k, as evidence that this buffer
needs to change as system builds evolve.

Please make this easily readable & settable, anywhere it's implemented!

-y
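[Editorial note: for comparison, some languages' standard IO layers do expose the read-buffer knob yary is asking for. Python's built-in `open` takes it directly as the `buffering` parameter (this is Python's API, shown only as a cross-language illustration, not Perl 6 code):

```python
import os
import tempfile

# Write a small test file of newline-terminated text.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    for i in range(10_000):
        f.write(f"line {i}\n")

# Python exposes the read buffer size at open time: here 64 KiB,
# matching the chunk size the thread says Rakudo uses internally.
with open(path, "rb", buffering=64 * 1024) as f:
    n = sum(1 for _ in f)

print(n)
```

An analogous per-handle option on Perl 6's `open`, or a settable dynamic variable, would give the same kind of control.]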
Re: A practical benchmark shows speed challenges for Perl 6
> On 30 Mar 2016, at 13:40, Tom Browder wrote:
> On Tue, Mar 29, 2016 at 10:29 PM, Timo Paulssen wrote:
>> On 03/30/2016 03:45 AM, Timo Paulssen wrote:
>>
>> Could you try using $filename.IO.slurp.lines instead of $filename.IO.lines
>> and see if that makes things any faster?
> ...
>> Actually, the method on an IO::Handle is called "slurp-rest"; slurp would
>> only work with a filename instead.
>> - Timo
> Timo, I'm trying to test a situation where I could process every line
> as it is read in. The situation assumes the file is too large to
> slurp into memory, thus the read of one line at a time. So is there
> another way to do that? According to the docs "slurp-rest" gets all
> the remaining file at one read.

That is correct. The thing is that IO.lines basically depends on IO.get to
get a line. So that is extra overhead that IO.slurp.lines doesn’t have.

If you know the line endings of the file, using
IO::Handle.split($line-ending) (note the actual character, rather than a
regular expression) might help. That will read in the file in chunks of 64K
and then lazily serve lines from that chunk.

A simple test on /etc/dict/words:

$ 6 '"words".IO.lines.elems.say'
235886
real    0m0.645s

$ 6 '"words".IO.open.split("\x0a").elems.say'
235887
real    0m0.317s

Note that with .split you will get an extra empty line at the end.

Hope this helps.

Liz
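[Editorial note: the strategy Liz describes (read fixed-size chunks, split each chunk on a literal separator, lazily serve lines) can be sketched in Python; this is a rough illustration of the idea, not Rakudo's actual implementation:

```python
import os
import tempfile

def chunked_lines(path, chunk_size=64 * 1024, sep="\n"):
    """Read `path` in fixed-size chunks and lazily yield one line at a time.

    Mirrors the IO::Handle.split($line-ending) idea from the thread:
    splitting on a literal separator, not a regex, so each chunk is scanned
    cheaply and only one chunk is held in memory at a time.
    """
    leftover = ""
    with open(path, "r") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            parts = (leftover + chunk).split(sep)
            leftover = parts.pop()   # possibly incomplete last line
            yield from parts
    # Final fragment: empty if the file ends in the separator -- the same
    # extra empty element Liz notes you get from .split.
    yield leftover

# Demo: three newline-terminated lines, tiny chunk size to force splitting.
demo = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo, "w") as f:
    f.write("a\nb\nc\n")

print(list(chunked_lines(demo, chunk_size=4)))  # ['a', 'b', 'c', '']
```

The memory footprint stays bounded by the chunk size, so this works on files too large to slurp, which was Tom's constraint.]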
Re: A practical benchmark shows speed challenges for Perl 6
On Tue, Mar 29, 2016 at 10:29 PM, Timo Paulssen wrote:
> On 03/30/2016 03:45 AM, Timo Paulssen wrote:
>
> Could you try using $filename.IO.slurp.lines instead of $filename.IO.lines
> and see if that makes things any faster?
...
> Actually, the method on an IO::Handle is called "slurp-rest"; slurp would
> only work with a filename instead.
> - Timo

Timo, I'm trying to test a situation where I could process every line as it
is read in. The situation assumes the file is too large to slurp into
memory, thus the read of one line at a time. So is there another way to do
that? According to the docs, "slurp-rest" gets all the remaining file in one
read.

Thanks.

Best regards,

-Tom