Re: A practical benchmark shows speed challenges for Perl 6

2016-04-02 Thread yary
I think I understand the objections to the proposed environment variable
now. I'm branching Rakudo, and attempting my first ever patch, in order to
make handle-specific buffer size possible. Leaving in the env var, as that
still has possible use...
On Apr 2, 2016 8:47 AM, "Tom Browder"  wrote:

> On Thursday, March 31, 2016, Elizabeth Mattijsen  wrote:
> > > On 31 Mar 2016, at 14:12, Tom Browder  wrote:
> ...
> >
> > but I'll put the code on my github account later today. I'm sure it can
> be tweaked,
> > but the gross differences between Perl 6 and Perl 5 doing a very common
> task
> > I think should be seriously investigated by someone familiar with MoarVM
> internals.
>
> Done:
>
>  https://github.com/tbrowder/perl6-read-write-tests
>
> > I think jnthn already has synchronous printing (not using libuv) on the
> list of things to do.
>
> I do see that on the list at moarvm.org.
>
> I wonder if it would be worth the effort to try to make a module which
> uses the native call capability to use libc's readline and see if
> there is any significant speed up?
>
> Best,
>
> -Tom
>


Re: A practical benchmark shows speed challenges for Perl 6

2016-04-02 Thread Tom Browder
On Thursday, March 31, 2016, Elizabeth Mattijsen  wrote:
> > On 31 Mar 2016, at 14:12, Tom Browder  wrote:
...
>
> but I'll put the code on my github account later today. I'm sure it can be 
> tweaked,
> but the gross differences between Perl 6 and Perl 5 doing a very common task
> I think should be seriously investigated by someone familiar with MoarVM 
> internals.

Done:

 https://github.com/tbrowder/perl6-read-write-tests

> I think jnthn already has synchronous printing (not using libuv) on the list 
> of things to do.

I do see that on the list at moarvm.org.

I wonder if it would be worth the effort to try to make a module which
uses the native call capability to use libc's readline and see if
there is any significant speed up?

Best,

-Tom


Re: A practical benchmark shows speed challenges for Perl 6

2016-04-01 Thread yary
Not sure I understand the disagreement.

"the correct buffer size is often per file, not per
program/invocation, so a one-size-fits-all envar is the wrong
approach"- if you're saying "it would be great to have the buffer size
be an option to 'open'," then I agree. It would be nice to have that
settable as a parameter. At the moment, you can set it by file by
changing $*DEFAULT-READ-ELEMS then creating the filehandle- I read the
source, and the handle saves the value it was given. If you change the
variable later and open a new handle, the new handle gets the new
value while the old keeps its old value. While being able to
explicitly set it per handle would be good, the implementation of this
dynamic variable is a workable first step.
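
A minimal sketch of the capture-at-creation behaviour described above. The class `SketchHandle` and the variable `$*SKETCH-READ-ELEMS` are hypothetical stand-ins for illustration, not Rakudo's IO::Handle or its real variable; the sketch only shows how a handle can remember the dynamic variable's value from the moment it was created.

```raku
# Hypothetical stand-in for a file handle; this is NOT Rakudo's IO::Handle.
class SketchHandle {
    has Int $.read-elems;

    # Capture the dynamic variable's value at construction time.
    method new() {
        self.bless(read-elems => $*SKETCH-READ-ELEMS);
    }
}

my $*SKETCH-READ-ELEMS = 65536;     # hypothetical name; 64K default
my $first = SketchHandle.new;       # remembers 65536

$*SKETCH-READ-ELEMS = 131072;       # change the variable afterwards
my $second = SketchHandle.new;      # remembers 131072

say $first.read-elems;              # 65536
say $second.read-elems;             # 131072
```

The old handle keeps its old value because the lookup happens once, in the constructor, rather than on every read.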

" it is rare for the buffer size to be different based on the system
except on systems so small that rakudo isn't going to work on them
anyway (e.g. embedded systems)."

That doesn't make sense to me: before Elizabeth's patch the buffer was
never different on any system; it was always 64K. (And I thought I saw
Rakudo on Raspberry Pi in 2014 or '15, but I'm not sure...)

"I can't see this impacting much in the common case": right, this isn't
addressing common cases. It's an "infrequently asked question."

I wasn't even thinking of small systems, but large ones with SAN
networks using clever caching over not-quite-properly configured
VLANs- like what I use at work. Some of our servers use the SAN
through a different VLAN. I don't expect a programmer to fine-tune
file-processing code for my situation (unless they want to find the
best buffer size on the fly each run!), but I do appreciate being able
to set the buffer in the environment per system. I was the original
person asking to be able to set the buffer, and I have seen it make a
difference in this use case.

"In the specific case that prompted this thread, it is the programmer
that wants to specify a very large buffer." In my case, I was
comparing perl5 with C#, and I found that for that particular system,
code, and file, a buffer size of 128k was the sweet spot.

"is a small system even going to be able to handle that much data to
begin with?" Not sure how the existence of small systems eliminates
the usefulness of twiddling buffer sizes.

-y


Re: A practical benchmark shows speed challenges for Perl 6

2016-04-01 Thread Brandon Allbery
On Fri, Apr 1, 2016 at 11:09 AM, yary  wrote:

> Setting the buffer size is better done by the user, not the
> programmer. Often the user and the programmer are one and the same, in
> which case, the programmer knows the environment and can set the
> environment variables- or change the code- whichever makes better
> sense.


I would disagree with this; having been in both of those seats,

(a) the correct buffer size is often per file, not per program/invocation,
so a one-size-fits-all envar is the wrong approach;

(b) it is rare for the buffer size to be different based on the system
except on systems so small that rakudo isn't going to work on them anyway
(e.g. embedded systems).

64k is a rather large buffer size relative to libc stdio which is usually
4-8k, but rather small compared to many other aspects of rakudo's memory
usage. I can't see this impacting much in the common case. In the specific
case that prompted this thread, it is the programmer that wants to specify
a very large buffer. And the fact that a very large buffer is wanted is
actually a symptom of an even more significant memory issue: is a small
system even going to be able to handle that much data to begin with? So
again, the *buffer size* is not the important part of the equation and
trying to tune it to reduce the impact on the system is attacking the wrong
part of the problem.

-- 
brandon s allbery kf8nh   sine nomine associates
allber...@gmail.com  ballb...@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net


Re: A practical benchmark shows speed challenges for Perl 6

2016-04-01 Thread yary
Actually I would characterize it as

Before:

The programmer had no control over the buffer size, and the user of
the code had no way of adjusting the buffer to a particular system.

Currently:

The programmer has control over the buffer size, and the user of the
code can adjust the buffer to a particular system.


Setting the buffer size is better done by the user, not the
programmer. Often the user and the programmer are one and the same, in
which case, the programmer knows the environment and can set the
environment variables- or change the code- whichever makes better
sense.

If you're writing code for others to use, then the optimal buffer size
isn't known by you. The programmer can either leave it alone and let
the user set an environment variable; or can hard-code it to some
fixed value- for example a multiple of a fixed record size- or can
have code scale the dynamic variable (which is either the default or
set via the user's environment)- for example because the code forks N
times and you've noticed it performs better with a buffer 1/Nth the
usual size (purely hypothetical)

When I hear "Every program that was made under the previous paradigm
now needs to be modified to check the environment to avoid undesired
side effects" what I think is "no, every program that cares can say
INIT $*DEFAULT-READ-ELEMS=65536 thus ignoring the environment. But if
someone gave me a module or program that ignored my wishes, I'd edit
it away."


There are times when you want to ignore the environment - like in Perl
5's taint mode, which if I recall correctly, clears $ENV{PATH} and a
few other things. But in general code uses bits of the environment
because the user wants it that way. If the user is fiddling with
buffer size, then the user knows something or is debugging something
about the system which the programmer didn't need to think about.


Re: A practical benchmark shows speed challenges for Perl 6

2016-04-01 Thread Jan Ingvoldstad
On Fri, Apr 1, 2016 at 2:00 PM, Elizabeth Mattijsen  wrote:

> Sorry if I wasn’t clear: If there is no dynamic var, it will make one:
> either from the environment, or set it to 64K (like it was before).  So no
> programmer action is ever needed if they’re not interested in that type of
> optimization.
>

That was abundantly clear.


> At the moment it is nothing but a balloon that I let go up in the air.
> Question is still out on whether this will continue to live until the next
> release, or that it will be replaced by something more low level, at the VM
> level.
>
> If you put garbage in the environment, it will die trying to coerce that
> to an integer.
>

Sorry for bringing that up, as it evidently confused the issue.

I'll try to explain the problem once again, with feeling ;) – hoping that
I'm being clearer this time.

Before:

The programmer knows that the buffer size is 64K unless the programmer asks
for something different. A typical Perl program reading buffered input does
not need to worry about anything, unless the programmer wants to have
smaller or larger buffers.

In other words: fire and forget.

Currently:

The programmer does not know what the buffer size is, as it can either be
the default, or set by an environment. Every program that was made under
the previous paradigm now needs to be modified to check the environment to
avoid undesired side effects.

Every future program also needs to include code that checks the environment
to avoid undesired side effects.

-- 
Jan


Re: A practical benchmark shows speed challenges for Perl 6

2016-04-01 Thread Elizabeth Mattijsen
> On 01 Apr 2016, at 13:50, Jan Ingvoldstad  wrote:
> 
> On Thu, Mar 31, 2016 at 10:36 AM, Elizabeth Mattijsen  wrote:
> > The reasoning behind _not_ setting things via environment variables, is 
> > that this means the programmer now needs to worry what e.g. the webserver 
> > running the Perl program does, and there are unknown stability (and 
> > possibly security) implications. This adds bloat to the program.
> >
> > The programmer is better off if they only explicitly need to worry about it 
> > when they want to change the defaults.
> 
> The environment variable is only used if there is no dynamic variable found.  
> So, if a programmer wishes to use a specific buffer size in the program, they 
> can.
>  
> This is precisely _not_ addressing the issue I raised.
> 
> This way, the programmer _needs_ to explicitly check whether the environment 
> variable is set, and if not, somehow set a sensible default if the 
> environment variable differs from the default.
> 
> That adds quite a bit of unnecessary programming to each Perl program that 
> deals with buffers.
> The status as it was before, was that the programmer didn't need to worry 
> about the environment for buffer size.

Sorry if I wasn’t clear: If there is no dynamic var, it will make one: either 
from the environment, or set it to 64K (like it was before).  So no programmer 
action is ever needed if they’re not interested in that type of optimization.


> If a malicious environment sets the buffer size to something undesirable, 
> there may be side effects that are hard to predict, and may have other 
> implications than merely performance.
> 
> I think it is preferable that the decision about that is made by the 
> programmer rather than the environment.
> 
> PS: I'm assuming that $*DEFAULT-READ-ELEMS is clean by the time it reaches 
> any code, that is that it only contains _valid_ integer values and cannot 
> lead to overflows or anything, I am not concerned about that.

At the moment it is nothing but a balloon that I let go up in the air.  
Question is still out on whether this will continue to live until the next 
release, or that it will be replaced by something more low level, at the VM 
level.

If you put garbage in the environment, it will die trying to coerce that to an 
integer.
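
A minimal sketch of the fallback order described above: use the environment if it supplies a value, otherwise 64K. The function name `sketch-read-elems` and the environment variable `SKETCH_READ_ELEMS` are hypothetical (the thread does not name the real variable), and the explicit validation is a choice made for this sketch rather than what the linked commit does.

```raku
# Hypothetical helper and environment variable name; the real logic lives in
# the Rakudo commit linked in this thread and may differ.
sub sketch-read-elems(%env = %*ENV) {
    # Fall back to 64K when the variable is absent.
    return 65536 unless %env<SKETCH_READ_ELEMS>:exists;

    my $raw = %env<SKETCH_READ_ELEMS>;

    # Refuse anything that is not a plain positive integer.
    die "SKETCH_READ_ELEMS must be a positive integer, got '{$raw}'"
        unless $raw ~~ /^ <[1..9]> \d* $/;

    $raw.Int;
}

say sketch-read-elems({});                                 # 65536
say sketch-read-elems({ SKETCH_READ_ELEMS => '131072' });  # 131072
```

A program that wants to ignore the environment entirely can skip such a lookup and assign its own value, which is the point made elsewhere in this thread.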


Liz



Re: A practical benchmark shows speed challenges for Perl 6

2016-04-01 Thread Jan Ingvoldstad
On Thu, Mar 31, 2016 at 10:36 AM, Elizabeth Mattijsen wrote:

> > The reasoning behind _not_ setting things via environment variables, is
> that this means the programmer now needs to worry what e.g. the webserver
> running the Perl program does, and there are unknown stability (and
> possibly security) implications. This adds bloat to the program.
> >
> > The programmer is better off if they only explicitly need to worry about
> it when they want to change the defaults.
>
> The environment variable is only used if there is no dynamic variable
> found.  So, if a programmer wishes to use a specific buffer size in the
> program, they can.


This is precisely _not_ addressing the issue I raised.

This way, the programmer _needs_ to explicitly check whether the
environment variable is set, and if not, somehow set a sensible default if
the environment variable differs from the default.

That adds quite a bit of unnecessary programming to each Perl program that
deals with buffers.

The status as it was before, was that the programmer didn't need to worry
about the environment for buffer size.

If a malicious environment sets the buffer size to something undesirable,
there may be side effects that are hard to predict, and may have other
implications than merely performance.

I think it is preferable that the decision about that is made by the
programmer rather than the environment.

PS: I'm assuming that $*DEFAULT-READ-ELEMS is clean by the time it reaches
any code, that is that it only contains _valid_ integer values and cannot
lead to overflows or anything, I am not concerned about that.
-- 
Jan


Re: A practical benchmark shows speed challenges for Perl 6

2016-03-31 Thread Tom Browder
On Thursday, March 31, 2016, Elizabeth Mattijsen  wrote:
...

> Perhaps you could do a --profile on a case that runs about 5 seconds or so,
> to get an idea of the bottleneck?
>
> Or could you gist the code that does the actual processing?


Okay, Liz, I'll work on the instructions found here and see what I can find:

  https://doc.perl6.org/language/performance

Then I'll "gist" it if I find something "gistable."

Thanks.

-Tom


Re: A practical benchmark shows speed challenges for Perl 6

2016-03-31 Thread Fields, Christopher J


On Mar 31, 2016, at 8:56 AM, Tom Browder wrote:

> On Thursday, March 31, 2016, Elizabeth Mattijsen wrote:
> > > On 31 Mar 2016, at 14:12, Tom Browder wrote:
> > > Liz, it's a simple reader of a text file.  The only line processing is a
> > > print of the number of characters of each line.  I guess I should eliminate
> > > that but I assumed that was negligible since all the reader scripts do the same.
> >
> > I wonder.  I wouldn’t be surprised if not printing number of chars would make a
> > significant difference.
>
> I'll try turning any line processing off, but note the Perl 5 reader does the
> same thing.
>
> Best,
>
> -Tom

Just to note, I have also seen dramatic differences in IO between Perl 5 and 
Perl 6, simply iterating through large files (in this case DNA sequence data) 
by line w/ no processing.

Chris


Re: A practical benchmark shows speed challenges for Perl 6

2016-03-31 Thread Tom Browder
On Thursday, March 31, 2016, Elizabeth Mattijsen  wrote:

> > On 31 Mar 2016, at 14:12, Tom Browder wrote:
> > Liz, it's a simple reader of a text file.  The only line processing is a
> print of the number of characters of each line.  I guess I should eliminate
> that but I assumed that was negligible since all the reader scripts do the
> same.
>
> I wonder.  I wouldn’t be surprised if not printing number of chars would
> make a significant difference.


I'll try turning any line processing off, but note the Perl 5 reader does
the same thing.

Best,

-Tom


Re: A practical benchmark shows speed challenges for Perl 6

2016-03-31 Thread Elizabeth Mattijsen
> On 31 Mar 2016, at 09:50, Jan Ingvoldstad  wrote:
> 
> On Wed, Mar 30, 2016 at 9:20 PM, Elizabeth Mattijsen  wrote:
> 
> Thanks for your thoughts!
> 
> I’ve implemented $*DEFAULT-READ-ELEMS in 
> https://github.com/rakudo/rakudo/commit/5bd1e .
> 
> Of course, all of this is provisional, and open for debate and bikeshedding.
> 
> 
> Brilliant and brilliantly quick response, Liz!
> 
> In the spirit of bikeshedding, my first thought is that the variable name 
> should have something with BUFFER in it, as that is what it appears to be. :)
> 
> Functionally, it's nice to be able to set it via the environment, but since 
> the environment may not necessarily be controlled by the programmer, I 
> consider that to be a short term solution.
> 
> A longer term solution would be for a way to set it within the program that 
> the environment cannot override.
> 
> Additionally, there could also be a default, compile-time option for Rakudo.
> 
> 
> The reasoning behind _not_ setting things via environment variables, is that 
> this means the programmer now needs to worry what e.g. the webserver running 
> the Perl program does, and there are unknown stability (and possibly 
> security) implications. This adds bloat to the program.
> 
> The programmer is better off if they only explicitly need to worry about it 
> when they want to change the defaults.

The environment variable is only used if there is no dynamic variable found.  
So, if a programmer wishes to use a specific buffer size in the program, they 
can.



Liz



Re: A practical benchmark shows speed challenges for Perl 6

2016-03-31 Thread Jan Ingvoldstad
On Wed, Mar 30, 2016 at 9:20 PM, Elizabeth Mattijsen  wrote:
>
>
> Thanks for your thoughts!
>
> I’ve implemented $*DEFAULT-READ-ELEMS in
> https://github.com/rakudo/rakudo/commit/5bd1e .
>
> Of course, all of this is provisional, and open for debate and
> bikeshedding.
>
>
Brilliant and brilliantly quick response, Liz!

In the spirit of bikeshedding, my first thought is that the variable name
should have something with BUFFER in it, as that is what it appears to be.
:)

Functionally, it's nice to be able to set it via the environment, but since
the environment may not necessarily be controlled by the programmer, I
consider that to be a short term solution.

A longer term solution would be for a way to set it within the program that
the environment cannot override.

Additionally, there could also be a default, compile-time option for Rakudo.


The reasoning behind _not_ setting things via environment variables, is
that this means the programmer now needs to worry what e.g. the webserver
running the Perl program does, and there are unknown stability (and
possibly security) implications. This adds bloat to the program.

The programmer is better off if they only explicitly need to worry about it
when they want to change the defaults.
-- 
Jan


Re: A practical benchmark shows speed challenges for Perl 6

2016-03-30 Thread Tom Browder
On Tue, Mar 29, 2016 at 10:29 PM, Timo Paulssen  wrote:
> On 03/30/2016 03:45 AM, Timo Paulssen wrote:
>
> Could you try using $filename.IO.slurp.lines instead of $filename.IO.lines
> and see if that makes things any faster?
>   - Timo
>
>
> Actually, the method on an IO::Handle is called "slurp-rest"; slurp would
> only work with a filename instead.
>   - Timo

Okay, I've done a comparison of the three methods on a 1 Gb file:

IO.lines
  real 2m11.827s
  user 2m10.036s
  sys 0m1.468s

IO.split
  real 1m51.504s
  user 1m51.136s
  sys 0m0.352s

IO.slurp-rest
  real 2m9.821s
  user 2m6.268s
  sys 0m3.532s

and Perl 5:

  real 0m4.614s
  user 0m4.328s
  sys 0m0.280s
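
As a quick check on the figures above, the wall-clock ("real") times can be turned into slowdown factors relative to Perl 5. This is only arithmetic on the numbers reported in this message; the variable names are made up for the sketch.

```raku
# Wall-clock ("real") times from the message above, converted to seconds.
my %real =
    'IO.lines'      => 2 * 60 + 11.827,
    'IO.split'      => 1 * 60 + 51.504,
    'IO.slurp-rest' => 2 * 60 +  9.821;
my $perl5 = 4.614;

# Print each method's slowdown factor relative to the Perl 5 reader.
for %real.sort(*.key) -> $pair {
    printf "%-14s %.1fx slower than Perl 5\n", $pair.key, $pair.value / $perl5;
}
```

The factors come out at roughly 24x for IO.split and 28x to 29x for the other two, in line with the "roughly 30 times" reported at the start of the thread.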

Best,

-Tom


Re: A practical benchmark shows speed challenges for Perl 6

2016-03-30 Thread James E Keenan

On 03/30/2016 04:11 PM, yary wrote:
> On Wed, Mar 30, 2016 at 3:20 PM, Elizabeth Mattijsen wrote:
>> Thanks for your thoughts!
>>
>> I’ve implemented $*DEFAULT-READ-ELEMS in
>> https://github.com/rakudo/rakudo/commit/5bd1e .
>>
>> Of course, all of this is provisional, and open for debate and bikeshedding.

Yary, if you feel there's a need for this functionality in Perl *5* as 
well, please file a bug ticket via perlbug.


Thank you very much.
Jim Keenan


Re: A practical benchmark shows speed challenges for Perl 6

2016-03-30 Thread yary
On Wed, Mar 30, 2016 at 3:20 PM, Elizabeth Mattijsen  wrote:
> Thanks for your thoughts!
>
> I’ve implemented $*DEFAULT-READ-ELEMS in 
> https://github.com/rakudo/rakudo/commit/5bd1e .
>
> Of course, all of this is provisional, and open for debate and bikeshedding.


Thanks! And that was fast!

Allowing DEFAULT-READ-ELEMS to be set from the environment is a good
idea that I hadn't thought of. Since it is a machine-dependent
performance tweak, it makes sense to set it outside the code.

I had originally envisioned this as an "option" to "sub open" for
fine-grained control as to which IO::Handles got what
DEFAULT-READ-ELEMS, but I'm not sure it belongs there. After all it is
a performance-related tweak and I'm liking the idea of it being
primarily set from the environment; setting it in the code means
you're writing something for a particular host, and there is no need
to change the spec to support that.

Is there anything similar on the "write" side- output buffering- that
could use this treatment?

-y


Re: A practical benchmark shows speed challenges for Perl 6

2016-03-30 Thread Elizabeth Mattijsen
> On 30 Mar 2016, at 16:06, yary  wrote:
> 
> Cross-posting to the compiler group-
> 
> On Wed, Mar 30, 2016 at 8:10 AM, Elizabeth Mattijsen  wrote:
>> If you know the line endings of the file, using 
>> IO::Handle.split($line-ending) (note the actual character, rather than a 
>> regular expression) might help.  That will read in the file in chunks of 64K 
>> and then lazily serve lines from that chunk.
> 
> This reminds me of a pet peeve I had with p5: Inability to easily
> change the default buffer size for reading & writing.
> 
> I'm the lone Perl expert at $work and at one point was trying to keep
> a file processing step in perl. These files were about 100x the size
> of the server's RAM, consisted of variable-length newline-terminated
> text, the processing was very light, there would be a few running in
> parallel. The candidate language, C#, has a text-file-reading object
> that lets you set its read-ahead buffer on creation/opening the file-
> can't remember the details. That size had a large impact on the
> performance of this task. With perl... I could not use the
> not-so-well-documented IO::Handle->setvbuf because my OS didn't
> support it. I did hack together something with sysread, but C# won in
> the end due partly to that.
> 
> It seems this "hiding-of-buffer" sub-optimal situation is being
> repeated in Perl6: neither https://doc.perl6.org/routine/open nor
> http://doc.perl6.org/type/IO::Handle mention a buffer, yet IO::Handle
> reads ahead and buffers. Experience shows that being able to adjust
> this buffer can help in certain situations. Also consider that perl5
> has defaulted to 4k and 8k, whereas perl6 is apparently using 64k, as
> evidence that this buffer needs to change as system builds evolve.
> 
> Please make this easily readable & settable, anywhere it's implemented!

Thanks for your thoughts!

I’ve implemented $*DEFAULT-READ-ELEMS in 
https://github.com/rakudo/rakudo/commit/5bd1e .

Of course, all of this is provisional, and open for debate and bikeshedding.


Liz

Re: A practical benchmark shows speed challenges for Perl 6

2016-03-30 Thread Timo Paulssen

On 30/03/16 13:40, Tom Browder wrote:
> On Tue, Mar 29, 2016 at 10:29 PM, Timo Paulssen wrote:
>> On 03/30/2016 03:45 AM, Timo Paulssen wrote:
>>
>> Could you try using $filename.IO.slurp.lines instead of $filename.IO.lines
>> and see if that makes things any faster?
> ...
>> Actually, the method on an IO::Handle is called "slurp-rest"; slurp would
>> only work with a filename instead.
>>    - Timo
>
> Timo, I'm trying to test a situation where I could process every line
> as it is read in.  The situation assumes the file is too large to
> slurp into memory, thus the read of one line at a time.  So is there
> another way to do that?  According to the docs "slurp-rest" gets all
> the remaining file at one read.
>
> Thanks,
>
> Best regards,
>
> -Tom


I was suggesting this mostly because we've recently discovered a very 
severe performance problem with IO.lines. I'd like to know if that also 
affects your benchmark and how big the saving might be for "moderately" 
sized data.


timo@schmand ~/p/e/SDL2_raw-p6 (master)> time perl6 -e 'for "heap-snapshot".IO.lines {}'
129.14user 0.87system 2:10.44elapsed 99%CPU (0avgtext+0avgdata 507580maxresident)k


timo@schmand ~/p/e/SDL2_raw-p6 (master)> time perl6 -e 'for "heap-snapshot".IO.slurp.lines {}'
1.92user 0.14system 0:02.07elapsed 99%CPU (0avgtext+0avgdata 537940maxresident)k


timo@schmand ~/p/e/SDL2_raw-p6 (master)> time perl6 -e 'for "heap-snapshot".IO.open.split("\n") {}'
192.04user 0.36system 3:12.70elapsed 99%CPU (0avgtext+0avgdata 1350204maxresident)k
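
For anyone who wants to repeat this comparison without a large file at hand, here is a rough self-contained sketch. The file name, the line count, and the helper `timed` are all assumptions made for the sketch; the three calls mirror the three one-liners above, and absolute timings will vary by machine and Rakudo version.

```raku
# Build a throwaway file so the sketch is self-contained (hypothetical name).
my $lines = 10_000;
my $file  = $*TMPDIR.add("lines-sketch-{$*PID}.txt");
$file.spurt((("x" x 99) ~ "\n") x $lines);

# Run a block, print how long it took, and return the block's result.
sub timed(Str $label, &code) {
    my $start  = now;
    my $result = code();
    printf "%-16s %6d items  %.3fs\n", $label, $result, now - $start;
    $result;
}

my $by-lines = timed('IO.lines',       { $file.lines.elems });
my $by-slurp = timed('IO.slurp.lines', { $file.slurp.lines.elems });
my $by-split = timed('open.split',     {
    my $fh = $file.open;
    my $n  = $fh.split("\n").elems;   # one extra, empty, trailing item
    $fh.close;
    $n;
});

$file.unlink;   # clean up the temporary file
```

Returning the item count from each block also makes it easy to confirm that all three approaches saw the same data.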


Hope this clears up how I meant that :)
  - Timo


Re: A practical benchmark shows speed challenges for Perl 6

2016-03-30 Thread yary
Cross-posting to the compiler group-

On Wed, Mar 30, 2016 at 8:10 AM, Elizabeth Mattijsen  wrote:
> If you know the line endings of the file, using 
> IO::Handle.split($line-ending) (note the actual character, rather than a 
> regular expression) might help.  That will read in the file in chunks of 64K 
> and then lazily serve lines from that chunk.

This reminds me of a pet peeve I had with p5: Inability to easily
change the default buffer size for reading & writing.

I'm the lone Perl expert at $work and at one point was trying to keep
a file processing step in perl. These files were about 100x the size
of the server's RAM, consisted of variable-length newline-terminated
text, the processing was very light, there would be a few running in
parallel. The candidate language, C#, has a text-file-reading object
that lets you set its read-ahead buffer on creation/opening the file-
can't remember the details. That size had a large impact on the
performance of this task. With perl... I could not use the
not-so-well-documented IO::Handle->setvbuf because my OS didn't
support it. I did hack together something with sysread, but C# won in
the end due partly to that.
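
To make the idea of a caller-chosen read size concrete, here is a rough Perl 6 analogue of reading in fixed-size chunks. It is not the Perl 5 sysread code mentioned above, which was never posted; the function name `count-newlines`, the temporary file name, and the chunk sizes are assumptions for the sketch.

```raku
# Count newline bytes by reading fixed-size binary chunks.
# $chunk-bytes is the tunable read size; 65536 mirrors the 64K default
# discussed in this thread.
sub count-newlines(IO::Path $file, Int $chunk-bytes = 65536) {
    my $fh    = $file.open(:bin);
    my $count = 0;
    # An empty buffer signals end of file.
    while (my $buf = $fh.read($chunk-bytes)).elems {
        $count += $buf.list.grep(* == 10).elems;   # 10 is "\n" in ASCII/UTF-8
    }
    $fh.close;
    $count;
}

# Self-contained demo on a temporary file (hypothetical name).
my $file = $*TMPDIR.add("chunk-sketch-{$*PID}.txt");
$file.spurt((("x" x 99) ~ "\n") x 1000);

my $small = count-newlines($file, 4096);      # 4K chunks
my $large = count-newlines($file, 131072);    # 128K chunks
say "4K chunks: $small newlines, 128K chunks: $large newlines";

$file.unlink;   # clean up the temporary file
```

The result is the same for any chunk size; only the number of read calls changes, which is where a tunable buffer can matter on a given system.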

It seems this "hiding-of-buffer" sub-optimal situation is being
repeated in Perl6: neither https://doc.perl6.org/routine/open nor
http://doc.perl6.org/type/IO::Handle mention a buffer, yet IO::Handle
reads ahead and buffers. Experience shows that being able to adjust
this buffer can help in certain situations. Also consider that perl5
has defaulted to 4k and 8k, whereas perl6 is apparently using 64k, as
evidence that this buffer needs to change as system builds evolve.

Please make this easily readable & settable, anywhere it's implemented!


-y


Re: A practical benchmark shows speed challenges for Perl 6

2016-03-30 Thread Elizabeth Mattijsen
> On 30 Mar 2016, at 13:40, Tom Browder  wrote:
> On Tue, Mar 29, 2016 at 10:29 PM, Timo Paulssen  wrote:
>> On 03/30/2016 03:45 AM, Timo Paulssen wrote:
>> 
>> Could you try using $filename.IO.slurp.lines instead of $filename.IO.lines
>> and see if that makes things any faster?
> ...
>> Actually, the method on an IO::Handle is called "slurp-rest"; slurp would
>> only work with a filename instead.
>>  - Timo
> Timo, I'm trying to test a situation where I could process every line
> as it is read in.  The situation assumes the file is too large to
> slurp into memory, thus the read of one line at a time.  So is there
> another way to do that?  According to the docs "slurp-rest" gets all
> the remaining file at one read.

That is correct.

The thing is that IO.lines basically depends on IO.get to get a line.  So that 
is extra overhead, that IO.slurp.lines doesn’t have.

If you know the line endings of the file, using IO::Handle.split($line-ending) 
(note the actual character, rather than a regular expression) might help.  That 
will read in the file in chunks of 64K and then lazily serve lines from that 
chunk.

A simple test on an /etc/dict/words:

$ 6 '"words".IO.lines.elems.say'
235886
real    0m0.645s

$ 6 '"words".IO.open.split("\x0a").elems.say'
235887
real    0m0.317s

Note that with .split you will get an extra empty line at the end.
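
The off-by-one between the two counts above (235886 versus 235887) can be reproduced on a small string. This sketch uses Str.lines and Str.split rather than a file handle, on the assumption that the trailing-separator behaviour is the same.

```raku
my $text = "alpha\nbeta\ngamma\n";    # three newline-terminated lines

say $text.lines.elems;                # 3
say $text.split("\n").elems;          # 4: the last element is ''

# Drop the trailing empty element if the data ends with the separator.
my @lines = $text.split("\n");
@lines.pop if @lines && @lines[*-1] eq '';
say @lines.elems;                     # 3
```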


Hope this helps.


Liz

Re: A practical benchmark shows speed challenges for Perl 6

2016-03-30 Thread Tom Browder
On Tue, Mar 29, 2016 at 10:29 PM, Timo Paulssen  wrote:
> On 03/30/2016 03:45 AM, Timo Paulssen wrote:
>
> Could you try using $filename.IO.slurp.lines instead of $filename.IO.lines
> and see if that makes things any faster?
...
> Actually, the method on an IO::Handle is called "slurp-rest"; slurp would
> only work with a filename instead.
>   - Timo


Timo, I'm trying to test a situation where I could process every line
as it is read in.  The situation assumes the file is too large to
slurp into memory, thus the read of one line at a time.  So is there
another way to do that?  According to the docs "slurp-rest" gets all
the remaining file at one read.

Thanks,

Best regards,

-Tom


Re: A practical benchmark shows speed challenges for Perl 6

2016-03-29 Thread Timo Paulssen
On 03/29/2016 10:47 PM, Tom Browder wrote:
> On Wednesday, February 3, 2016, Tom Browder wrote:
>
> I use Perl for heavy duty text processing. A question on Perl Monks
> about Perl 5's handling of a large input file got me wondering how the
> two Perls compare at the moment.
>
>
> I see no significant improvement using my string and file read
> tests with the latest moarvm on my laptop (bummer). I'll give
> comparison stats a little later after I use my faster home server.
>
> Best regards,
>
> -Tom
>

Could you try using $filename.IO.slurp.lines instead of
$filename.IO.lines and see if that makes things any faster?
  - Timo


Re: A practical benchmark shows speed challenges for Perl 6

2016-03-29 Thread Timo Paulssen
On 02/03/2016 02:59 PM, Tom Browder wrote:
> I tried the suggestion from Bart Wiegmans to compile the program:
>
> $ perl6 --target=mbc --output=read-file-test.moarvm read-file-test.p6
> $ time perl6 read-file-test.moarvm large-1-gb-file.txt
> Error while reading from file: Malformed UTF-8
>
> So I guess precompilation is not yet ready for public testing.  That
> will be a nice feature, IMHO!
>
> Cheers!
>
> -Tom

Hey, the first time I read this, I didn't actually pay attention to
this. The reason you're getting that error is that you're not supposed
to invoke a .moarvm file with the perl6 command. Instead, you're
supposed to make your code into a module, put the generated .moarvm file
alongside it and then pull in that module with -M or "use" in your perl6
code.

You will probably need to pass a -I in order to make perl6 look for the
module in the right places.

By now, rakudo will automatically pre-compile anything you "use" or -M
(if that isn't prevented by "no precompilation" or things that aren't
supported), so that'll be easier.

Hope that helps!
  - Timo


Re: A practical benchmark shows speed challenges for Perl 6

2016-03-29 Thread Tom Browder
On Wednesday, February 3, 2016, Tom Browder  wrote:

> I use Perl for heavy duty text processing. A question on Perl Monks
> about Perl 5's handling of a large input file got me wondering how the
> two Perls compare at the moment.


I see no significant improvement using my string and file read tests with
the latest moarvm on my laptop (bummer). I'll give comparison stats a
little later after I use my faster home server.

Best regards,

-Tom


Re: A practical benchmark shows speed challenges for Perl 6

2016-02-03 Thread Tom Browder
On Wed, Feb 3, 2016 at 9:40 AM, Elizabeth Mattijsen  wrote:
> Is the code available somewhere?  Would love to try some optimizations on it.

Liz, code is in public pastes in Pastebin. Links are here:

1. The master Perl 6 script to run the tests:
  http://pastebin.com/MDmumWq0

2-3. The two file creation scripts (Perl 5 and 6):
  http://pastebin.com/RwizhVLT
  http://pastebin.com/g2VNHQnn

4-5. The two file reader scripts (Perl 5 and 6):
  http://pastebin.com/miY60cP3
  http://pastebin.com/XAkpF7bA

6-7. The smallest test file for a fast start:
  http://pastebin.com/d8ws15qm

Check inside file 1 which is set to just use the zero-gb file.  Near
the top of the file you'll see where you just select one of two Gb
arrays by commenting out one or the other:

#!/usr/bin/env perl6

# test file sizes:
#my @GB = <0 1 2 3 4 5 6 7 8 9 10>;
my @GB = <0>; # a small file for testing this file

Good luck. Criticisms and suggestions welcome.

Cheers!

-Tom


Re: A practical benchmark shows speed challenges for Perl 6

2016-02-03 Thread Elizabeth Mattijsen
> On 03 Feb 2016, at 14:59, Tom Browder  wrote:
> I use Perl for heavy duty text processing. A question on Perl Monks
> about Perl 5's handling of a large input file got me wondering how the
> two Perls compare at the moment.
> 
> I wrote a couple of simple programs, in both languages, to write and
> read a 10 Gb text file filled with identical 100-character lines. The
> reading programs counted total lines and characters of the input file.
> The results on my fastest host show that much optimization is still
> needed for Perl 6.
> 
> I compared read times for file sizes from one to 10 Gb in one-gigabyte
> increments and, in general, Perl 6 takes roughly 30 times longer than
> Perl 5.14 to read the same file.  So far I see no significant
> improvement in Rakudo 2016.01 over 2015.12, but the tests haven't
> quite finished yet.

Is the code available somewhere?  Would love to try some optimizations on it.



Liz


Re: A practical benchmark shows speed challenges for Perl 6

2016-02-03 Thread Tom Browder
On Wed, Feb 3, 2016 at 9:40 AM, Elizabeth Mattijsen  wrote:
>> On 03 Feb 2016, at 14:59, Tom Browder  wrote:
>> I use Perl for heavy duty text processing. A question on Perl Monks
>> about Perl 5's handling of a large input file got me wondering how the
>> two Perls compare at the moment.
...
> Is the code available somewhere?  Would love to try some optimizations on it.

Yes, Liz, I'll put it on Pastebin.

It will be a while--wife is dragging me out the door for an appointment...

-Tom