[perl #127064] [PERF] Variable interpolation in regex very slow

2018-05-13 Thread Daniel Green via RT
On Tue, 07 Nov 2017 17:14:15 -0800, ddgr...@gmail.com wrote:
> Adding :i (case insensitive adverb), /:i $s / took 3.0s and /:i <$s> /
> took 7.7s.


This is Rakudo version 2018.04.1-76-g9b915f09d built on MoarVM version
2018.04.1-98-g1aa02fe45, implementing Perl 6.c:
/ $s / took 1.8s and / <$s> / took 2.6s.


2017-11-07 Thread Daniel Green via RT
On Tue, 07 Nov 2017 17:10:29 -0800, ddgr...@gmail.com wrote:
> This is Rakudo version 2017.10-143-g0e50993f4 built on MoarVM version
> 2017.10-58-gad8618468:
> / $s / took 2.7s and / <$s> / took 7.0s.


Adding :i (case insensitive adverb), /:i $s / took 3.0s and /:i <$s> / took 
7.7s.
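
For a literal needle, a case-insensitive filter can probably avoid the regex engine in the same way that grep(*.contains($s)) does, by folding case on both sides before comparing. The following is only a sketch under that assumption: grep-ci and @sample are hypothetical names made up for illustration, and it has not been timed against the :i figures above.

```raku
# Case-insensitive literal filter without a regex.
# The needle is case-folded once; each line is folded as it is tested.
sub grep-ci(@lines, Str $needle) {
    my $folded = $needle.fc;
    @lines.grep(*.fc.contains($folded));
}

# Hypothetical sample data, for illustration only.
my @sample = 'I like PERL6', 'nothing here', 'perl6 rocks';
say grep-ci(@sample, 'Perl6').elems;   # 2
```

This only covers literal substrings; anything that needs real regex syntax still has to go through / <$s> /.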


2017-11-07 Thread Daniel Green via RT
On Sun, 15 Oct 2017 05:19:54 -0700, ddgr...@gmail.com wrote:
> I recently attempted to make interpolating into regexes a little
> faster. This is what I was using for a benchmark:
> perl6 -e 'my @l = "sm.sql".IO.lines; my $s = "Perl6"; my $t = now; my
> @m = @l.grep(/ $s /); say @m.elems; say now - $t'
> sm.sql is 10k lines, of which 1283 contain the text "Perl6".
> 
> This is Rakudo version 2017.09 built on MoarVM version 2017.09.1:
> / $s / took 5.3s and / <$s> / took 16.5s.
> 
> This is Rakudo version 2017.09-427-gd23a9ba9d built on MoarVM version
> 2017.09.1-595-g716f2277f:
> / $s / took 3.2s and / <$s> / took 14.5s.
> 
> However, if you give the interpolated variable a type constraint
> (my Str $s), literal interpolation is *much* faster.
> perl6 -e 'my @l = "sm.sql".IO.lines; my Str $s = "Perl6"; my $t = now;
> my @m = @l.grep(/ $s /); say @m.elems; say now - $t'
> This takes only 0.33s.
> 
> This is still nowhere near as fast as grep(*.contains($s)) though,
> which only takes 0.037s.


This is Rakudo version 2017.10-143-g0e50993f4 built on MoarVM version 
2017.10-58-gad8618468:
/ $s / took 2.7s and / <$s> / took 7.0s.


2017-10-15 Thread Daniel Green via RT
On Thu, 31 Dec 2015 05:39:24 -0800, ju...@jules.uk wrote:
> 
> 
> On 29/12/2015 23:05, Timo Paulssen via RT wrote:
> > On 12/29/2015 12:46 AM, Jules Field (via RT) wrote:
> >> # New Ticket Created by  Jules Field
> >> # Please include the string:  [perl #127064]
> >> # in the subject line of all future correspondence about this issue.
> >> # https://rt.perl.org/Ticket/Display.html?id=127064
> >>
> >>
> >> Given
> >> my @lines = "some-text.txt".IO.lines;
> >> my $s = 'Jules';
> >> (some-text.txt is about 43k lines)
> >>
> >> Doing
> >> my @matching = @lines.grep(/ $s /);
> >> is about 50 times slower than
> >> my @matching = @lines.grep(/ Jules /);
> >>
> >> And if $s happened to contain anything other than literals, so that I
> >> had to use
> >> my @matching = @lines.grep(/ <$s> /);
> >> then it's nearly 150 times slower.
> >>
> >> my @matching = @lines.grep($s);
> >> doesn't appear to work. It matches 0 lines but doesn't die.
> >>
> >> The lack of Perl5's straightforward variable interpolation in regexes
> >> is crippling the speed.
> >> Is there a faster alternative? (other than EVAL to build the regex)
> >>
> > For now, you can use @lines.grep(*.contains($s)), which will be
> > sufficiently fast.
> >
> > Ideally, our regex optimizer would turn this simple regex into code
> > that uses .index to find a literal string and constructs a match object
> > for that. Or even - if you put a literal "so" in front - turn it into
> > .contains($literal) if it knows that the match object will only be
> > inspected for true/false.
> >
> > Until then, we ought to be able to make interpolation a bit faster.
> > - Timo
> Many thanks for that. I hadn't thought to use Whatever.
> 
> I would ideally also be doing case-insensitive regexps, but they are 50
> times slower than case-sensitive ones, even in trivial cases.
> Maybe a :adverb for rx// that says "give me static (i.e. Perl5-style)
> interpolation in this regex"?
> I can see the advantage of passing the variables to the regex engine, as
> then they can change over time.
> 
> But that's not something I want to do very often; far more frequently I
> just need to construct the regex at run-time and have it go as fast as
> possible.
> 
> Just thoughts from a big Perl5 user (e.g. MailScanner is 50k lines of
> it!).
> 
> Jules


I recently attempted to make interpolating into regexes a little faster. This 
is what I was using for a benchmark:
perl6 -e 'my @l = "sm.sql".IO.lines; my $s = "Perl6"; my $t = now; my @m = 
@l.grep(/ $s /); say @m.elems; say now - $t'
sm.sql is 10k lines, of which 1283 contain the text "Perl6".

This is Rakudo version 2017.09 built on MoarVM version 2017.09.1:
/ $s / took 5.3s and / <$s> / took 16.5s.

This is Rakudo version 2017.09-427-gd23a9ba9d built on MoarVM version 
2017.09.1-595-g716f2277f:
/ $s / took 3.2s and / <$s> / took 14.5s.

However, if you give the interpolated variable a type constraint (my Str $s),
literal interpolation is *much* faster.
perl6 -e 'my @l = "sm.sql".IO.lines; my Str $s = "Perl6"; my $t = now; my @m = 
@l.grep(/ $s /); say @m.elems; say now - $t'
This takes only 0.33s.

This is still nowhere near as fast as grep(*.contains($s)) though, which only 
takes 0.037s.
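
For anyone who wants to repeat the comparison without sm.sql, the variants can be run side by side from one script. This is a rough sketch, not the exact benchmark used above: the generated @lines is a hypothetical stand-in for the file, timed and @variants are names invented here, and each variant is run only once, so the timings will be noisy and will differ from the figures quoted.

```raku
# Hypothetical stand-in for sm.sql: 10_000 lines, every 8th contains the needle.
my @lines = (^10_000).map({
    $_ %% 8 ?? "row $_ mentions Perl6 here" !! "row $_ is plain"
});

my     $untyped = 'Perl6';   # no type constraint, as in the first one-liner
my Str $typed   = 'Perl6';   # Str-typed, as in the second one-liner

# Run a block once; return the number of matches and the elapsed seconds.
sub timed(&code) {
    my $start = now;
    my $count = code().elems;   # .elems forces the lazy grep to run
    return $count, now - $start;
}

my @variants =
    'grep(/ $untyped /)'       => -> { @lines.grep(/ $untyped /) },
    'grep(/ <$untyped> /)'     => -> { @lines.grep(/ <$untyped> /) },
    'grep(/ $typed /)'         => -> { @lines.grep(/ $typed /) },
    'grep(*.contains($typed))' => -> { @lines.grep(*.contains($typed)) };

for @variants -> $v {
    my ($count, $secs) = timed($v.value);
    say sprintf('%-28s %5d matches  %.3fs', $v.key, $count, $secs);
}
```

All four variants should report the same match count; only the elapsed time is expected to differ between them.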