On Fri, Feb 22, 2013 at 1:21 AM, Tikhon B. <[email protected]> wrote:
> Robert Klemme wrote in post #1098199:
>> On Thu, Feb 21, 2013 at 12:12 PM, Tikhon B. <[email protected]>
>> wrote:

>> You are doing the measurement wrong.  You do not only measure matching
>> but also creation of the String.  Given that fact, all your results
>> are moot.
>
> I actively considered this matter when submitting the post, but in the
> end I included the string creating in the test to allow anyone to try
> out the example by copying one line.

Having a second line which defines a variable or constant isn't really
that more inconvenient.

> The time necessary for the actual
> creation process is negligible; I need to run the string creation ten
> thousand times before the result is substantial enough to significantly
> affect the numbers I'm getting.

Still you are measuring one thing but reasoning about another.

> I am not sure what sort of proper and complete tests you are looking
> for.

Tests which measure exactly the functionality you want to scrutinize
not something else.

> Regarding the actual slowdown, the issue comes down to the engine
> repeating the match from the start for every single "a" in the string,
> then failing and trying again with the next letter.

Backtracking.

> I do not see how a
> greedy modifier can help in this situation, and I do not know enough
> about the ruby regex engine to comment on the underlying design.

Mine was a general statement about issues that can be avoided with
greediness modifiers.  I didn't say it's the solution to all
performance issues of this kind.

I had a closer look at your expressions.  They are of course
artificial to demonstrate the effects of backtracking.  I would assume
that one would typically write different expressions in real
applications, for example, using anchoring which dramatically improves
the situation here.

$ ruby rx-bm.rb
                                user     system      total        real
 0 /(?>a[^b]*b)/           11.732000   0.000000  11.732000 ( 11.737671)
 1 /(?>a[^b]*b)/            0.015000   0.000000   0.015000 (  0.003000)
 0 /\A(?>a[^b]*b)/          0.000000   0.000000   0.000000 (  0.002000)
 1 /\A(?>a[^b]*b)/          0.000000   0.000000   0.000000 (  0.003000)
 0 /(?<!a)(?>a[^b]*b)/      0.016000   0.000000   0.016000 (  0.012001)
 1 /(?<!a)(?>a[^b]*b)/      0.000000   0.000000   0.000000 (  0.003000)
 0 /aa*b/                  11.700000   0.000000  11.700000 ( 11.703670)
 1 /aa*b/                   0.000000   0.000000   0.000000 (  0.002000)
 0 /\Aaa*b/                 0.000000   0.000000   0.000000 (  0.002000)
 1 /\Aaa*b/                 0.015000   0.000000   0.015000 (  0.002000)
 0 /(?<!a)aa*b/             0.000000   0.000000   0.000000 (  0.013001)
 1 /(?<!a)aa*b/             0.016000   0.000000   0.016000 (  0.002000)
 0 /a+b/                   11.248000   0.000000  11.248000 ( 11.252644)
 1 /a+b/                    0.000000   0.000000   0.000000 (  0.002000)
 0 /\Aa+b/                  0.000000   0.000000   0.000000 (  0.003000)
 1 /\Aa+b/                  0.000000   0.000000   0.000000 (  0.002000)
 0 /(?<!a)a+b/              0.015000   0.000000   0.015000 (  0.012001)
 1 /(?<!a)a+b/              0.000000   0.000000   0.000000 (  0.003000)
 0 /a.*b/                   4.649000   0.000000   4.649000 (  4.641265)
 1 /a.*b/                   0.000000   0.000000   0.000000 (  0.001000)
 0 /\Aa.*b/                 0.000000   0.000000   0.000000 (  0.001000)
 1 /\Aa.*b/                 0.000000   0.000000   0.000000 (  0.001000)
 0 /(?<!a)a.*b/             0.016000   0.000000   0.016000 (  0.011001)
 1 /(?<!a)a.*b/             0.000000   0.000000   0.000000 (  0.000000)
$ cat -n rx-bm.rb
     1
     2  require 'benchmark'
     3
     4  strings = [
     5    ("a" * 10_000).freeze,
     6    ("a" * 10_000 + "b").freeze
     7  ]
     8
     9  data = [
    10          /(?>a[^b]*b)/,
    11          /\A(?>a[^b]*b)/,
    12          /(?<!a)(?>a[^b]*b)/,
    13
    14          /aa*b/,
    15          /\Aaa*b/,
    16          /(?<!a)aa*b/,
    17
    18          /a+b/,
    19          /\Aa+b/,
    20          /(?<!a)a+b/,
    21
    22          /a.*b/,
    23          /\Aa.*b/,
    24          /(?<!a)a.*b/,
    25  ]
    26
    27  REP = 20
    28
    29  Benchmark.bm 25 do |x|
    30    data.each do |rx|
    31      strings.each_with_index do |s, i|
    32        x.report sprintf("%2d %p", i, rx) do
    33          REP.times do
    34            rx.match s
    35          end
    36        end
    37      end
    38    end
    39  end
$

So, yes, there is a cost for backtracking involved, but it only shows
only in particular situations:
- large inputs
- specific expressions (absence of anchoring, using unlimited
repetition operators vs. limited (e.g. /a{0,10}/).

The phenomenon is well known with regular expression engines I
believe.  The cases you mention seem to be rather the exception than
the common state of affairs.  Which doesn't mean that Ruby's engine
cannot be improved.  I just don't think it's not that dramatic an
issue.

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

-- 
[email protected] | 
https://groups.google.com/d/forum/ruby-talk-google?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"ruby-talk-google" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.


Reply via email to