On Thu, Feb 21, 2013 at 12:12 PM, Tikhon B. <[email protected]> wrote:
> Hi All,
>
> I am trying to figure out a performance question with the ruby regex
> matcher. Running the following code takes excessively longer than
> expected.

You are doing the measurement wrong.  You do not only measure matching
but also creation of the String.  Given that fact, all your results
are moot.

> 1. Benchmark.measure {("a" * 10_000).match(/a.*?b/)}
> =>   1.140000   0.010000   1.150000 (  1.195161)
>
> 2. Benchmark.measure {("a" * 10_000).match(/a[^b]*b/)}
> =>   1.400000   0.000000   1.400000 (  1.462178)
>
> Thinking the parser was doing some unnecessary backtracking I also tried
> atomic grouping, and other simpler expressions with very limited
> success:
>
> 3. Benchmark.measure {("a" * 10_000).match(/(?>a[^b]*b)/)}
> =>   1.400000   0.000000   1.400000 (  1.462178)
>
> 4. Benchmark.measure {("a" * 10_000).match(/aa*b/)}
> =>   1.270000   0.000000   1.270000 (  1.331137)
>
> 5. Benchmark.measure{val.match(/a.*b/)}
> =>   0.460000   0.000000   0.460000 (  0.487267)

Where does 'val' come from and what does it reference?

> This may be a very simple vector for a DDOS attack against a server
> using a similar expression to verify input data. What more, since valid
> input data is not likely to resemble scenarios 1-6 the issue may go
> undetected by a developers and testers not aware of this problem.
>
> Conversely, other developers may be turned off from using a regex in a
> place where it would perform well due to simple performance testing as
> above.
>
> This sort of performance hit is in neither the NodeJS nor Python regex
> engines. Writing this message I think I got a better grasp of the cause
> of the issue, but I still feel that it needs to be addressed for the
> sake of security.

Before we jump to conclusions please provide the proper and complete
test.  Note also that it is a common fact that NFA's can suffer from
backtracking in certain conditions.  So I am not too surprised nor
worried.  I did not look closely at all your test cases but often
backtracking issues can be remedied by using greediness modifiers
(either "greedy" or "reluctant").

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

-- 
[email protected] | 
https://groups.google.com/d/forum/ruby-talk-google?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"ruby-talk-google" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.


Reply via email to