I just ran the tests twice (to be clear: I invoked bench() twice in the same
run) to see the timings for urlfilter-regex:

(inputs) time

First bench() call:
*(50) 231ms*
(100) 169ms
(200) 326ms
(400) 683ms
(800) 1420ms

Second bench() call:
*(50) 109ms*
(100) 188ms
(200) 319ms
(400) 714ms
(800) 1442ms

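Kirby is right: on the second call, with the JVM already warm, the 50-input
run drops from 231ms to 109ms while the other sizes barely move.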
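
For anyone who wants to rule out warm-up in their own numbers, here is a
minimal sketch of the usual workaround: run the workload a few times untimed
before measuring. This is not the Nutch RegexURLFilterBaseTest; the class
name WarmupBench, the helpers makeUrls/countAccepted/timeMillis, the skip
pattern and the 50 warm-up rounds are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not the Nutch RegexURLFilterBaseTest.
class WarmupBench {
    // Made-up rule standing in for a real filter: reject common static assets.
    private static final Pattern SKIP =
        Pattern.compile(".*\\.(gif|jpg|png|css|js)$");

    // Build n deterministic URLs; even indexes end in .html, odd ones in .gif.
    static List<String> makeUrls(int n) {
        List<String> urls = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            urls.add("http://example.com/page" + i + (i % 2 == 0 ? ".html" : ".gif"));
        }
        return urls;
    }

    // The workload being measured: count the URLs the rule does not reject.
    static int countAccepted(List<String> urls) {
        int accepted = 0;
        for (String url : urls) {
            if (!SKIP.matcher(url).matches()) {
                accepted++;
            }
        }
        return accepted;
    }

    // Time one pass over the URLs, in milliseconds.
    static long timeMillis(List<String> urls) {
        long start = System.nanoTime();
        int accepted = countAccepted(urls);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Use the result so the work cannot be discarded as dead code.
        if (accepted > urls.size()) {
            throw new IllegalStateException("impossible count");
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        int[] sizes = {50, 100, 200, 400, 800};
        // Untimed warm-up passes give class loading and JIT compilation a
        // chance to happen before any measurement is taken.
        for (int round = 0; round < 50; round++) {
            for (int n : sizes) {
                countAccepted(makeUrls(n));
            }
        }
        // Timed passes, printed in the same "(inputs) time" shape as above.
        for (int n : sizes) {
            System.out.println("(" + n + ") " + timeMillis(makeUrls(n)) + "ms");
        }
    }
}
```

With the warm-up loop in place, the first timed size should no longer carry
the one-off startup cost. For anything more serious than a quick sanity
check, a dedicated harness such as JMH is probably the better tool.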


On Thu, May 23, 2013 at 5:48 PM, Lewis John Mcgibbney <
[email protected]> wrote:

> You know, this was my suspicion Kirby.
> Thanks for giving the heads up... automaton rocks.
> Lewis
>
>
> On Thu, May 23, 2013 at 5:06 PM, Kirby Bohling <[email protected]
> >wrote:
>
> > Standard micro-benchmark issues with Java: run the 50 last and it'll run
> > faster.  JVM warmup, JIT compilation, yadda, yadda, yadda.
> >
> >
> > On Thu, May 23, 2013 at 1:57 PM, Lewis John Mcgibbney <
> > [email protected]> wrote:
> >
> > > Hi All,
> > > A really nice aspect of the regex plugin implementations in Nutch
> > > (urlfilter-automaton and urlfilter-regex) is that there is a small but
> > > very useful RegexURLFilterBaseTest [0], which benchmarks simple regex
> > > parsing.
> > > The results we get are as follows:
> > >
> > > urls      automaton      regex
> > > 50        343ms          210ms
> > > 100       48ms           187ms
> > > 200       65ms           363ms
> > > 400       100ms          692ms
> > > 800       165ms          1385ms
> > >
> > > The problem I have here is understanding why the first (50) bench
> > > appears to be more expensive for both implementations.
> > > Additionally, why does this same bench cost so much more for automaton?
> > >
> > > Anyone have a clue?
> > > Thanks
> > > Lewis
> > >
> > > [0]
> > > http://svn.apache.org/viewvc/nutch/branches/2.x/src/plugin/lib-regex-filter/src/test/org/apache/nutch/urlfilter/api/RegexURLFilterBaseTest.java?view=markup
> > >
> > > --
> > > *Lewis*
> > >
> >
>
>
>
> --
> *Lewis*
>
