Just ran the tests twice (to be clear: invoked bench() twice in the same run) to see the timings for regex-urlfilter:
(inputs) time
*(50)   231ms*
(100)   169ms
(200)   326ms
(400)   683ms
(800)  1420ms

*(50)   109ms*
(100)   188ms
(200)   319ms
(400)   714ms
(800)  1442ms

Kirby is right.

On Thu, May 23, 2013 at 5:48 PM, Lewis John Mcgibbney <[email protected]> wrote:

> You know, this was my suspicion Kirby.
> Thanks for giving the heads up... automaton rocks.
> Lewis
>
>
> On Thu, May 23, 2013 at 5:06 PM, Kirby Bohling <[email protected]> wrote:
>
> > Standard micro-benchmark issues with Java, run the 50 last and it'll run
> > faster. JVM warmup, and JIT compilation, yadda, yadda, yadda.
> >
> >
> > On Thu, May 23, 2013 at 1:57 PM, Lewis John Mcgibbney <[email protected]> wrote:
> >
> > > Hi All,
> > > A really nice aspect of the regex (urlfilter-automaton and
> > > urlfilter-regex) plugin implementations in Nutch is that there is a
> > > small but very useful RegexURLFilterBaseTest [0] which compares
> > > benchmarks for simple regex parsing.
> > > The results we get are as follows
> > >
> > > urls    automaton    regex
> > > 50      343ms        210ms
> > > 100     48ms         187ms
> > > 200     65ms         363ms
> > > 400     100ms        692ms
> > > 800     165ms        1385ms
> > >
> > > The problem I have here is understanding why the first (50) bench
> > > appears to be more expensive for both implementations?
> > > Additionally, why does this same bench cost much more for automaton?
> > >
> > > Anyone have a clue?
> > > Thanks
> > > Lewis
> > >
> > > [0]
> > > http://svn.apache.org/viewvc/nutch/branches/2.x/src/plugin/lib-regex-filter/src/test/org/apache/nutch/urlfilter/api/RegexURLFilterBaseTest.java?view=markup
> > >
> > > --
> > > *Lewis*
> >
>
> --
> *Lewis*
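
For anyone who wants to see the warm-up effect Kirby describes outside of Nutch, here is a rough sketch. It is not the actual RegexURLFilterBaseTest: it uses plain java.util.regex, and the class name WarmupBench, the skip pattern, the example.com URLs and the loop count are all made up for illustration. It runs the same five input sizes twice in one JVM, the way I invoked bench() twice above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class WarmupBench {
    // Hypothetical rule for illustration only; not taken from Nutch's regex-urlfilter.txt.
    private static final Pattern SKIP = Pattern.compile(".*\\.(gif|jpg|png|css|js)$");

    // Builds n synthetic URLs: even indexes end in .html, odd indexes end in .gif.
    static List<String> makeUrls(int n) {
        List<String> urls = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            urls.add("http://example.com/page" + i + (i % 2 == 0 ? ".html" : ".gif"));
        }
        return urls;
    }

    // Counts the URLs that the skip rule does NOT match.
    static int countAccepted(List<String> urls) {
        int accepted = 0;
        for (String url : urls) {
            if (!SKIP.matcher(url).matches()) {
                accepted++;
            }
        }
        return accepted;
    }

    // Filters n URLs `loops` times and returns the elapsed wall-clock time in ms.
    static long benchMillis(int n, int loops) {
        List<String> urls = makeUrls(n);
        long start = System.nanoTime();
        int sink = 0;
        for (int i = 0; i < loops; i++) {
            sink += countAccepted(urls);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Use the result so the JIT cannot discard the loop as dead code.
        if (sink < 0) {
            throw new IllegalStateException("unexpected negative count");
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        int[] sizes = {50, 100, 200, 400, 800};
        // Two passes in the same JVM: the first (50) of pass 1 also pays for
        // class loading and interpreted execution before the JIT kicks in.
        for (int pass = 1; pass <= 2; pass++) {
            System.out.println("pass " + pass);
            for (int n : sizes) {
                System.out.println("  (" + n + ") " + benchMillis(n, 100) + "ms");
            }
        }
    }
}
```

The absolute numbers will vary by machine and JVM, so treat them as indicative only. For anything more serious than a sanity check, a harness that handles warm-up iterations for you (JMH, for example) is probably the safer choice.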

