You know, this was my suspicion, Kirby.
Thanks for the heads up... automaton rocks.
Lewis

On Thu, May 23, 2013 at 5:06 PM, Kirby Bohling <[email protected]> wrote:

> Standard micro-benchmark issues with Java, run the 50 last and it'll run
> faster. JVM warmup, and JIT compilation, yadda, yadda, yadda.
>
>
> On Thu, May 23, 2013 at 1:57 PM, Lewis John Mcgibbney
> <[email protected]> wrote:
>
> > Hi All,
> > A really nice aspect of the regex (urlfilter-automaton and
> > urlfilter-regex) plugin implementations in Nutch is that there is a
> > small but very useful RegexURLFilterBaseTest [0] which compares
> > benchmarks for simple regex parsing.
> > The results we get are as follows
> >
> > urls    automaton    regex
> > 50      343ms        210ms
> > 100     48ms         187ms
> > 200     65ms         363ms
> > 400     100ms        692ms
> > 800     165ms        1385ms
> >
> > The problem I have here is understanding why the first (50) bench
> > appears to be more expensive for both implementations.
> > Additionally, why does this same bench cost much more for automaton?
> >
> > Anyone have a clue?
> > Thanks
> > Lewis
> >
> > [0]
> > http://svn.apache.org/viewvc/nutch/branches/2.x/src/plugin/lib-regex-filter/src/test/org/apache/nutch/urlfilter/api/RegexURLFilterBaseTest.java?view=markup
> >
> > --
> > *Lewis*
>

--
*Lewis*
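
For anyone who wants to see the warmup effect Kirby describes in isolation, here is a minimal sketch. It is not the Nutch test: it uses plain java.util.regex, and the class name WarmupSketch, the rule, and the example.org URLs are all made up for illustration. It times several passes over the same 50 URLs. On a typical JVM the first pass tends to be the slowest, since it pays for class loading and interpreted execution before the JIT kicks in, but exact numbers will vary by machine and JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class WarmupSketch {
    // Hypothetical filter rule, not taken from any Nutch configuration.
    static final Pattern RULE =
        Pattern.compile("^https?://[a-z0-9.]+/.*\\.(gif|jpg|png)$");

    // Accumulates results so the timed work has a visible side effect.
    static int sink = 0;

    // Builds n made-up URLs; every even-numbered one ends in ".png".
    static List<String> makeUrls(int n) {
        List<String> urls = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            urls.add("http://example.org/page" + i + (i % 2 == 0 ? ".png" : ".html"));
        }
        return urls;
    }

    // Counts the URLs that fully match the rule.
    static int countMatches(List<String> urls) {
        int hits = 0;
        for (String u : urls) {
            if (RULE.matcher(u).matches()) {
                hits++;
            }
        }
        return hits;
    }

    // Times `rounds` consecutive passes over the same URLs, in nanoseconds.
    static long[] timeRounds(List<String> urls, int rounds) {
        long[] nanos = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink += countMatches(urls);
            nanos[r] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        List<String> urls = makeUrls(50);
        long[] nanos = timeRounds(urls, 5);
        for (int r = 0; r < nanos.length; r++) {
            System.out.println("round " + r + ": " + (nanos[r] / 1000) + " us");
        }
    }
}
```

If the first round stands out, that is the same thing the 50-URL row in the table is showing. Running a few untimed passes before measuring, or using a harness such as JMH, is the usual way to keep that cost out of the numbers.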

