Hi All,
A really nice aspect of the regex plugin implementations in Nutch
(urlfilter-automaton and urlfilter-regex) is that there is a small but very
useful RegexURLFilterBaseTest [0], which benchmarks both implementations on
simple regex filtering.
The results we get are as follows:

urls    automaton    regex
50      343ms        210ms
100     48ms         187ms
200     65ms         363ms
400     100ms        692ms
800     165ms        1385ms

What I don't understand is why the first bench (50 urls) is more expensive
than the second (100 urls) for both implementations.
Additionally, why is the gap so much larger for automaton (343ms vs 48ms)
than for regex (210ms vs 187ms)?
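
One way to narrow this down might be to check whether the first batch is paying a one-time cost (class loading, JIT compilation and so on). This is only a guess on my part, not something I have confirmed. Below is a minimal sketch of that check using plain java.util.regex rather than the Nutch filters; the class name, the rule and the synthetic URLs are all hypothetical and not taken from the test data in [0]. It runs one untimed throwaway batch before timing the same batch sizes as above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class WarmupBenchSketch {

    // Hypothetical rule for illustration only; not one of the Nutch test rules.
    private static final Pattern RULE =
        Pattern.compile("^https?://[a-z0-9.]+/.*$");

    // Builds n synthetic URLs to run through the rule.
    static List<String> makeUrls(int n) {
        List<String> urls = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            urls.add("http://example.com/page" + i + ".html");
        }
        return urls;
    }

    // Counts how many URLs the rule accepts.
    static int countAccepted(List<String> urls) {
        int accepted = 0;
        for (String url : urls) {
            if (RULE.matcher(url).find()) {
                accepted++;
            }
        }
        return accepted;
    }

    // Times the filtering of one batch of n URLs, in milliseconds.
    static long timeBatchMillis(int n) {
        List<String> urls = makeUrls(n);
        long start = System.nanoTime();
        countAccepted(urls);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Untimed throwaway batch, so one-time startup costs are not
        // charged to the first measured batch.
        timeBatchMillis(50);

        for (int n : new int[] {50, 100, 200, 400, 800}) {
            System.out.println(n + " urls: " + timeBatchMillis(n) + "ms");
        }
    }
}
```

If the 50-url figure drops into line with the others once the throwaway batch is in place, that would point at startup cost rather than at the filters themselves.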

Anyone have a clue?
Thanks
Lewis

[0]
http://svn.apache.org/viewvc/nutch/branches/2.x/src/plugin/lib-regex-filter/src/test/org/apache/nutch/urlfilter/api/RegexURLFilterBaseTest.java?view=markup

-- 
*Lewis*
