Hi All,

A really nice aspect of the regex plugin implementations in Nutch (urlfilter-automaton and urlfilter-regex) is that there is a small but very useful RegexURLFilterBaseTest [0], which benchmarks simple regex parsing for both. The results we get are as follows:

urls    automaton    regex
50      343ms        210ms
100     48ms         187ms
200     65ms         363ms
400     100ms        692ms
800     165ms        1385ms

The problem I have here is understanding why the first (50) bench appears to be more expensive for both implementations? Additionally, why does this same bench cost much more for automaton? Anyone have a clue?

Thanks
Lewis

[0] http://svn.apache.org/viewvc/nutch/branches/2.x/src/plugin/lib-regex-filter/src/test/org/apache/nutch/urlfilter/api/RegexURLFilterBaseTest.java?view=markup

-- 
*Lewis*

