Julien,

On Mon, Nov 28, 2011 at 12:47 PM, Julien Nioche <[email protected]> wrote:
> That would be a good thing to benchmark. IIRC there is a JIRA about
> improvements to the Finite State library we use, would be good to see the
> impact of the patch. The regex-urlfilter will probably take more memory and
> be much slower.
>

https://issues.apache.org/jira/browse/NUTCH-1068

Pretty sure that is the JIRA item you are discussing. Still not sure what to
do with the Automaton library, I don't think that the maintainer has
integrated any parts of the performance improvements from Lucene.

Kirby

> Julien
>
> On 28 November 2011 18:14, Markus Jelsma <[email protected]> wrote:
>
>> Hi,
>>
>> Anyone used URL filters containing up to a million rows? In our case this
>> would be only 25MB so heap space is no problem (unless the data is not
>> shared between threads). Will it perform?
>>
>> Thanks,
>>
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com

