Julien,


On Mon, Nov 28, 2011 at 12:47 PM, Julien Nioche
<[email protected]> wrote:
> That would be a good thing to benchmark. IIRC there is a JIRA about
> improvements to the Finite State library we use, would be good to see the
> impact of the patch. The regex-urlfilter will probably take more memory and
> be much slower.
>

https://issues.apache.org/jira/browse/NUTCH-1068

Pretty sure that is the JIRA issue you are referring to.  I'm still not
sure what to do about the Automaton library; I don't think the
maintainer has integrated any of the performance improvements
from Lucene.
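
For a rough feel of the question before anyone benchmarks the real filters, here is a minimal sketch in plain Python, not Nutch code. It assumes, as I understand the regex-urlfilter, that rules are "+" or "-" prefixed regexes tried in order and that the first match decides. The hostnames, the rule count, and the function names are made up for illustration.

```python
import re
import time


def compile_rules(lines):
    """Parse '+regex' / '-regex' lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """First matching rule decides; a URL that matches no rule is rejected."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


def time_filter(rules, urls):
    """Return (number of accepted URLs, elapsed seconds) for one pass."""
    start = time.perf_counter()
    kept = sum(1 for url in urls if accepts(rules, url))
    return kept, time.perf_counter() - start


if __name__ == "__main__":
    n_rules = 2_000  # hypothetical size; raise it toward 1_000_000 to see the trend
    lines = [f"+^http://host{i}\\.example\\.com/" for i in range(n_rules)]
    rules = compile_rules(lines)
    urls = [f"http://host{i}.example.com/page" for i in range(0, n_rules, 100)]
    kept, seconds = time_filter(rules, urls)
    print(f"{len(rules)} rules, {len(urls)} URLs, kept {kept}, {seconds:.3f}s")
```

Each URL is checked against the rules one by one, so the cost per URL grows with the number of rows. That is the part a combined automaton is meant to avoid, and the part worth measuring on real data.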

Kirby


> Julien
>
> On 28 November 2011 18:14, Markus Jelsma <[email protected]> wrote:
>
>> Hi,
>>
>> Has anyone used URL filters containing up to a million rows? In our case
>> this would be only 25MB, so heap space is no problem (unless the data is
>> not shared between threads). Will it perform?
>>
>> Thanks,
>>
>
>
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
>
