Re: Explanation of RegexURLFIlterTestBase benchmark's

Lewis John Mcgibbney Thu, 23 May 2013 18:51:42 -0700

Hi Kirby,

On Thu, May 23, 2013 at 6:36 PM, Kirby Bohling <[email protected]> wrote:

>
> Not that I think you need them in particular, but it seems like Nutch could
> be doing plenty of benchmarking, and micro benchmarking in particular.
>

I agree with this. It is not my goal to attack this head on, but (I think)
it is useful for us to know more about the different components of Nutch
and how they operate; micro benchmarking would certainly be one way of
achieving this.
That being said, I am quite keen on the idea of third party libraries (such
as dk.brics.automaton [0]) being tested in their own environment, by their
own development team. In this case, some comparative *results* (for an older
version of the dk.brics library) can be seen here [1].
Anyone is free to infer from this what they wish, but it gives a bit of an
idea of the gains which can be achieved.
If regex p is something which you (I mean this collectively to refer to
anyone) think is a bottleneck for your Nutch deployment, try out the
automaton plugin and hopefully things get better for you. AFAIK we use the
most up-to-date library available here, so things should work well.
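
To give a rough idea of what a micro benchmark of a regex URL filter might
look like, here is a minimal sketch using only java.util.regex. The class
name, the skip-suffix rule and the sample URLs are all made up for
illustration and are not taken from Nutch; a serious measurement would use a
proper harness such as JMH rather than hand-rolled timing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; not part of Nutch.
public class RegexFilterMicroBench {

    // Assumed example rule: reject URLs ending in a few static-resource suffixes.
    static final Pattern SKIP_SUFFIX =
        Pattern.compile("\\.(gif|jpg|png|css|js)$", Pattern.CASE_INSENSITIVE);

    // Returns true when the URL passes the (made up) filter rule.
    static boolean accept(String url) {
        return !SKIP_SUFFIX.matcher(url).find();
    }

    // Runs the filter over all URLs 'rounds' times and returns elapsed nanoseconds.
    static long timeNanos(List<String> urls, int rounds) {
        long accepted = 0;
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            for (String url : urls) {
                if (accept(url)) {
                    accepted++;
                }
            }
        }
        long elapsed = System.nanoTime() - start;
        // Use the counter so the JIT cannot discard the loop as dead code.
        if (accepted < 0) {
            throw new IllegalStateException("unreachable");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
            "http://example.org/index.html",
            "http://example.org/img/logo.PNG",
            "http://example.org/style.css");

        timeNanos(urls, 10_000);               // warm-up pass, result ignored
        long nanos = timeNanos(urls, 100_000); // measured pass
        System.out.println("elapsed ns: " + nanos);
    }
}
```

The same loop could be pointed at an automaton-based matcher to get a
side-by-side number, but how that comparison comes out will depend on your
own patterns and URLs.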

Thanks for the post Kirby.

[0] http://www.brics.dk/automaton/index.html
[1] http://tusker.org/regex/regex_benchmark.html
