Re: Explanation of RegexURLFIlterTestBase benchmark's

Kirby Bohling Thu, 23 May 2013 18:37:32 -0700

Re-reading my e-mail, I realize it might be read poorly.  Thanks for giving
me the benefit of the doubt.

There's a bunch of good material out on the web, and ultimately, the truth
is that micro benchmarks can always be misleading, and the only accurate
benchmark is real workload testing.  That said, micro benchmarks can be
accurate and useful in limited contexts.

There are several good resources if you want to do micro benchmarking w/
Java:
https://code.google.com/p/caliper/  (Full Disclosure: Written by my
employer.  I've never used it, but the theory/docs are sound)

Peter Lawrey has a good blog that touches on issues of performance and has
a couple of posts explicitly on mistakes he's made in extremely high
performance benchmarking:
http://mechanical-sympathy.blogspot.com/

Pretty decent explanations here:
http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java

Not that I think you need them in particular, but it seems like Nutch could
be doing plenty of benchmarking, and micro benchmarking in particular.
Knowing the pitfalls is valuable.  Lots of very smart people screw this up
regularly and make poorly founded decisions armed with faulty data.  Not
always sure I qualify as smart, but I've done it more than once.

Anyways, those references have plenty of gory details for folks who are
interested in why things like this happen.
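
For anyone who wants to see the warm-up effect in isolation, here is a minimal
sketch.  It is not the Nutch test: the class name WarmupSketch, the regex rule,
and the example.com URLs are all made up for illustration.  It times the same
50-URL batch once cold and once after many untimed passes; on a typical JVM the
second number is usually much smaller, but the exact figures depend on the
machine and JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class WarmupSketch {
    // Hypothetical filter rule for illustration; not one of Nutch's rules.
    static final Pattern RULE =
            Pattern.compile("^https?://[a-z0-9.]+/.*\\.html$");

    // Builds n made-up URLs; even indexes end in .html, odd ones in .gif.
    static List<String> makeUrls(int n) {
        List<String> urls = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            String suffix = (i % 2 == 0) ? ".html" : ".gif";
            urls.add("http://example.com/page" + i + suffix);
        }
        return urls;
    }

    // Counts how many URLs fully match the rule.
    static int countAccepted(List<String> urls) {
        int accepted = 0;
        for (String url : urls) {
            if (RULE.matcher(url).matches()) {
                accepted++;
            }
        }
        return accepted;
    }

    // Times one pass over the batch, in nanoseconds.
    static long timeNanos(List<String> urls) {
        long start = System.nanoTime();
        int accepted = countAccepted(urls);
        long elapsed = System.nanoTime() - start;
        // Use the result so the JIT cannot treat the loop as dead code.
        if (accepted < 0) {
            throw new AssertionError("unreachable");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        List<String> urls = makeUrls(50);

        // First pass: classes are still loading and code is interpreted.
        System.out.println("cold run: " + timeNanos(urls) + " ns");

        // Untimed passes give the JIT a chance to compile the hot path.
        for (int i = 0; i < 10_000; i++) {
            countAccepted(urls);
        }

        // Same work again, now measured after warm-up.
        System.out.println("warm run: " + timeNanos(urls) + " ns");
    }
}
```

A harness like Caliper handles the warm-up passes and repeated measurement for
you, which is why it is a safer choice than hand-rolled timing loops like this
one.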

Kirby

On Thu, May 23, 2013 at 6:48 PM, Lewis John Mcgibbney <
[email protected]> wrote:

> You know, this was my suspicion Kirby.
> Thanks for giving the heads up... automaton rocks.
> Lewis
>
>
> On Thu, May 23, 2013 at 5:06 PM, Kirby Bohling <[email protected]
> >wrote:
>
> > Standard micro-benchmark issues with Java: run the 50 last and it'll run
> > faster.  JVM warmup, and JIT compilation, yadda, yadda, yadda.
> >
> >
> > On Thu, May 23, 2013 at 1:57 PM, Lewis John Mcgibbney <
> > [email protected]> wrote:
> >
> > > Hi All,
> > > A really nice aspect of the regex (urlfilter-automaton and
> > > urlfilter-regex) plugin implementations in Nutch is that there is a
> > > small but very useful RegexURLFilterBaseTest [0] which compares
> > > benchmarks for simple regex parsing.
> > > The results we get are as follows:
> > >
> > > urls   automaton   regex
> > > 50     343ms       210ms
> > > 100    48ms        187ms
> > > 200    65ms        363ms
> > > 400    100ms       692ms
> > > 800    165ms       1385ms
> > >
> > > The problem I have here is understanding why the first (50) bench
> > > appears to be more expensive for both implementations?
> > > Additionally, why does this same bench cost much more for automaton?
> > >
> > > Anyone have a clue?
> > > Thanks
> > > Lewis
> > >
> > > [0]
> > > http://svn.apache.org/viewvc/nutch/branches/2.x/src/plugin/lib-regex-filter/src/test/org/apache/nutch/urlfilter/api/RegexURLFilterBaseTest.java?view=markup
> > >
> > > --
> > > *Lewis*
> > >
> >
>
>
>
> --
> *Lewis*
>
