Re: Mini Accumulo cluster

Josh Elser Wed, 13 May 2015 13:20:32 -0700

As long as you're managing your expectations (which it sounds like you've considered well), there could be some worth.

A concern would be how using a different filesystem implementation actually impacts the validity of your benchmark though.

e.g. w/ a local FS (which is what MAC uses by default), a disk seek costs 10ms, but using your real HDFS cluster, it's 200ms. If IteratorA does more seeks but is less efficient on the retrieved data, while IteratorB does fewer seeks but is more efficient on the retrieved data, the MAC results would be an inaccurate benchmark for a production system.
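
To make the seek-cost point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a deliberately simple model, wall time = seeks x seek cost + time spent processing the retrieved data. Only the 10ms and 200ms seek costs come from the example above; the two iterator profiles, their seek counts, and their processing times are made-up numbers chosen to show a trade-off, not measurements of any real iterator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IteratorProfile:
    """Hypothetical cost profile of one iterator stack for a fixed query."""
    name: str
    seeks: int   # number of disk seeks the query triggers (assumed)
    cpu_ms: int  # time spent processing the retrieved data, in ms (assumed)


def wall_time_ms(profile: IteratorProfile, seek_cost_ms: int) -> int:
    """Simplified model: every seek costs the same, processing time is fixed."""
    return profile.seeks * seek_cost_ms + profile.cpu_ms


def faster(a: IteratorProfile, b: IteratorProfile, seek_cost_ms: int) -> str:
    """Name of the profile with the lower modelled wall time."""
    return min((a, b), key=lambda p: wall_time_ms(p, seek_cost_ms)).name


# Seek costs taken from the example in this message.
LOCAL_FS_SEEK_MS = 10
HDFS_SEEK_MS = 200

# Made-up profiles: one seeks a lot but does little work per query,
# the other seeks rarely but does more work on what it reads.
SEEK_HEAVY = IteratorProfile("seek-heavy", seeks=100, cpu_ms=1000)
SEEK_LIGHT = IteratorProfile("seek-light", seeks=20, cpu_ms=3000)
```

With these numbers the seek-heavy profile comes out ahead on the local FS (2000ms vs 3200ms) but well behind on HDFS (21000ms vs 7000ms), so even the relative ordering measured on MAC would not carry over.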

I guess another way to put it is that total wall time for a query might be deceiving in a test environment.


Dave Hardcastle wrote:

Hi,

Is it crazy to use a MiniAccumuloCluster to measure the *relative*
performance of two different implementations of iterators?

Obviously it would be better to do it on a real Accumulo cluster, but
that's not possible for several reasons.

The approach would be something like:
- Fire up a Mini cluster
- Bulk import a file
- Start timer
- Set up a BatchScanner with one of the iterator stacks and use it to
query for lots of different ranges
- Iterate through the results of this
- Stop timer

Repeat with the other implementation of the iterators.
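
The timing part of the steps above could be sketched roughly as follows, in Python for brevity. This is only an outline of the harness, not Accumulo code: run_query is a hypothetical callable standing in for "set up a BatchScanner with one iterator stack, query the ranges and iterate through the results", and the cluster start-up and bulk import are assumed to have happened before it is called. The warm-up runs and the use of the median over several trials are additions of this sketch, not part of the original plan.

```python
import statistics
import time
from typing import Callable, List


def time_trials(run_query: Callable[[], object],
                warmups: int = 2,
                trials: int = 5) -> List[float]:
    """Time repeated runs of one query workload, in seconds.

    run_query is a hypothetical callable that performs the whole scan and
    drains its results. Warm-up runs are executed first and not recorded.
    """
    for _ in range(warmups):
        run_query()

    samples = []
    for _ in range(trials):
        start = time.perf_counter()   # start timer
        run_query()                   # scan and iterate through the results
        samples.append(time.perf_counter() - start)  # stop timer
    return samples


def relative_speed(samples_a: List[float], samples_b: List[float]) -> float:
    """Ratio of median times; a value below 1.0 means A was faster than B."""
    return statistics.median(samples_a) / statistics.median(samples_b)
```

Calling time_trials once per iterator stack and passing both sample lists to relative_speed gives a single ratio to compare, which fits the goal of measuring relative rather than absolute performance.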

Of course, the difference in performance may not be measurable, if the
time is dominated by the disk-seek time, but that would still be useful
information. And the absolute performance wouldn't be representative of
what you'd get on a real cluster as there's no network latency in these
trials, but that's fine as I'm mainly interested in which of the two
implementations of the iterators is most performant.

Similarly, could the same approach be used to compare the performance on
SSD vs hard disk?

Thanks,

Dave.
