Hi all, I am running a benchmarking project on Accumulo, looking at RDF queries on clusters of different node sizes. While I do intend to look at caching as a way of optimizing each individual run, I do NOT want caching to carry over between runs, for example between a run with 10 tablet servers and one with 8.
Up to now I'd just been killing nodes via the bin/stop-here.sh script, but I realize that may have allowed caching from runs with different node sizes to influence my results. It seemed odd, for example, when I noticed that dropping nodes actually improved performance (as measured by query return times) in some cases (though I acknowledge the code I'm working with has some serious issues with how poorly it actually utilizes Accumulo, which I intend to address later).

One option would be to stop and restart ALL nodes whenever I change the node count, as opposed to what I've been doing, i.e. just killing 2 nodes when transitioning from a 10-node test to an 8-node one. Would that be sure to clear the influence of caching across runs, and is there a cleaner way to do it?

thanks,
Steve
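
P.S. For concreteness, the kind of between-run reset I have in mind is roughly the following. This is just an untested sketch; it assumes the 1.x-style bin/stop-all.sh / bin/start-all.sh scripts that live alongside stop-here.sh, a file listing the tablet server hosts, and that dropping the Linux page cache on each host is acceptable in my environment:

    # Stop every Accumulo process in the cluster, not just the nodes being removed.
    $ACCUMULO_HOME/bin/stop-all.sh

    # Drop the OS page cache on every host so data files cached by a previous run
    # don't survive the restart (Accumulo's own in-memory block/index caches go
    # away with the tserver processes anyway).
    for host in $(cat "$TSERVER_HOSTS_FILE"); do
        ssh "$host" 'sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null'
    done

    # Trim conf/slaves down to the smaller host list, then bring the whole
    # cluster back up for the next run.
    $ACCUMULO_HOME/bin/start-all.sh

If that's overkill, or if there's a lighter-weight way to get the same "cold" starting point for each node count, I'd be glad to hear it.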
