You can drop the OS caches between runs:

    # echo 1 > /proc/sys/vm/drop_caches
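A slightly fuller sketch of the same idea, per the Linux kernel's drop_caches documentation: run `sync` first so dirty pages are written back (only clean pages can be dropped), and write `3` instead of `1` to drop dentries and inodes as well as the page cache. Writing to this file requires root, so the sketch guards on writability:

```shell
# Flush dirty pages so they become droppable, then drop caches.
# Values: 1 = pagecache, 2 = dentries/inodes, 3 = both. Needs root.
sync
if [ -w /proc/sys/vm/drop_caches ]; then
    echo 3 > /proc/sys/vm/drop_caches
fi
```

Note this only clears the OS-level cache; anything cached inside the tablet server JVMs (block cache, index cache) survives until the process itself is restarted.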
On Fri, Aug 3, 2012 at 9:41 PM, Christopher Tubbs <[email protected]> wrote:
> Steve-
>
> I would probably design the experiment to test different cluster sizes
> as completely independent. That means taking the entire thing down and
> back up again (possibly even rebooting the boxes, and/or
> re-initializing the cluster at the new size). I'd also do several runs
> while it is up at a particular cluster size, to capture any
> performance difference between the first and a later run due to OS or
> TServer caching, for analysis later.
>
> Essentially, when in doubt, take more data...
>
> --L
>
>
> On Fri, Aug 3, 2012 at 5:50 PM, Steven Troxell <[email protected]> wrote:
> > Hi all,
> >
> > I am running a benchmarking project on Accumulo, looking at RDF
> > queries for clusters of different node sizes. While I intend to look
> > at caching to optimize each individual run, I do NOT want caching to
> > interfere, for example, between runs involving 10 and 8 tablet
> > servers.
> >
> > Up to now I'd just been killing nodes via the bin/stop-here.sh
> > script, but I realize that may have allowed caching from previous
> > runs with different node sizes to influence my results. It seemed
> > weird to me, for example, when I realized dropping nodes actually
> > increased performance (as measured by query return times) in some
> > cases (though I acknowledge the code I'm working with has some
> > serious issues with how ineffectively it actually utilizes Accumulo,
> > but that's an issue I intend to address later).
> >
> > I suppose one way would be, between a change of node sizes, to stop
> > and restart ALL nodes (as opposed to what I'd been doing in just
> > killing 2 nodes, for example, when transitioning from a 10-node to an
> > 8-node test). Will this be sure to clear the influence of caching
> > across runs, and is there any cleaner way to do this?
> >
> > thanks,
> > Steve
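The protocol Christopher describes (full stop/start between sizes, several runs at each size to separate cold-cache from warm-cache behavior) could be scripted roughly as below. This is only a sketch: `stop_cluster`, `start_cluster`, and `run_benchmark` are hypothetical stand-ins for bin/stop-all.sh, bin/start-all.sh (after editing the slaves file to the desired size), and whatever query harness you're timing.

```shell
#!/bin/sh
# Placeholder functions -- swap in your real cluster-control and
# benchmark commands (e.g. bin/stop-all.sh / bin/start-all.sh).
stop_cluster()  { echo "stopping all tservers"; }
start_cluster() { echo "starting $1 tservers"; }
run_benchmark() { echo "run $2 at size $1"; }

for size in 10 8; do
    stop_cluster              # take the WHOLE cluster down between sizes,
                              # not just the nodes being removed
    start_cluster "$size"
    for run in 1 2 3; do      # several runs per size: run 1 is cold-cache,
        run_benchmark "$size" "$run"   # later runs show caching effects
    done
done
```

Keeping the per-run results labeled by (size, run number) lets you analyze first-run vs. later-run differences afterward, as suggested above.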
