Hi all, I am running a benchmarking project on Accumulo, looking at RDF queries on clusters of different node sizes. While I do intend to look at caching as a way of optimizing each individual run, I do NOT want caching to carry over between runs, for example between a run with 10 tablet servers and one with 8.
Up to now I'd just been killing nodes via the bin/stop-here.sh script, but I realize that may have allowed caching from runs with different node sizes to influence my results. It seemed odd, for example, when I noticed that dropping nodes actually improved performance (as measured by query return times) in some cases (though I acknowledge the code I'm working with has some serious issues with how poorly it actually utilizes Accumulo, which I intend to address later).

One option would be to stop and restart ALL nodes whenever I change the node count, as opposed to what I've been doing, i.e. just killing 2 nodes when transitioning from a 10-node test to an 8-node one. Would that be sure to clear the influence of caching across runs, and is there a cleaner way to do it?

thanks,
Steve
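
P.S. For concreteness, the kind of between-run reset I have in mind is roughly the following. This is just an untested sketch; it assumes the 1.x-style bin/stop-all.sh / bin/start-all.sh scripts that live alongside stop-here.sh, a file listing the tablet server hosts, and that dropping the Linux page cache on each host is acceptable in my environment:

    # Stop every Accumulo process in the cluster, not just the nodes being removed.
    $ACCUMULO_HOME/bin/stop-all.sh

    # Drop the OS page cache on every host so data files cached by a previous run
    # don't survive the restart (Accumulo's own in-memory block/index caches go
    # away with the tserver processes anyway).
    for host in $(cat "$TSERVER_HOSTS_FILE"); do
        ssh "$host" 'sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null'
    done

    # Trim conf/slaves down to the smaller host list, then bring the whole
    # cluster back up for the next run.
    $ACCUMULO_HOME/bin/start-all.sh

If that's overkill, or if there's a lighter-weight way to get the same "cold" starting point for each node count, I'd be glad to hear it.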
