Are there other considerations I should be aware of to ensure independent runs, beyond stopping/restarting the tablet servers and clearing the OS cache?
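For concreteness, the full reset I've been doing between runs looks roughly like the sketch below (a sketch only: the script names assume a stock Accumulo install, conf/slaves is assumed to list the tablet server hosts, and the echo 3 is my extension of Eric's echo 1 so dentries/inodes get dropped along with the page cache):

    #!/usr/bin/env bash
    # Full reset between benchmark runs. Assumes bin/stop-all.sh,
    # bin/start-all.sh, and conf/slaves from a stock Accumulo install,
    # plus passwordless sudo on each node.

    # Take the whole cluster down, not just the tablet servers being dropped.
    ./bin/stop-all.sh

    # Clear the OS cache on every node: sync flushes dirty pages first,
    # then echo 3 drops the page cache plus dentries and inodes.
    for host in $(cat conf/slaves); do
        ssh "$host" "sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'"
    done

    # Bring the cluster back up at the new size (after editing conf/slaves).
    ./bin/start-all.sh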
I ran a test with 2 tablet servers active and got 1 query to come back in 10
hours. I then ran ./bin/stop-all.sh and ./bin/start-all.sh to set up a
comparison test with 10 tservers, clearing the cache with Eric's command on
the 2 tablet servers I had used for the first run, and this time 4 queries
returned in under 2 minutes. This could be an awesome performance gain, but
I'm a bit skeptical, especially since the client code isn't even using batch
scanners (along with assorted other inefficiencies). Is there some other
dependency between the tests I haven't accounted for?

On Mon, Aug 6, 2012 at 2:41 PM, Steven Troxell <[email protected]> wrote:

> For anyone else curious about this, it seems the OS caching played a much
> larger role for me than TServer caching. I actually measured a performance
> increase after just stopping/restarting TServers to clear the cache
> (though that could also have been biased by being a weekend run on the
> cluster).
>
> However, I noticed an immediate difference when clearing the OS cache
> through Eric's command: the first few queries, which had generally been
> returning in tenths of a second, were now up in the minutes range.
>
> On Sat, Aug 4, 2012 at 1:21 PM, Steven Troxell <[email protected]> wrote:
>
>> Thanks everyone, that should definitely help me out. While I feel silly
>> for ignoring this issue at first, it should be interesting to see how
>> much it influences the results.
>>
>> On Sat, Aug 4, 2012 at 7:19 AM, Eric Newton <[email protected]> wrote:
>>
>>> You can drop the OS caches between runs:
>>>
>>> # echo 1 > /proc/sys/vm/drop_caches
>>>
>>> On Fri, Aug 3, 2012 at 9:41 PM, Christopher Tubbs <[email protected]> wrote:
>>>
>>>> Steve-
>>>>
>>>> I would probably design the experiment to test different cluster sizes
>>>> as completely independent. That means taking the entire thing down and
>>>> back up again (possibly even rebooting the boxes and/or re-initializing
>>>> the cluster at the new size). I'd also do several runs while it is up
>>>> at a particular cluster size, to capture any performance difference
>>>> between the first and a later run due to OS or TServer caching, for
>>>> analysis later.
>>>>
>>>> Essentially, when in doubt, take more data...
>>>>
>>>> --L
>>>>
>>>> On Fri, Aug 3, 2012 at 5:50 PM, Steven Troxell <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am running a benchmarking project on Accumulo, looking at RDF
>>>>> queries for clusters of different node sizes. While I intend to look
>>>>> at caching for optimizing each individual run, I do NOT want caching
>>>>> to interfere, for example, between runs involving 10 and 8 tablet
>>>>> servers.
>>>>>
>>>>> Up to now I'd just been killing nodes via the bin/stop-here.sh script,
>>>>> but I realize that may have allowed caching from previous runs with
>>>>> different node sizes to influence my results. It seemed weird to me,
>>>>> for example, when I realized dropping nodes actually increased
>>>>> performance (as measured by query return times) in some cases (though
>>>>> I acknowledge the code I'm working with has some serious issues with
>>>>> how ineffectively it is actually utilizing Accumulo, but that's an
>>>>> issue I intend to address later).
>>>>>
>>>>> I suppose one way would be, between changes of node size, to stop and
>>>>> restart ALL nodes (as opposed to what I'd been doing in just killing
>>>>> 2 nodes, for example, when transitioning from a 10- to an 8-node
>>>>> test). Will this be sure to clear the influence of caching across
>>>>> runs, and is there any cleaner way to do this?
>>>>>
>>>>> thanks,
>>>>> Steve
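P.S. One extra sanity check I'm adding on my own (not something anyone suggested above): confirming the page cache actually shrank on each tablet server after the drop, e.g.:

    # Run before and after dropping caches; "Cached" should fall to nearly
    # zero immediately after a successful drop.
    grep -E '^(Buffers|Cached):' /proc/meminfo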
