I remember listening to a presentation Keith gave about testing the
multi-level RFile index that was introduced in 1.4.0.
You also want to think about caching at the operating system level. I'm
not entirely sure what Keith did to mitigate this, but I imagine writing
a bunch of garbage from /dev/urandom out to disk between runs should
work. That, or you could actually reboot the nodes.
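Something along these lines ought to do it (a rough sketch, not anything
Keith specifically described; the drop_caches route needs root, and the
16GB count is just a placeholder for whatever exceeds your nodes' RAM):

  # Option 1: ask Linux to drop the page cache directly (needs root)
  sync
  echo 3 > /proc/sys/vm/drop_caches

  # Option 2: overwrite the cache by streaming garbage to disk, sized
  # larger than physical RAM, then clean up
  dd if=/dev/urandom of=/tmp/cache-filler bs=1M count=16384
  rm /tmp/cache-filler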
On 8/3/2012 8:55 PM, William Slacum wrote:
Steve, I'm a little confused. The RFile block cache is tied to a
TServer, so if you kill a node, its cache should go away. Are you
querying for the same data after you kill the node that hosted the
tablet which contained the data? Also, between runs, you could stop and
restart everything, thereby eliminating the cache.
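For reference, a full bounce would look roughly like this (assuming the
stock 1.4 scripts and that ACCUMULO_HOME is set):

  # run from the master node
  $ACCUMULO_HOME/bin/stop-all.sh
  $ACCUMULO_HOME/bin/start-all.sh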
On Fri, Aug 3, 2012 at 5:50 PM, Steven Troxell <[email protected]> wrote:
Hi all,
I am running a benchmarking project on Accumulo looking at RDF
queries for clusters of different node sizes. While I intend to look
at caching when optimizing each individual run, I do NOT want caching
to interfere between runs, for example between the 10- and
8-tablet-server runs.
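(For the per-run caching mentioned above, I'm assuming the knobs I want
are the per-table block cache properties, set in the shell something
like the following; please correct me if those aren't the right ones.
'rdf_table' is just a placeholder for my table name.)

  config -t rdf_table -s table.cache.block.enable=true
  config -t rdf_table -s table.cache.index.enable=true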
Up to now I'd just been killing nodes via the bin/stop-here.sh
script, but I realize that may have allowed caching from previous
runs with different node sizes to influence my results. It seemed
weird to me, for example, when I realized that dropping nodes
actually increased performance (as measured by query return times) in
some cases (though I acknowledge the code I'm working with has some
serious issues with how ineffectively it utilizes Accumulo, but
that's an issue I intend to address later).
I suppose one way would be to stop and restart ALL nodes between
changes of node size (as opposed to what I'd been doing, which was
just killing 2 nodes, for example, when transitioning from a 10-node
to an 8-node test). Will this be sure to clear the influence of
caching across runs, and is there any cleaner way to do it?
thanks,
Steve