On 1/5/14, 12:44 PM, Arshak Navruzyan wrote:
Is there a document that describes best practices for Accumulo deployments?

I'm guessing the Accumulo user-manual[1] covers some of this, but I'm not positive.

In particular:

1.  Should you run Accumulo on HDFS datanodes and namenodes? (Is
enabling HDFS short-circuit local reads a good idea?)

Datanodes and tasktrackers/nodemanagers, yes. I wouldn't run it on the Namenode though.
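On the short-circuit question: letting the tserver read RFiles directly from local disk instead of going through the datanode is generally worthwhile. A rough sketch of the relevant hdfs-site.xml settings (the socket path is just an example; it must exist and be writable by the HDFS user on each node):

```xml
<!-- hdfs-site.xml on each datanode / tserver host -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <!-- UNIX domain socket the datanode and client use to hand off file descriptors -->
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>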

2.  If so do you disable map/reduce for nodes that run Accumulo tservers?

As long as you're conscious of your resource allocation (make sure there are still physical resources left over for Accumulo), this should be fine; just be careful if you're running a heavy M/R load.

3.  Is auto-splitting (by size) done in the real world or do most real
apps have pre-set split points?

Adding some split points is probably always a good idea. Make sure each tabletserver has at least a few tablets for your table; after that, you can increase the split threshold (default is 1GB) for that table so you get a good distribution of tablets across tservers for the amount of data you're storing (100-200 tablets is a good target). The splits themselves obviously depend on your data, though.
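For concreteness, both steps can be done from the Accumulo shell. The table name and split points below are made up; pick splits that match your own row-key distribution:

```shell
# pre-split the (hypothetical) table "mytable" at a few row keys
addsplits -t mytable d h m r w

# raise the per-tablet split threshold from the 1G default to 2G
config -t mytable -s table.split.threshold=2G
```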

4.  Do you let Accumulo decide when to flush and compact or do people
write these into their apps (based on their knowledge of app behavior)

Unless you have retention policies that require data to be physically removed from disk (as opposed to merely not being visible through Accumulo's API), I can't come up with a reason you would have to automate flush/compact. If you're doing data age-off (e.g. keeping N months of data and rolling off the oldest day of data each day), it's probably not a bad idea to just do a range compaction on that old day to clean it up before your users are hitting your system in full swing.
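A range compaction like that is a one-liner in the shell. This sketch assumes a hypothetical table whose row keys start with a date prefix, so a single day maps to a contiguous row range:

```shell
# rewrite only the tablets covering the aged-off day, waiting for completion;
# -b/-e are inclusive begin/exclusive-ish end rows, -w blocks until done
compact -t mytable -b 2013-10-01 -e 2013-10-02 -w
```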

I know the generic answer is "it all depends on your app/workload" but
if anyone wants to still describe their environment it would be helpful.

Thanks.

[1] http://accumulo.apache.org/1.5/accumulo_user_manual.html
