BTW, I also found a good description of a "typical cluster" in the upcoming O'Reilly Accumulo book: http://shop.oreilly.com/product/0636920032304.do
On Sun, Jan 5, 2014 at 10:08 AM, Josh Elser wrote:
> On 1/5/14, 12:44 PM, Arshak Navruzyan wrote:
>
>> Is there a document that describes best practices for Accumulo
>> deployments?
>
> I'm guessing the Accumulo user manual [1] covers some of this, but I'm
> not positive.
>
>> In particular:
>>
>> 1. Should you run Accumulo on HDFS datanodes and namenodes? (Is
>> enabling HDFS short-circuit local reads a good idea?)
>
> Datanodes and tasktrackers/nodemanagers, yes. I wouldn't run it on the
> Namenode, though.
>
>> 2. If so, do you disable map/reduce on nodes that run Accumulo tservers?
>
> With conscious awareness of your resource allocation (make sure there
> are still physical resources left for Accumulo), this should be fine,
> but be careful if you're running a heavy M/R load.
>
>> 3. Is auto-splitting (by size) done in the real world, or do most real
>> apps have preset split points?
>
> Adding some split points is probably always a good idea. Make sure each
> tablet server has at least a few tablets for your table; after that,
> you can raise that table's split threshold (the default is 1GB) so you
> get a good distribution of tablets across tservers for the amount of
> data you're storing (100-200 tablets is a good target). The splits
> themselves obviously depend on your data, though.
>
>> 4. Do you let Accumulo decide when to flush and compact, or do people
>> write these into their apps (based on their knowledge of app behavior)?
>
> Unless you have retention policies that require data to be physically
> removed from disk (as opposed to merely no longer visible through
> Accumulo's API), I can't think of a reason you would have to automate
> flush/compact. If you're doing data age-off (e.g. keeping N months of
> data and rolling off the oldest day of data each day), it's probably
> not a bad idea to do a range compaction on that old day to clean it up
> before your users are hitting your system in full swing.
>
>> I know the generic answer is "it all depends on your app/workload", but
>> if anyone still wants to describe their environment, it would be
>> helpful.
>>
>> Thanks.
>
> [1] http://accumulo.apache.org/1.5/accumulo_user_manual.html
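For what it's worth, here is a minimal sketch of the pre-split and split-threshold arithmetic described above. It assumes (hypothetically) that row IDs begin with a fixed-width hex bucket prefix, e.g. a hash of the natural key; `bucket_splits` and `split_threshold` are illustrative helpers, not part of Accumulo. The only Accumulo facts it relies on are the 1G default for `table.split.threshold` and that a split row is the inclusive end row of the tablet before it.

```python
# Sketch only: pre-split rows and a per-table split threshold.
# Hypothetical key scheme: every row ID starts with a fixed-width hex bucket
# prefix ("00".."ff" for 256 buckets), e.g. derived from a hash of the key.

# Accumulo's default table.split.threshold is 1G.
DEFAULT_SPLIT_THRESHOLD = 1 << 30


def bucket_splits(num_buckets: int, num_tablets: int) -> list[str]:
    """Return num_tablets - 1 split rows spread evenly over the bucket prefixes."""
    if not 1 <= num_tablets <= num_buckets:
        raise ValueError("need 1 <= num_tablets <= num_buckets")
    width = len(format(num_buckets - 1, "x"))  # e.g. 2 hex digits for 256 buckets
    # A split row is the inclusive end row of the preceding tablet, so rows that
    # extend a prefix (e.g. "40ab") sort after "40" and land in the next tablet.
    return [
        format(i * num_buckets // num_tablets, f"0{width}x")
        for i in range(1, num_tablets)
    ]


def split_threshold(total_bytes: int, target_tablets: int) -> int:
    """Bytes per tablet needed to spread total_bytes over ~target_tablets tablets."""
    if target_tablets < 1:
        raise ValueError("target_tablets must be >= 1")
    per_tablet = -(-total_bytes // target_tablets)  # ceiling division
    # Assumption of this sketch: only ever raise the threshold above the default.
    return max(DEFAULT_SPLIT_THRESHOLD, per_tablet)


if __name__ == "__main__":
    print(bucket_splits(256, 4))  # -> ['40', '80', 'c0']
    print(split_threshold(200 * 2**30, 100) // 2**30, "GiB per tablet")  # -> 2 GiB per tablet
```

The resulting rows could be handed to the shell's `addsplits` command (or `TableOperations.addSplits` from Java), and a threshold like 2G applied with `config -t <table> -s table.split.threshold=2G`. Clamping at the default is a choice of this sketch, following the advice to increase the threshold rather than shrink it.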

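Similarly, a minimal sketch of the age-off bookkeeping for the "roll off the oldest day, then range-compact it" pattern. It assumes (hypothetically) that row IDs start with a yyyyMMdd date prefix and that retention is counted in days rather than months; `expired_day` and `compaction_range` are illustrative names only.

```python
# Sketch only: which day to roll off, and the row range to range-compact.
# Hypothetical key scheme: every row ID starts with a yyyyMMdd date prefix,
# and retention is counted in days (the thread talks about N months).
from datetime import date, timedelta


def expired_day(today: date, retention_days: int) -> date:
    """Day that has just fallen out of a retention window that includes today.

    With retention_days=3 on 2014-01-05, the kept days are 01-03..01-05,
    so 2014-01-02 is the day to roll off.
    """
    if retention_days < 1:
        raise ValueError("retention_days must be >= 1")
    return today - timedelta(days=retention_days)


def compaction_range(day: date, row_fmt: str = "%Y%m%d") -> tuple[str, str]:
    """Begin/end rows bracketing every row whose date prefix is `day`.

    Compactions operate on whole tablets, so this range may also touch the
    tablet holding the start of the next day: a little extra I/O, nothing more.
    """
    return day.strftime(row_fmt), (day + timedelta(days=1)).strftime(row_fmt)


if __name__ == "__main__":
    old = expired_day(date(2014, 1, 5), 3)
    print(old, compaction_range(old))  # -> 2014-01-02 ('20140102', '20140103')
```

The two rows could then serve as the begin and end rows of the shell's `compact -t <table> -b <begin> -e <end>`, or be passed to `TableOperations.compact(tableName, start, end, flush, wait)` from Java, scheduled before peak usage as suggested above.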