Re: virtualize accumulo?

Josh Elser Tue, 05 Nov 2013 12:32:38 -0800

Hi Kesten,

As you likely know (given your arguments against), applying virtualization to a Hadoop stack can introduce some unintended consequences. Hadoop has a lot of heartbeats between processes to determine system "aliveness". If your infrastructure is overloaded, Hadoop can really suffer from spikes in latency.

Accumulo is much the same way, arguably a bit more. Accumulo's processes are very dependent on maintaining a lock in ZooKeeper (every 30 seconds by default) instead of RPC calls between DataNodes and NameNodes. Accumulo's node failure tends to be much more expensive than HDFS' because Accumulo wants to make sure every tablet is available without significant downtime. Hadoop has multiple replicas for each file so it can be a bit more lazy about noticing failure and re-replicating. What I've typically heard is that running Accumulo in a virtualized environment makes administration and use a bit more difficult.
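
As a rough illustration of why latency spikes hurt, here is a toy model of lock expiry. It is not Accumulo or ZooKeeper code: the function name, the heartbeat timestamps, and the reading of the 30-second default as a session timeout are all assumptions made for the sketch.

```python
from typing import Optional, Sequence

# Assumption: the 30-second default mentioned above is modeled as a session timeout.
SESSION_TIMEOUT_S = 30.0


def first_expiry(heartbeat_times_s: Sequence[float],
                 timeout_s: float = SESSION_TIMEOUT_S) -> Optional[float]:
    """Return the time at which the lock would be considered lost, or None.

    heartbeat_times_s holds the (hypothetical) times, in seconds, at which
    a process managed to check in. If two consecutive check-ins are more
    than timeout_s apart, the lock is treated as lost timeout_s after the
    last successful one.
    """
    for prev, cur in zip(heartbeat_times_s, heartbeat_times_s[1:]):
        if cur - prev > timeout_s:
            return prev + timeout_s
    return None


# A process on an unloaded host checks in regularly.
healthy = [0, 10, 20, 30, 40, 50]
# A process on an overloaded host stalls for 45 seconds.
stalled = [0, 10, 20, 65, 75]

print(first_expiry(healthy))  # None
print(first_expiry(stalled))  # 50.0
```

In this model a single long stall is enough to lose the lock, even though the process itself never crashed, which is the kind of failure an overloaded virtualized host can produce.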

If you're considering running HDFS on baremetal, I would encourage you to do the same with Accumulo or investigate something like YARN (really, HOYA https://github.com/hortonworks/hoya/) to do dynamic provisioning. Accumulo has the ability to happily scale and run across many nodes, so you shouldn't have to worry about large installation problems (in other words: one Accumulo instance should be sufficient for a cluster). YARN/HOYA gives you the dynamic allocations on top of your cluster to have the ease of spinning up and down Accumulo clusters as you want/need them.


On 11/5/13, 3:21 PM, Kesten Broughton wrote:

I've seen arguments both for and against virtualizing hadoop/hdfs.
(the arguments for were from vmware :)

We are considering hdfs on baremetal, with accumulo being virtualized.
This would serve a fairly constant amount of data but widely varying compute 
demands.
Has anyone tried this?  Can anyone share their experience with 
baremetal/virtualization with accumulo?

thanks

kesten
(first post)
