On 11/8/13, 2:53 PM, Slater, David M. wrote:
Hi all,
I have an 8-node cluster (1 name node, 7 data nodes), running accumulo
1.4.2, zookeeper 3.3.6, and hadoop 1.0.3, and I have it optimized for
ingest performance. My question is how to make performance degrade
gracefully under node failure.
1) When nodes fail, I assume that what happens is that Accumulo needs to
migrate those tablets, and hadoop needs to replicate the underlying data
blocks. This seems to have a rather catastrophic effect on ingest rates.
Is there a way to migrate tablets more gradually (starting with the
more active ones) and replicate data blocks so as not to interfere
with ingestion as severely?
First, the Master needs to notice that a TabletServer died (via its
ZooKeeper lock). This will take up to 30 seconds unless you have
configured a more aggressive timeout using
`instance.zookeeper.timeout`. In practice, I normally see it take under
10 seconds for a failure to be detected. Next, the Master will reassign
the tablets hosted by the now-failed tserver to the remaining tserver(s).
Perhaps the Master could sort the tablets to make reassignment happen
faster, but I would guess that the time for the new TabletServer to
bring a tablet online and perform log recovery (as is the case for you
with active ingest) would dwarf the time for the Master to request that
the tablets be brought online.
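As an aside, that session timeout can be changed from the Accumulo shell
(the value below is only an example, not a recommendation):

```
# Lower the ZooKeeper session timeout so dead tservers are noticed sooner.
# 30s is the default; something like 15s trades faster failure detection
# for a higher risk of spurious lock loss under GC pauses or network blips.
config -s instance.zookeeper.timeout=15s
```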
Ultimately, I would guess this is a balancing act: larger in-memory maps
speed up ingest, but the more unflushed data a tserver holds when it
dies, the longer the recovery penalty.
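A hedged sketch of that tradeoff using the 1.4 property names (the
values here are examples only and need tuning for your hardware):

```
# Larger in-memory maps absorb more ingest before minor compactions,
# but more unflushed data means a longer log recovery after a tserver dies.
config -s tserver.memory.maps.max=2G
# Capping the write-ahead log size bounds how much data a recovery
# has to replay.
config -s tserver.walog.max.size=1G
```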
2) What happens to a BatchWriter when a tablet server it is attempting
to write to fails? Will I need to start catching
MutationsRejectedExceptions, will it block, or is there some other
failure mode?
The BatchWriter will block/retry these mutations. You shouldn't have to
do anything special to handle TabletServer failure at the BatchWriter level.
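To illustrate with a sketch against the 1.4 client API (the table name
and the way you obtain the Connector are placeholders): the BatchWriter
queues and retries internally, so a dead tserver just makes calls block;
MutationsRejectedException surfaces only for permanent rejections.

```java
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.MutationsRejectedException;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class IngestSketch {
    // 'connector' would come from ZooKeeperInstance.getConnector(...) elsewhere
    static void ingest(Connector connector) throws Exception {
        // 1.4 signature: (table, maxMemory bytes, maxLatency ms, writeThreads)
        BatchWriter bw = connector.createBatchWriter("mytable", 50000000L, 60000L, 4);
        try {
            Mutation m = new Mutation(new Text("row1"));
            m.put(new Text("cf"), new Text("cq"), new Value("v".getBytes()));
            // addMutation() may block while the writer retries against a
            // recovering tablet; it does not fail just because a tserver died.
            bw.addMutation(m);
        } catch (MutationsRejectedException e) {
            // Thrown for permanent rejections, e.g. constraint violations
            // or authorization failures -- not transient server loss.
            e.printStackTrace();
        } finally {
            bw.close(); // close() can also throw MutationsRejectedException
        }
    }
}
```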
3) This I believe is a separate issue from node failure, but I was
seeing some very odd zookeeper behavior, involving a number of timeouts.
I currently have zookeeper running on all 7 data nodes, with the
batchwriters running on the name node. Basically, I was getting a number
of the following:
client session timed out …
opening socket connection
socket connection established
session establishment complete
…
client session timed out …
repeat
This may be normal ZooKeeper operation. As the session times out, if
the client is still there, it will renew. I'm not a ZooKeeper expert,
though.
I would also occasionally get
session expired for /accumulo/fe7…
as well as
ZooKeeper KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /accumulo/f37…/tables/3b/state
at accumulo.core.zookeeper.ZooCache$2.run
accumulo.core.zookeeper.ZooCache.retry
accumulo.core.zookeeper.ZooCache.get
core.clientimpl.tables.getTableState
core.clientimpl.multiTableBatchWriter.getBatchWriter
myIngestorProcess.run
I'm guessing your "myIngestorProcess" doesn't actually fail, does it?
Again, I'm guessing these are "normal" operations, although settings
like maxClientCnxns in zoo.cfg can influence this.
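For reference, the relevant knobs in zoo.cfg look like this (values are
examples only):

```
# Per-IP connection cap; the 3.3.x default is low (10) and a busy client
# host can hit it, which shows up as dropped or refused connections.
maxClientCnxns=60
# tickTime (ms) is the base unit for session timeouts; session timeouts
# are negotiated between 2x and 20x this value.
tickTime=2000
```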
Does anyone know whether this is an Accumulo problem, a ZooKeeper
problem, or something else (an overly busy network, etc.)?
Thanks,
David