On 11/8/13, 2:53 PM, Slater, David M. wrote:
Hi all,

I have an 8-node cluster (1 name node, 7 data nodes), running accumulo
1.4.2, zookeeper 3.3.6, and hadoop 1.0.3, and I have it optimized for
ingest performance. My question is how to make performance degrade
gracefully under node failure.

1) When nodes fail, I assume that what happens is that Accumulo needs to
migrate those tablets, and hadoop needs to replicate the underlying data
blocks. This seems to have a rather catastrophic effect on ingest rates.
Is there a way to migrate tablets more gradually (starting with the
more active ones) and replicate data blocks so as not to interfere
with ingestion as severely?

First, the Master needs to notice that a TabletServer died (it watches the tserver's ZooKeeper lock). This can take up to 30 seconds if you haven't configured a more aggressive timeout via `instance.zookeeper.timeout`; in practice, I normally see a failure detected in under 10 seconds. Next, the Master will reassign the tablets hosted by the now-failed tserver to other tserver(s). Perhaps the Master could sort the tablets to make reassignment happen faster, but I would guess that the time for the new TabletServer to bring each tablet online and perform log recovery (as is the case for you, with active ingest) dwarfs the time the Master spends requesting that the tablets be brought online.
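For instance, the timeout could be lowered in accumulo-site.xml (the 15s value below is purely illustrative; the effective timeout is also bounded by ZooKeeper's own session limits, which derive from `tickTime` in zoo.cfg):

```xml
<!-- accumulo-site.xml: shorten the ZooKeeper session timeout so the
     Master notices a dead tserver sooner (15s is an example value) -->
<property>
  <name>instance.zookeeper.timeout</name>
  <value>15s</value>
</property>
```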

Ultimately, I would guess this is a balancing act: larger in-memory maps speed up ingest, but they also increase the amount of write-ahead log data that must be replayed during recovery, which lengthens the penalty when a node fails.
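As a concrete knob for that tradeoff, the in-memory map size is controlled by `tserver.memory.maps.max` (the 1G value here is just an example, not a recommendation):

```xml
<!-- accumulo-site.xml: bigger maps = higher sustained ingest,
     but more log data to replay if the tserver dies -->
<property>
  <name>tserver.memory.maps.max</name>
  <value>1G</value>
</property>
```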

2) What happens to BatchWriters when a tablet server fails that it is
attempting to write to? Will I need to start catching MutationRejected
exceptions, will it block, or is there some other failure mode?

The BatchWriter will block/retry these mutations. You shouldn't have to do anything special to handle TabletServer failure at the BatchWriter level.
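To illustrate, here's a sketch against the Accumulo 1.4 client API; the table name, credentials, and buffer sizes are made up for the example. The point is that `addMutation()` retries transient tserver failures internally, and `MutationsRejectedException` surfaces only permanent rejections (constraint violations, authorization failures), typically on flush/close:

```java
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.MutationsRejectedException;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

Connector conn = instance.getConnector("user", "secret".getBytes());
BatchWriter writer = conn.createBatchWriter("mytable",
        50L * 1024 * 1024, // maxMemory: client-side buffer before flushing
        60000L,            // maxLatency (ms) before a forced flush
        4);                // number of writer threads
try {
    Mutation m = new Mutation(new Text("row1"));
    m.put(new Text("cf"), new Text("cq"), new Value("value".getBytes()));
    // Blocks and retries internally if the destination tserver dies;
    // no exception is thrown for transient failures.
    writer.addMutation(m);
} finally {
    try {
        writer.close(); // permanent rejections surface here
    } catch (MutationsRejectedException e) {
        // e.g. constraint violations or authorization failures
        e.printStackTrace();
    }
}
```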

3) This I believe is a separate issue from node failure, but I was
seeing some very odd zookeeper behavior, involving a number of timeouts.
I currently have zookeeper running on all 7 data nodes, with the
batchwriters running on the name node. Basically, I was getting a number
of the following:

    client session timed out …
    opening socket connection
    socket connection established
    session establishment complete
    …
    client session timed out …
    (repeats)

This may be normal ZooKeeper operation. When a session times out, the client, if still alive, will reconnect and renew it. I'm not a ZooKeeper expert, though.


I would also occasionally get

    session expired for /accumulo/fe7…

as well as

    ZooKeeper KeeperException$ConnectionLoss
    Exception: KeeperErrorCode = ConnectionLoss
        for /accumulo/f37…/tables/3b/state
        at accumulo.core.zookeeper.ZooCache$2.run
        at accumulo.core.zookeeper.ZooCache.retry
        at accumulo.core.zookeeper.ZooCache.get
        at core.clientimpl.tables.getTableState
        at core.clientimpl.multiTableBatchWriter.getBatchWriter
        at myIngestorProcess.run

I'm guessing your "myIngestorProcess" doesn't actually fail, does it? Again, I'm guessing this is "normal" operation, although settings like maxClientCnxns in zoo.cfg can influence it.
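If the connection-loss errors correlate with many concurrent clients on one host, raising the per-IP connection cap in zoo.cfg is one thing to try (ZooKeeper 3.3.x defaults to 10; 60 below is only an illustrative value):

```
# zoo.cfg: max concurrent connections from a single client IP
maxClientCnxns=60
```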


Does anyone know if this is an Accumulo problem, a Zookeeper problem, or
something else (network overly busy, etc.)?

Thanks,
David
