On 11/8/13, 2:53 PM, Slater, David M. wrote:
Hi all,
I have an 8-node cluster (1 name node, 7 data nodes), running accumulo
1.4.2, zookeeper 3.3.6, and hadoop 1.0.3, and I have it optimized for
ingest performance. My question is how to make performance degrade
gracefully under node failure.
1) When nodes fail, I assume that what happens is that Accumulo needs to
migrate those tablets, and hadoop needs to replicate the underlying data
blocks. This seems to have a rather catastrophic effect on ingest rates.
Is there a way to migrate tablets more gradually (starting with the
more active ones) and replicate data blocks so as not to interfere
with ingestion as severely?
First, the Master needs to notice that a TabletServer died (via its
ZooKeeper lock). This will take up to 30 seconds unless you have
configured a more aggressive timeout using
`instance.zookeeper.timeout`. In practice, I normally see it take under
10 seconds for a failure to be detected. Next, the Master will reassign
the tablets hosted by the now-failed tserver to the remaining tserver(s).
Perhaps the Master could sort the tablets to make reassignment happen
faster, but I would guess that the time for the new TabletServer to
bring a tablet online and perform log recovery (as is the case for you
with active ingest) would dwarf the time for the Master to request that
the tablets be brought online.
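As an aside, that session timeout can be changed from the Accumulo shell
(the value below is only an example, not a recommendation):

```
# Lower the ZooKeeper session timeout so dead tservers are noticed sooner.
# 30s is the default; something like 15s trades faster failure detection
# for a higher risk of spurious lock loss under GC pauses or network blips.
config -s instance.zookeeper.timeout=15s
```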
Ultimately, I would guess this is a balancing act: larger in-memory maps
speed up ingest, but the more unflushed data a tserver holds when it
dies, the longer the recovery penalty.
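A hedged sketch of that tradeoff using the 1.4 property names (the
values here are examples only and need tuning for your hardware):

```
# Larger in-memory maps absorb more ingest before minor compactions,
# but more unflushed data means a longer log recovery after a tserver dies.
config -s tserver.memory.maps.max=2G
# Capping the write-ahead log size bounds how much data a recovery
# has to replay.
config -s tserver.walog.max.size=1G
```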
2) What happens to a BatchWriter when a tablet server it is attempting
to write to fails? Will I need to start catching
MutationsRejectedExceptions, will it block, or is there some other
failure mode?
The BatchWriter will block/retry these mutations. You shouldn't have to
do anything special to handle TabletServer failure at the BatchWriter level.
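To illustrate with a sketch against the 1.4 client API (the table name
and the way you obtain the Connector are placeholders): the BatchWriter
queues and retries internally, so a dead tserver just makes calls block;
MutationsRejectedException surfaces only for permanent rejections.

```java
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.MutationsRejectedException;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class IngestSketch {
    // 'connector' would come from ZooKeeperInstance.getConnector(...) elsewhere
    static void ingest(Connector connector) throws Exception {
        // 1.4 signature: (table, maxMemory bytes, maxLatency ms, writeThreads)
        BatchWriter bw = connector.createBatchWriter("mytable", 50000000L, 60000L, 4);
        try {
            Mutation m = new Mutation(new Text("row1"));
            m.put(new Text("cf"), new Text("cq"), new Value("v".getBytes()));
            // addMutation() may block while the writer retries against a
            // recovering tablet; it does not fail just because a tserver died.
            bw.addMutation(m);
        } catch (MutationsRejectedException e) {
            // Thrown for permanent rejections, e.g. constraint violations
            // or authorization failures -- not transient server loss.
            e.printStackTrace();
        } finally {
            bw.close(); // close() can also throw MutationsRejectedException
        }
    }
}
```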
3) This I believe is a separate issue from node failure, but I was
seeing some very odd zookeeper behavior, involving a number of timeouts.
I currently have zookeeper running on all 7 data nodes, with the
batchwriters running on the name node. Basically, I was getting a number
of the following:
client session timed out …
opening socket connection
socket connection established
session establishment complete
…
client session timed out …
repeat
This may be normal ZooKeeper operation. As the session times out, if
the client is still there, it will renew. I'm not a ZooKeeper expert,
though.
I would also occasionally get
session expired for /accumulo/fe7…
as well as
ZooKeeper KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /accumulo/f37…/tables/3b/state
at accumulo.core.zookeeper.ZooCache$2.run
accumulo.core.zookeeper.ZooCache.retry
accumulo.core.zookeeper.ZooCache.get
core.clientimpl.tables.getTableState
core.clientimpl.multiTableBatchWriter.getBatchWriter
myIngestorProcess.run
I'm guessing your "myIngestorProcess" doesn't actually fail, does it?
Again, I'm guessing these are "normal" operations, although settings
like maxClientCnxns in zoo.cfg can influence this.
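For reference, the relevant knobs in zoo.cfg look like this (values are
examples only):

```
# Per-IP connection cap; the 3.3.x default is low (10) and a busy client
# host can hit it, which shows up as dropped or refused connections.
maxClientCnxns=60
# tickTime (ms) is the base unit for session timeouts; session timeouts
# are negotiated between 2x and 20x this value.
tickTime=2000
```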
Does anyone know whether this is an Accumulo problem, a ZooKeeper
problem, or something else (an overly busy network, etc.)?
Thanks,
David