Fwd: why compaction failure on one table brings other tables offline, how to recover

Billie Rinaldi Fri, 08 Apr 2016 07:41:20 -0700

*From:* Jayesh Patel
*Sent:* Thursday, April 07, 2016 4:36 PM
*To:* '[email protected]' <[email protected]>
*Subject:* RE: why compaction failure on one table brings other tables
offline, how to recover




I have a 3-node Accumulo 1.7 cluster with a few small tables (a few MB in
size at most).



One of those tables failed a minor compaction (minc) because I had
configured a SummingCombiner with FIXEDLEN encoding but had written values
shorter than 8 bytes:

MinC failed (trying to convert to long, but byte array isn't long enough,
wanted 8 found 1) to create
hdfs://instance-accumulo:8020/accumulo/tables/1/default_tablet/F0002bcs.rf_tmp
retrying ...



I have since learned to set the ‘lossy’ parameter to true to avoid this.
*Why is its default value false* if that can cause the catastrophic failure
described below?
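To illustrate the mismatch, here is a minimal, standalone sketch of a fixed-length long encoding and of what a ‘lossy’ switch changes when summing. It does not use the Accumulo API: the class FixedLenSumSketch and its methods encode, decode and sum are hypothetical names, and the behaviour is only my understanding of how the combiner treats values it cannot decode (strict mode raises an error, lossy mode silently drops the value).

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Accumulo's implementation.
public class FixedLenSumSketch {

    // Encode a long as exactly 8 big-endian bytes (a fixed-length layout).
    static byte[] encode(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Decode the first 8 bytes as a long; shorter arrays are rejected.
    static long decode(byte[] b) {
        if (b.length < Long.BYTES) {
            throw new IllegalArgumentException(
                "wanted " + Long.BYTES + " found " + b.length);
        }
        return ByteBuffer.wrap(b).getLong();
    }

    // Sum all values. With lossy == true, values that fail to decode are
    // skipped (and so lost); with lossy == false, the failure propagates.
    static long sum(List<byte[]> values, boolean lossy) {
        long total = 0;
        for (byte[] v : values) {
            try {
                total += decode(v);
            } catch (IllegalArgumentException e) {
                if (!lossy) {
                    throw e;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<byte[]> values = new ArrayList<>();
        values.add(encode(5));
        values.add(new byte[] {1}); // a 1-byte value, like the one I wrote
        values.add(encode(7));

        System.out.println(sum(values, true)); // prints 12

        try {
            sum(values, false);
        } catch (IllegalArgumentException e) {
            // prints "strict sum failed: wanted 8 found 1"
            System.out.println("strict sum failed: " + e.getMessage());
        }
    }
}
```

In this sketch the lossy path avoids the error only by discarding the malformed value, which may be one reason a strict default is chosen.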



However, this also brought the tablets of other tables offline without
any apparent errors or warnings.  *Can someone please explain why?*




To recover from this, I ran ‘droptable’ from the shell on the
affected tables, but they all got stuck in the ‘DELETING’ state.  I was
finally able to delete them using the zkcli ‘rmr’ command.  *Is there a
better way?*

I assume there is a more proper way, because when I created the tables
again (with the same names), they went back to having a single offline
tablet right away.  *Is this because “traces” of the old tables were left
behind that affect the new tables, even though the new tables have
different table ids?*  I ended up wiping out HDFS and recreating the
Accumulo instance.



It seems that a small bug, writing a 1-byte value instead of an 8-byte
value, caused us to dump the whole Accumulo instance.  Luckily the data
wasn’t that important, but this whole episode makes us wonder why the
right way of doing things (assuming there is one) wasn’t obvious, or
whether Accumulo is just very fragile.



Please ask any questions or request any clarification you need.  We would
appreciate any input so that we can make educated decisions about
using Accumulo going forward.



Thank you,

Jayesh
