*From:* Jayesh Patel
*Sent:* Thursday, April 07, 2016 4:36 PM
*To:* '[email protected]' <[email protected]>
*Subject:* RE: why compaction failure on one table brings other tables offline, how to recover

I have a 3 node Accumulo 1.7 cluster with a few small tables (a few MB in size at most).

One of those tables failed minor compaction (minc) because I had configured a SummingCombiner with FIXEDLEN encoding but had written values shorter than 8 bytes:

MinC failed (trying to convert to long, but byte array isn't long enough, wanted 8 found 1) to create hdfs://instance-accumulo:8020/accumulo/tables/1/default_tablet/F0002bcs.rf_tmp retrying ...

I have since learned to set the ‘lossy’ parameter to true to avoid this. *Why is its default value false* if it can cause the catastrophic failure described below?

This failure also brought the tablets for other tables offline without any apparent errors or warnings. *Can someone please explain why?*

In order to recover from this, I did a ‘droptable’ from the shell on the affected tables, but they all got stuck in the ‘DELETING’ state. I was finally able to delete them using the zkcli ‘rmr’ command. *Is there a better way?*

I’m assuming there is a more proper way, because when I created the tables again (with the same names), they went back to having a single offline tablet right away. *Is this because there are “traces” of the old table left behind that affect the new table even though the new table has a different table id?*

I ended up wiping out HDFS and recreating the Accumulo instance.

It seems that a small bug, writing a 1-byte value instead of an 8-byte value, caused us to dump the whole Accumulo instance. Luckily the data wasn’t that important, but this whole episode makes us wonder why doing things the right way (assuming there is a right way) wasn’t obvious, or whether Accumulo is just very fragile.

Please ask any questions or request any clarification you might need. We’ll appreciate any input you might have so that we can make educated decisions about using Accumulo going forward.

Thank you,
Jayesh
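
For anyone following the thread, the decode failure described above can be modelled in a few lines. This is only a minimal sketch in Python, not Accumulo's actual Java code: the names `decode_fixed_len`, `summing_combine` and `ValueFormatError` are hypothetical, and the `lossy` flag is assumed to behave as described in the message (undecodable values are skipped instead of aborting the combine).

```python
import struct


class ValueFormatError(ValueError):
    """Hypothetical stand-in for a decode failure on a malformed value."""


def decode_fixed_len(value: bytes) -> int:
    # A fixed-length encoding expects exactly 8 big-endian bytes per long.
    if len(value) < 8:
        raise ValueFormatError(
            "trying to convert to long, but byte array isn't long enough, "
            f"wanted 8 found {len(value)}"
        )
    return struct.unpack(">q", value[:8])[0]


def summing_combine(values, lossy=False) -> bytes:
    # Sum all values that share a key, as a summing combiner would.
    total = 0
    for raw in values:
        try:
            total += decode_fixed_len(raw)
        except ValueFormatError:
            if not lossy:
                # Strict mode: one bad value fails the whole combine,
                # which is what kept the compaction retrying.
                raise
            # Lossy mode (assumed behaviour): silently drop the bad value.
    return struct.pack(">q", total)


good = [struct.pack(">q", 3), struct.pack(">q", 4)]
bad = [struct.pack(">q", 3), b"\x01"]  # second value is only 1 byte long

print(struct.unpack(">q", summing_combine(good))[0])
print(struct.unpack(">q", summing_combine(bad, lossy=True))[0])
```

With `lossy=False`, the `bad` input raises on the 1-byte value, which mirrors the "wanted 8 found 1" message in the log. With `lossy=True` the short value is dropped and the remaining values are still summed, at the cost of silently losing data.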
