Known issue in the release notes on the web page? We would have to update every version, though. It seems like we need a known-issues document that lists issues in dependencies that transcend Accumulo versions.
----- Original Message -----
From: "Josh Elser" <josh.el...@gmail.com>
To: d...@accumulo.apache.org
Cc: user@accumulo.apache.org
Sent: Wednesday, September 23, 2015 10:26:50 AM
Subject: Re: [ADVISORY] Possible data loss during HDFS decommissioning

What kind of documentation can we put in the user manual about this? Recommend to only decom one rack at a time until we get the issue sorted out in Hadoop-land?

dlmar...@comcast.net wrote:
> BLUF: There exists the possibility of data loss when performing DataNode
> decommissioning with Accumulo running. This note applies to installations of
> Accumulo 1.5.0+ and Hadoop 2.5.0+.
>
> DETAILS: During DataNode decommissioning it is possible for the NameNode to
> report stale block locations (HDFS-8208). If Accumulo is running during this
> process, it is possible that files currently being written will not close
> properly. Accumulo is affected in two ways:
>
> 1. During compactions, temporary rfiles are created, then closed, and renamed.
> If a failure happens during the close, the compaction will fail.
> 2. Write-ahead log files are created, written to, and then closed. If a
> failure happens during the close, the NameNode will have a walog file
> with no finalized blocks.
>
> If either of these cases happens, decommissioning of the DataNode could hang
> (HDFS-3599, HDFS-5579) because the files are left in an open-for-write state.
> If Accumulo needs the write-ahead log for recovery, it will be unable to read
> the file and will not recover.
>
> RECOMMENDATION: Assuming that the replication pipeline for the write-ahead
> log is working properly, you should not run into this issue if you only
> decommission one rack at a time.
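
For anyone following the rack-at-a-time recommendation, the usual DataNode decommissioning steps look roughly like the sketch below. The exclude-file path and hostname are only examples, not required values; adjust them for your own deployment.

    <!-- hdfs-site.xml: point the NameNode at an exclude file -->
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/etc/hadoop/conf/dfs.exclude</value>
    </property>

    # Add the DataNodes for ONE rack to the exclude file, then tell the
    # NameNode to re-read it and begin decommissioning those nodes.
    $ echo "datanode-rack1-01.example.com" >> /etc/hadoop/conf/dfs.exclude
    $ hdfs dfsadmin -refreshNodes

    # Before touching the next rack, check for files stuck in an
    # open-for-write state (the condition that can hang decommissioning).
    $ hdfs fsck / -openforwrite | grep OPENFORWRITE

This is just the standard HDFS procedure; the point of the advisory is to finish one rack, and confirm nothing is left open for write, before moving on to the next.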