[jira] [Commented] (HDFS-9908) Datanode should tolerate disk scan failure during NN handshake

2019-03-01 Thread Stephen O'Donnell (JIRA)


[ https://issues.apache.org/jira/browse/HDFS-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781637#comment-16781637 ]

Stephen O'Donnell commented on HDFS-9908:
-

Even with HADOOP-12973 in place, I have seen an instance of this problem occur 
again, in a slightly different part of the code, but I believe for the same 
reason: a disk error during the handshake process.

These are the cut-down logs:

{code}
2019-03-01 08:58:24,830 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-240961797-10.9.65.12-1392827522027 on volume /data/18/dfs/dn/current...
...
2019-03-01 08:58:27,029 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Could not get disk usage information
ExitCodeException exitCode=1: du: cannot read directory `/data/18/dfs/dn/current/BP-240961797-10.9.65.12-1392827522027/current/finalized/subdir149/subdir215': Permission denied
du: cannot read directory `/data/18/dfs/dn/current/BP-240961797-10.9.65.12-1392827522027/current/finalized/subdir149/subdir213': Permission denied
du: cannot read directory `/data/18/dfs/dn/current/BP-240961797-10.9.65.12-1392827522027/current/finalized/subdir97/subdir25': Permission denied

    at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
    at org.apache.hadoop.util.Shell.run(Shell.java:504)
    at org.apache.hadoop.fs.DU$DUShell.startRefresh(DU.java:61)
    at org.apache.hadoop.fs.DU.refresh(DU.java:53)
    at org.apache.hadoop.fs.CachingGetSpaceUsed.init(CachingGetSpaceUsed.java:84)
    at org.apache.hadoop.fs.GetSpaceUsed$Builder.build(GetSpaceUsed.java:166)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.<init>(BlockPoolSlice.java:145)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:881)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$2.run(FsVolumeList.java:412)
...
2019-03-01 08:58:27,043 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-240961797-10.9.65.12-1392827522027 on /data/18/dfs/dn/current: 2202ms
{code}

So we can see that a du error occurred and was logged (thanks to 
HADOOP-12973), and the block pool scan completed. However, the 'add replicas 
to map' logic then hit another exception stemming from the same problem:

{code}
2019-03-01 08:58:27,564 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-240961797-10.9.65.12-1392827522027 on volume /data/18/dfs/dn/current...
...
2019-03-01 08:58:31,155 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Caught exception while adding replicas from /data/18/dfs/dn/current. Will throw later.
java.io.IOException: Invalid directory or I/O error occurred for dir: /data/18/dfs/dn/current/BP-240961797-10.9.65.12-1392827522027/current/finalized/subdir149/subdir215
    at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1167)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:445)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:342)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:861)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:191)

< The message "2019-03-01 08:59:00,989 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-240961797-10.9.65.12-1392827522027 on volume xxx" did not appear for this volume, as it failed >
{code}

I believe this exception is then thrown by the same logic that would have 
thrown the original du exception had it not been caught, and the same failure 
mode follows: the DN tries to add all the volumes again, finds them locked, 
and exits with an 'all volumes failed' error.

I believe this error comes from:

DataNode.initBlockPool -> FsDatasetImpl.addBlockPool -> 
FsVolumeList.getAllVolumesMap -> throws the exception

while with the original du issue, which no longer happens, the error path was:

DataNode.initBlockPool -> FsDatasetImpl.addBlockPool -> 
FsVolumeList.addBlockPool -> throws the exception
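
Both paths share the same scatter-gather pattern, which is why the failure 
mode is identical. Roughly (a simplified sketch, not verbatim trunk code; it 
assumes the surrounding FsVolumeList fields and abbreviates error handling):

{code}
// Simplified sketch of the pattern shared by FsVolumeList.addBlockPool and
// FsVolumeList.getAllVolumesMap. Assumes the surrounding FsVolumeList fields
// (volumes, volumeMap, ramDiskReplicaMap); not verbatim trunk code.
void scanAllVolumes(final String bpid) throws IOException {
  final List<IOException> exceptions = Collections.synchronizedList(
      new ArrayList<IOException>());
  List<Thread> scanners = new ArrayList<Thread>();
  for (final FsVolumeImpl v : volumes) {
    Thread t = new Thread() {
      public void run() {
        try {
          // getAllVolumesMap calls v.getVolumeMap(...); addBlockPool calls
          // v.addBlockPool(...). Both collect failures the same way.
          v.getVolumeMap(bpid, volumeMap, ramDiskReplicaMap);
        } catch (IOException ioe) {
          exceptions.add(ioe); // logged as "... Will throw later."
        }
      }
    };
    scanners.add(t);
    t.start();
  }
  for (Thread t : scanners) {
    try {
      t.join();
    } catch (InterruptedException e) {
      throw new IOException("Interrupted while scanning volumes", e);
    }
  }
  if (!exceptions.isEmpty()) {
    // A single bad volume fails the whole operation, and with it the handshake.
    throw exceptions.get(0);
  }
}
{code}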

This occurred on CDH 5.9.1, but a quick check of the current trunk suggests the 
code path is pretty much the same in this area.

[jira] [Commented] (HDFS-9908) Datanode should tolerate disk scan failure during NN handshake

2016-05-19 Thread Wei-Chiu Chuang (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292208#comment-15292208 ]

Wei-Chiu Chuang commented on HDFS-9908:
---

Interestingly, after HADOOP-12973, the exception thrown by du is caught and 
logged. That is to say, the NN handshake will not be disrupted by this 
exception.

{code}
  @Override
  protected synchronized void refresh() {
    if (duShell == null) {
      duShell = new DUShell();
    }
    try {
      duShell.startRefresh();
    } catch (IOException ioe) {
      LOG.warn("Could not get disk usage information", ioe);
    }
  }
{code}

Hiding a potential disk error in the log may not be the best option, IMO.
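
If the catch-and-log behaviour is going to stay, one option (purely 
illustrative, not from any patch on this issue) would be to record the 
failure so that callers can still detect it:

{code}
// Illustrative only, not from any patch on this issue: keep the
// catch-and-log behaviour of HADOOP-12973, but remember the failure so a
// caller can check for it instead of the error being visible only in the log.
private volatile IOException lastRefreshError;

@Override
protected synchronized void refresh() {
  if (duShell == null) {
    duShell = new DUShell();
  }
  try {
    duShell.startRefresh();
    lastRefreshError = null;  // clear on a successful refresh
  } catch (IOException ioe) {
    lastRefreshError = ioe;
    LOG.warn("Could not get disk usage information", ioe);
  }
}

/** Rethrow the last refresh failure, if any, so disk errors are not hidden. */
public void checkLastRefreshError() throws IOException {
  IOException ioe = lastRefreshError;
  if (ioe != null) {
    throw ioe;
  }
}
{code}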

> Datanode should tolerate disk scan failure during NN handshake
> --
>
> Key: HDFS-9908
> URL: https://issues.apache.org/jira/browse/HDFS-9908
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.5.0
> Environment: CDH5.3.3
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-9908.001.patch, HDFS-9908.002.patch, 
> HDFS-9908.003.patch, HDFS-9908.004.patch, HDFS-9908.005.patch, 
> HDFS-9908.006.patch, HDFS-9908.007.patch
>
>
> DN may treat a disk scan failure exception as an NN handshake exception, and 
> this can prevent a DN from joining a cluster even if most of its disks are healthy.
> During NN handshake, DN initializes block pools. It will create a lock file 
> per disk, and then scan the volumes. However, if the scanning throws 
> exceptions due to disk failure, DN will think it's an exception because NN is 
> inconsistent with the local storage (see {{DataNode#initBlockPool}}). As a 
> result, it will attempt to reconnect to NN again.
> However, at this point, DN has not deleted its lock files on the disks. If it 
> reconnects to NN again, it will think the same disks are already being used, 
> and it will fail the handshake again because none of the disks can be used 
> (due to locking), and so on repeatedly. This will happen even if the DN has 
> multiple disks and only one of them fails: the DN will not be able to connect 
> to NN despite just one failing disk. Note that it is possible to successfully 
> create a lock file on a disk, and then have an error scanning the disk.
> We saw this on a CDH 5.3.3 cluster (which is based on Apache Hadoop 2.5.0, 
> and we still see the same bug in the 3.0.0 trunk branch). The root cause is 
> that DN treats an internal error (a single disk failure) as an external one 
> (an NN handshake failure), and we should fix it.
> {code:title=DataNode.java}
>   /**
>    * One of the Block Pools has successfully connected to its NN.
>    * This initializes the local storage for that block pool,
>    * checks consistency of the NN's cluster ID, etc.
>    * 
>    * If this is the first block pool to register, this also initializes
>    * the datanode-scoped storage.
>    * 
>    * @param bpos Block pool offer service
>    * @throws IOException if the NN is inconsistent with the local storage.
>    */
>   void initBlockPool(BPOfferService bpos) throws IOException {
>     NamespaceInfo nsInfo = bpos.getNamespaceInfo();
>     if (nsInfo == null) {
>       throw new IOException("NamespaceInfo not found: Block pool " + bpos
>           + " should have retrieved namespace info before initBlockPool.");
>     }
> 
>     setClusterId(nsInfo.clusterID, nsInfo.getBlockPoolID());
>     // Register the new block pool with the BP manager.
>     blockPoolManager.addBlockPool(bpos);
> 
>     // In the case that this is the first block pool to connect, initialize
>     // the dataset, block scanners, etc.
>     initStorage(nsInfo);
>     // Exclude failed disks before initializing the block pools to avoid
>     // startup failures.
>     checkDiskError();
>     data.addBlockPool(nsInfo.getBlockPoolID(), conf);  // <- this line throws the disk error exception
>     blockScanner.enableBlockPoolId(bpos.getBlockPoolId());
>     initDirectoryScanner(conf);
>   }
> {code}
> {{FsVolumeList#addBlockPool}} is the source of the exception.
> {code:title=FsVolumeList.java}
>   void addBlockPool(final String bpid, final Configuration conf) throws IOException {
>     long totalStartTime = Time.monotonicNow();
> 
>     final List<IOException> exceptions = Collections.synchronizedList(
>         new ArrayList<IOException>());
>     List<Thread> blockPoolAddingThreads = new ArrayList<Thread>();
>     for (final FsVolumeImpl v : volumes) {
>       Thread t = new Thread() {
>         public void run() {
>           try (FsVolumeReference ref = v.obtainReference()) {
>             FsDatasetImpl.LOG.info("Scanning block pool " + bpid +
>                 " on volume " + v + "...");
>             long startTime = Time.monotonicNow();
>             v.addBlockPool(bpid, conf);
>             long timeTaken = Time.monotonicNow() - startTime;
>             FsDatasetImpl.LOG.info("Time taken to scan block pool " + bpid +
>                 " on " + v + ": " + timeTaken + "ms");
>           } catch (ClosedChannelException e) {
>             // ignore.
>           } catch (IOException ioe) {
>             FsDatasetImpl.LOG.info("Caught exception while scanning " + v +
>                 ". Will throw later.", ioe);
>             exceptions.add(ioe);
>           }
>         }
>       };
>       ...
> {code}

[jira] [Commented] (HDFS-9908) Datanode should tolerate disk scan failure during NN handshake

2016-05-12 Thread Wei-Chiu Chuang (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281774#comment-15281774 ]

Wei-Chiu Chuang commented on HDFS-9908:
---

Need to rebase due to HADOOP-12973.


[jira] [Commented] (HDFS-9908) Datanode should tolerate disk scan failure during NN handshake

2016-03-30 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219013#comment-15219013 ]

Hadoop QA commented on HDFS-9908:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 54s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 3s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 53s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 6s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 46s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 49s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 37s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 36s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 5s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 57s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 55s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 48s {color} | {color:red} hadoop-common in the patch failed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 16s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 53s {color} | {color:red} hadoop-common in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 8s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 27s {color} | {color:red} Patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 215m 5s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeMetrics |

[jira] [Commented] (HDFS-9908) Datanode should tolerate disk scan failure during NN handshake

2016-03-28 Thread Wei-Chiu Chuang (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214323#comment-15214323 ]

Wei-Chiu Chuang commented on HDFS-9908:
---

[~eddyxu] thanks for the comments.
If we throw the IOE in that case, other disk corruption errors might also 
fail the NN handshake (there are a few places in the {{BlockPoolSlice}} 
constructor where it throws an IOException if it fails to create directories).

On the other hand, I think it would be inappropriate to ignore these failures. 
If we do not have a consistent failure tolerance mechanism in place, I agree 
that throwing an IOException seems to be a slightly better approach.
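
For what it's worth, a consistent mechanism could mirror 
{{dfs.datanode.failed.volumes.tolerated}}: tolerate per-volume scan failures 
up to that threshold and only fail the handshake beyond it. A rough sketch, 
with hypothetical method and variable names:

{code}
// Rough, hypothetical sketch (names invented): tolerate per-volume scan
// failures up to the configured threshold instead of failing the handshake
// on the first bad disk. Assumes a LOG field and the usual imports.
void handleVolumeScanFailures(List<IOException> exceptions,
    List<FsVolumeImpl> failedVolumes, int volFailuresTolerated)
    throws IOException {
  if (exceptions.isEmpty()) {
    return;
  }
  if (failedVolumes.size() > volFailuresTolerated) {
    // Too many bad disks: give up, mirroring dfs.datanode.failed.volumes.tolerated.
    throw MultipleIOException.createIOException(exceptions);
  }
  for (FsVolumeImpl v : failedVolumes) {
    LOG.warn("Dropping volume " + v + " after block pool scan failure",
        exceptions.get(0));
  }
  // A real implementation would remove only the failed volumes here
  // (e.g. via removeVolumes(...)) and let the healthy ones register.
}
{code}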


[jira] [Commented] (HDFS-9908) Datanode should tolerate disk scan failure during NN handshake

2016-03-25 Thread Lei (Eddy) Xu (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212469#comment-15212469 ]

Lei (Eddy) Xu commented on HDFS-9908:
-

Thanks a lot for updating the patch, [~jojochuang].

One quick question: in {{handleAddBlockError()}}, should we throw the {{IOE}} 
if {{removeCandidates.size() < unhealthyDataDirs.size()}}?
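
i.e. something along these lines (a hypothetical sketch; only 
{{removeCandidates}} and {{unhealthyDataDirs}} are taken from the patch, the 
rest is assumed):

{code}
// Hypothetical sketch of the suggested check; removeCandidates and
// unhealthyDataDirs follow the patch's naming, the rest is assumed.
if (removeCandidates.size() < unhealthyDataDirs.size()) {
  // Not every unhealthy dir could be scheduled for removal: fail loudly
  // rather than continuing with volumes in an unknown state.
  throw new IOException("Could not remove all unhealthy volumes: "
      + (unhealthyDataDirs.size() - removeCandidates.size())
      + " remaining");
}
{code}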

The rest LGTM.


[jira] [Commented] (HDFS-9908) Datanode should tolerate disk scan failure during NN handshake

2016-03-25 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212194#comment-15212194 ]

Hadoop QA commented on HDFS-9908:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 58s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 15s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 44s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 4s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 13s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 2s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 46s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 9s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 9s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 30s {color} | {color:red} root: patch generated 1 new + 180 unchanged - 0 fixed = 181 total (was 180) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 8s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 8s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 39s {color} | {color:red} hadoop-common in the patch failed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 78m 16s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 31s {color} | {color:red} hadoop-common in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 93m 16s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 38s {color} | {color:red} Patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 286m 6s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit 

[jira] [Commented] (HDFS-9908) Datanode should tolerate disk scan failure during NN handshake

2016-03-24 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15210976#comment-15210976 ]

Hadoop QA commented on HDFS-9908:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 5s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 9s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 5s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 8s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 5s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 45s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 8s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 8s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 29s {color} | {color:red} root: patch generated 10 new + 180 unchanged - 0 fixed = 190 total (was 180) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 6s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 15s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 40s {color} | {color:red} hadoop-common in the patch failed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 47s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 18s {color} | {color:red} hadoop-common in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 40s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 49s {color} | {color:red} Patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 266m 56s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit 

[jira] [Commented] (HDFS-9908) Datanode should tolerate disk scan failure during NN handshake

2016-03-22 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207329#comment-15207329 ]

Hadoop QA commented on HDFS-9908:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 1s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 43s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 6s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 46s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s {color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 49s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 0s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 0s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 40s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 40s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 7s {color} | {color:red} root: patch generated 1 new + 214 unchanged - 0 fixed = 215 total (was 214) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 50s {color} | {color:red} hadoop-common-project/hadoop-common generated 13 new + 0 unchanged - 0 fixed = 13 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 50s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 57s {color} | {color:red} hadoop-common in the patch failed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 56m 27s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 24s {color} | {color:red} hadoop-common in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 55m 24s {color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 26s {color} | {color:red} Patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 199m 49s {color} | {color:black} {color} |

[jira] [Commented] (HDFS-9908) Datanode should tolerate disk scan failure during NN handshake

2016-03-22 Thread Lei (Eddy) Xu (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207106#comment-15207106 ]

Lei (Eddy) Xu commented on HDFS-9908:
-

Hey, [~jojochuang]. Thanks for working on this.

{code}
if (true) {
  throw new IOException("blah");
}
{code}

This looks like leftover debug code? Could you remove it from the patch.

{code}
if (!unhealthyDataDirs.isEmpty()) {
  throw new DU.DiskUsageException(unhealthyDataDirs);
}
{code}
I think not all {{IOE}}s are DU related? Throwing a {{DiskUsageException}} 
here might be confusing.

Regarding {{handleDiskUsageError()}}: what if there are {{IOE}}s that are not 
from DU? Should it rethrow those exceptions?
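
One way to keep the intent clear (illustrative only) would be to partition 
the collected exceptions by type before deciding what to throw:

{code}
// Illustrative only: separate DU-related failures from other IOExceptions so
// a DiskUsageException is never thrown for an unrelated error.
// collectedExceptions is an assumed name for the patch's exception list.
List<IOException> duErrors = new ArrayList<IOException>();
List<IOException> otherErrors = new ArrayList<IOException>();
for (IOException ioe : collectedExceptions) {
  if (ioe instanceof DU.DiskUsageException) {
    duErrors.add(ioe);
  } else {
    otherErrors.add(ioe);
  }
}
if (!otherErrors.isEmpty()) {
  // Non-DU failures should surface as-is rather than as a DiskUsageException.
  throw MultipleIOException.createIOException(otherErrors);
}
{code}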

{code}
try {
  // Remove all unhealthy volumes from DataNode.
  removeVolumes(removalCandidates, false);
} catch (IOException e) {
  LOG.warn("Error occurred when removing unhealthy storage dirs: "
      + e.getMessage(), e);
}
{code}

If an {{IOE}} is thrown on this volume, is the metadata of the blocks on this 
volume still in memory? If so, can you add some comments?

{code}
import org.apache.hadoop.fs.*;
{code}
Please do not use a wildcard import here. You can modify your IDE's 
preferences to prevent it.

{code}
private static boolean simulateDiskError;
{code}
If possible, it'd be better not to use a {{static}} member for tests. If 
anything happens before you reset the flag, other tests will mistakenly see 
the flag as enabled.
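
For example (an illustrative JUnit 4 sketch):

{code}
// Illustrative JUnit 4 sketch: if the fault-injection flag has to stay
// static (so production code can read it), reset it around every test so a
// failed or interrupted test can never leak the flag to later tests.
private static boolean simulateDiskError;  // as in the patch

@Before
public void setUp() {
  simulateDiskError = false;
}

@After
public void tearDown() {
  simulateDiskError = false;
}
{code}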
