Benoit Sigoure created HDFS-8960:
------------------------------------

             Summary: DFS client says "no more good datanodes being available 
to try" on a single drive failure
                 Key: HDFS-8960
                 URL: https://issues.apache.org/jira/browse/HDFS-8960
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.7.1
         Environment: openjdk version "1.8.0_45-internal"
OpenJDK Runtime Environment (build 1.8.0_45-internal-b14)
OpenJDK 64-Bit Server VM (build 25.45-b02, mixed mode)
            Reporter: Benoit Sigoure


Since we upgraded to 2.7.1, we regularly see single-drive failures cause 
widespread problems at the HBase level (with the default 3x replication target).

Here's an example.  This HBase RegionServer is r12s16 (172.24.32.16) and is 
writing its WAL to [172.24.32.16:10110, 172.24.32.8:10110, 172.24.32.13:10110] 
as can be seen by the following occasional messages:

{code}
2015-08-23 06:28:40,272 INFO  [sync.3] wal.FSHLog: Slow sync cost: 123 ms, 
current pipeline: [172.24.32.16:10110, 172.24.32.8:10110, 172.24.32.13:10110]
{code}
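
To make the write pattern concrete, here's my own minimal sketch (not HBase code, 
the path is made up) of what an FSHLog-style writer looks like from HDFS's point 
of view: one long-lived file with replication 3 that gets hflush()ed on every 
sync, which is why the stream below is so hflush-heavy.

{code}
// Minimal sketch of a WAL-like writer: one long-lived output stream with
// replication 3, hflush()ed on every sync. Not HBase code; path is made up.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;

public class WalLikeWriter {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path wal = new Path("/tmp/wal-like-writer-test");   // hypothetical path
    try (FSDataOutputStream out = fs.create(wal, (short) 3)) {
      for (int i = 0; i < 1000; i++) {
        out.write(("edit-" + i + "\n").getBytes(StandardCharsets.UTF_8));
        out.hflush();   // every FSHLog sync ends up in an hflush like this one
      }
    }
  }
}
{code}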

A bit later, the second node in the pipeline above is going to experience an 
HDD failure.

{code}
2015-08-23 07:21:58,720 WARN  [DataStreamer for file 
/hbase/WALs/r12s16.sjc.aristanetworks.com,9104,1439917659071/r12s16.sjc.aristanetworks.com%2C9104%2C1439917659071.default.1440314434998
 block BP-1466258523-172.24.32.1-1437768622582:blk_1073817519_77099] 
hdfs.DFSClient: Error Recovery for block 
BP-1466258523-172.24.32.1-1437768622582:blk_1073817519_77099 in pipeline 
172.24.32.16:10110, 172.24.32.13:10110, 172.24.32.8:10110: bad datanode 
172.24.32.8:10110
{code}

And then HBase basically goes "omg I can't write to my WAL, let me commit 
suicide":

{code}
2015-08-23 07:22:26,060 FATAL 
[regionserver/r12s16.sjc.aristanetworks.com/172.24.32.16:9104.append-pool1-t1] 
wal.FSHLog: Could not append. Requesting close of wal
java.io.IOException: Failed to replace a bad datanode on the existing pipeline 
due to no more good datanodes being available to try. (Nodes: 
current=[172.24.32.16:10110, 172.24.32.13:10110], original=[172.24.32.16:10110, 
172.24.32.13:10110]). The current failed datanode replacement policy is 
DEFAULT, and a client may configure this via 
'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
configuration.
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:969)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1035)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1184)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:933)
        at 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:487)
{code}

This should be mostly a non-event, as the DFS client should just drop the bad 
replica from the write pipeline and carry on.
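
For context, here's my rough paraphrase (from memory, not a verbatim copy of the 
Hadoop source) of the {{DEFAULT}} condition in {{ReplaceDatanodeOnFailure}}, which 
is why the client insists on finding a replacement rather than just dropping the 
bad replica:

{code}
// Rough paraphrase (from memory, not verbatim) of the DEFAULT condition in
// org.apache.hadoop.hdfs.protocol.datatransfer.ReplaceDatanodeOnFailure.
public class DefaultPolicySketch {
  static boolean shouldReplaceDatanode(short replication, int survivingNodes,
                                       boolean isAppend, boolean isHflushed) {
    if (replication < 3) {
      return false;                   // small pipelines: just drop the bad replica
    }
    if (survivingNodes <= replication / 2) {
      return true;                    // lost half or more of the pipeline
    }
    return isAppend || isHflushed;    // hflush-heavy streams like a WAL hit this branch
  }

  public static void main(String[] args) {
    // The situation in the logs above: replication 3, 2 good nodes left, hflushed WAL.
    System.out.println(shouldReplaceDatanode((short) 3, 2, false, true));  // true
  }
}
{code}

So with a 3-replica hflushed WAL and two surviving nodes, the client is required 
to find a replacement datanode before it can continue.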

This is a small cluster, but it has 16 DNs, so the failed DN in the pipeline 
should be easy to replace.  I didn't set 
{{dfs.client.block.write.replace-datanode-on-failure.policy}} (so it's still 
{{DEFAULT}}) and didn't set 
{{dfs.client.block.write.replace-datanode-on-failure.enable}} (so it's still 
{{true}}).
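
For reference, here's a minimal sketch (my own, not a proposed fix) of how a 
client could set these knobs explicitly. The property names are the ones from the 
exception message and hdfs-default.xml; whether the best-effort flag would be an 
acceptable workaround for a WAL is an assumption on my part, not something I've 
verified:

{code}
// Minimal sketch of the client-side replace-datanode-on-failure knobs.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ReplaceDatanodeConfSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // What this cluster is effectively running with (the defaults):
    conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);
    conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "DEFAULT");
    // Possible mitigation (assumption, not verified here): keep trying to replace,
    // but don't fail the write when no replacement datanode can be found.
    conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.best-effort", true);
    FileSystem fs = FileSystem.get(conf);
    System.out.println("Client config loaded for " + fs.getUri());
  }
}
{code}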

I don't see anything noteworthy in the NN log around the time of the failure; it 
just seems like the DFS client gave up, or started throwing an exception back to 
HBase that it wasn't throwing before, and that made this single-drive failure 
lethal.

We've occasionally been "unlucky" enough to have a single-drive failure cause 
multiple RegionServers to commit suicide because they had their WALs on that 
drive.

We upgraded from 2.7.0 about a month ago, and I'm not sure whether we were seeing 
this with 2.7.0 or not; prior to that we were running in quite a different 
environment, but this is a fairly new deployment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
