Manoj Govindassamy created HDFS-10780:
-----------------------------------------

             Summary: Block replication not happening on removing a volume when data is being written to a datanode -- TestDataNodeHotSwapVolumes fails
                 Key: HDFS-10780
                 URL: https://issues.apache.org/jira/browse/HDFS-10780
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs
    Affects Versions: 3.0.0-alpha1
            Reporter: Manoj Govindassamy
            Assignee: Manoj Govindassamy


TestDataNodeHotSwapVolumes occasionally fails in the unit test 
testRemoveVolumeBeingWrittenForDatanode. A data write pipeline can run into 
problems such as timeouts or an unreachable datanode; in this test the failure 
is deliberately induced by removing one of the datanode's volumes while a block 
write is in progress. Digging further into the logs, when the problem occurs in 
the write pipeline, error recovery does not happen as expected, so block 
replication never catches up.
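
For context, the timeout in the test output below comes from the test waiting for the file to reach its target replication factor. The following is a minimal, self-contained sketch of that kind of polling wait (hypothetical helper, not the actual Hadoop test utility the test uses), assuming a supplier that reports the current replica count:

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.IntSupplier;

/**
 * Hypothetical polling helper illustrating the kind of wait that times out here
 * ("Timed out waiting for /test to reach 3 replicas"); the real test relies on
 * Hadoop's own test utilities.
 */
public class ReplicationWaitSketch {

  static void waitForReplicas(IntSupplier currentReplicas, int expected, long timeoutMs)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    // Poll until the observed replica count reaches the expected replication factor.
    while (currentReplicas.getAsInt() < expected) {
      if (System.nanoTime() > deadline) {
        throw new TimeoutException(
            "Timed out waiting for /test to reach " + expected + " replicas");
      }
      Thread.sleep(100);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    try {
      // Simulated replica count stuck at 2, reproducing the timeout path seen in the log.
      waitForReplicas(() -> 2, 3, 1_000);
    } catch (TimeoutException e) {
      System.out.println("Reproduced: " + e.getMessage());
    }
  }
}
{code}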

{noformat}
Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 44.495 sec <<< FAILURE! - in org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
testRemoveVolumeBeingWritten(org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes)  Time elapsed: 44.354 sec
java.util.concurrent.TimeoutException: Timed out waiting for /test to reach 3 replicas

Results :

Tests in error: 
  TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten:637->testRemoveVolumeBeingWrittenForDatanode:714 » Timeout

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
{noformat}

The following exceptions are not expected in this test run:
{noformat}
 614 2016-08-10 12:30:11,269 [DataXceiver for client DFSClient_NONMAPREDUCE_-640082112_10 at /127.0.0.1:58805 [Receiving block BP-1852988604-172.16.3.66-1470857409044:blk_1073741825_1001]] DEBUG datanode.DataNode (DataXceiver.java:run(320)) - 127.0.0.1:58789:Number of active connections is: 2
 615 java.lang.IllegalMonitorStateException
 616         at java.lang.Object.wait(Native Method)
 617         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(FsVolumeList.java:280)
 618         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeVolumes(FsDatasetImpl.java:517)
 619         at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:832)
 620         at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:798)
{noformat}
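
The IllegalMonitorStateException above indicates that Object.wait() was invoked by a thread that does not hold the monitor of the object being waited on. The sketch below (hypothetical lock name, not the FsVolumeList code) contrasts the broken call with the standard wait pattern:

{code:java}
// Minimal, self-contained illustration (not the HDFS code) of why
// java.lang.IllegalMonitorStateException is thrown from Object.wait(): the
// calling thread must own the object's monitor, i.e. wait() has to be invoked
// inside a synchronized block on that same object.
public class MonitorWaitSketch {
  private final Object removedVolumesLock = new Object();  // hypothetical lock object

  void broken() throws InterruptedException {
    // Throws IllegalMonitorStateException: wait() called without owning the monitor.
    removedVolumesLock.wait(1000);
  }

  void correct(java.util.function.BooleanSupplier volumeStillInUse) throws InterruptedException {
    synchronized (removedVolumesLock) {
      while (volumeStillInUse.getAsBoolean()) {
        removedVolumesLock.wait(1000);  // releases the monitor while waiting
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    MonitorWaitSketch s = new MonitorWaitSketch();
    try {
      s.broken();
    } catch (IllegalMonitorStateException e) {
      System.out.println("Reproduced: " + e);
    }
    s.correct(() -> false);  // returns immediately since the condition is already false
  }
}
{code}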

{noformat}
 720 2016-08-10 12:30:11,287 [DataNode: [[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/, [DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/]]  heartbeating to localhost/127.0.0.1:58788] ERROR datanode.DataNode (BPServiceActor.java:run(768)) - Exception in BPOfferService for Block pool BP-1852988604-172.16.3.66-1470857409044 (Datanode Uuid 711d58ad-919d-4350-af1e-99fa0b061244) service to localhost/127.0.0.1:58788
 721 java.lang.NullPointerException
 722         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1841)
 723         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:336)
 724         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:624)
 725         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:766)
 726         at java.lang.Thread.run(Thread.java:745)
{noformat}
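
The NullPointerException in the block-report path is consistent with a race between volume removal and block-report generation: once the volume is gone, a per-volume lookup can return null while the report is still being assembled. A hedged sketch with hypothetical names (not FsDatasetImpl itself) illustrating that race and a defensive null check:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the suspected race: a block report is built per volume
// while another thread hot-swaps a volume out, so a per-volume lookup can
// suddenly return null.
public class BlockReportRaceSketch {
  private final Map<String, List<Long>> blocksPerVolume = new HashMap<>();
  private final List<String> volumes = new ArrayList<>();

  synchronized void removeVolume(String volume) {
    volumes.remove(volume);
    blocksPerVolume.remove(volume);
  }

  // Builds a report from a stale snapshot of the volume list; without the null
  // check this mirrors the kind of NPE seen in the block-report path.
  List<Long> buildReport(List<String> volumeSnapshot) {
    List<Long> report = new ArrayList<>();
    for (String v : volumeSnapshot) {
      List<Long> blocks;
      synchronized (this) {
        blocks = blocksPerVolume.get(v);
      }
      if (blocks == null) {
        continue;  // volume was removed after the snapshot was taken; skip it
      }
      report.addAll(blocks);
    }
    return report;
  }

  public static void main(String[] args) {
    BlockReportRaceSketch ds = new BlockReportRaceSketch();
    ds.volumes.add("data1");
    ds.blocksPerVolume.put("data1", Arrays.asList(1073741825L));
    List<String> snapshot = new ArrayList<>(ds.volumes);  // snapshot taken before removal
    ds.removeVolume("data1");                             // concurrent hot-swap removal
    System.out.println("Report blocks: " + ds.buildReport(snapshot));
  }
}
{code}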




