[jira] [Commented] (HDFS-10512) VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks

2016-06-29 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355329#comment-15355329
 ] 

Wei-Chiu Chuang commented on HDFS-10512:


Yes that is correct. To add more color, here's what happened before the NPE:

A datanode's replica had corruption. It transferred the block to a destination 
for block recovery. The destination verified checksum of block and found it 
corrupt, and terminate the socket. The source datanode got a socket reset 
exception, and invokes the following code snippet:
{code|title=BlockSender#sendPacket}
datanode.getBlockScanner().markSuspectBlock(
  volumeRef.getVolume().getStorageID(),
  block);
{code}
VolumeScanner has an asynchronous thread which checks the integrity of the 
block. Which it found the block is bad, it invokes DataNode.reportBadBlocks().

How about we create another version of DataNode.reportBadBlocks() and which 
takes two parameters: volume and block and let VolumeScanner invoke this new 
overloaded method?

> VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks
> --
>
> Key: HDFS-10512
> URL: https://issues.apache.org/jira/browse/HDFS-10512
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Yiqun Lin
> Attachments: HDFS-10512.001.patch, HDFS-10512.002.patch
>
>
> VolumeScanner may terminate due to unexpected NullPointerException thrown in 
> {{DataNode.reportBadBlocks()}}. This is different from HDFS-8850/HDFS-9190
> I observed this bug in a production CDH 5.5.1 cluster and the same bug still 
> persist in upstream trunk.
> {noformat}
> 2016-04-07 20:30:53,830 WARN 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad 
> BP-1800173197-10.204.68.5-125156296:blk_1170134484_96468685 on /dfs/dn
> 2016-04-07 20:30:53,831 ERROR 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting because of exception
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:1018)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner$ScanResultHandler.handle(VolumeScanner.java:287)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:443)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:547)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:621)
> 2016-04-07 20:30:53,832 INFO 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting.
> {noformat}
> I think the NPE comes from the volume variable in the following code snippet. 
> Somehow the volume scanner know the volume, but the datanode can not lookup 
> the volume using the block.
> {code}
> public void reportBadBlocks(ExtendedBlock block) throws IOException{
> BPOfferService bpos = getBPOSForBlock(block);
> FsVolumeSpi volume = getFSDataset().getVolume(block);
> bpos.reportBadBlocks(
> block, volume.getStorageID(), volume.getStorageType());
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10512) VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks

2016-06-28 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354213#comment-15354213
 ] 

Yiqun Lin commented on HDFS-10512:
--

>From the code in {{FsDatasetImpl}}, I see that the method 
>{{FsDatasetImpl#getVolume}} returns null cause the NPE. In these code:
{code}
  @Override
  public synchronized FsVolumeImpl getVolume(final ExtendedBlock b) {
final ReplicaInfo r =  volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
return r != null? (FsVolumeImpl)r.getVolume(): null;
  }
{code}
So it means that the Replicainfo of corrupt block in volumeMap has been 
removed. And there are many cases will trigger the operation 
{{volumeMap.remove}} in {{FsDatasetImpl}}. So I want to say, the case that 
mentioned in HDFS-10587 will lead this, can you confirm this, [~jojochuang]?


> VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks
> --
>
> Key: HDFS-10512
> URL: https://issues.apache.org/jira/browse/HDFS-10512
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Yiqun Lin
> Attachments: HDFS-10512.001.patch, HDFS-10512.002.patch
>
>
> VolumeScanner may terminate due to unexpected NullPointerException thrown in 
> {{DataNode.reportBadBlocks()}}. This is different from HDFS-8850/HDFS-9190
> I observed this bug in a production CDH 5.5.1 cluster and the same bug still 
> persist in upstream trunk.
> {noformat}
> 2016-04-07 20:30:53,830 WARN 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad 
> BP-1800173197-10.204.68.5-125156296:blk_1170134484_96468685 on /dfs/dn
> 2016-04-07 20:30:53,831 ERROR 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting because of exception
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:1018)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner$ScanResultHandler.handle(VolumeScanner.java:287)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:443)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:547)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:621)
> 2016-04-07 20:30:53,832 INFO 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting.
> {noformat}
> I think the NPE comes from the volume variable in the following code snippet. 
> Somehow the volume scanner know the volume, but the datanode can not lookup 
> the volume using the block.
> {code}
> public void reportBadBlocks(ExtendedBlock block) throws IOException{
> BPOfferService bpos = getBPOSForBlock(block);
> FsVolumeSpi volume = getFSDataset().getVolume(block);
> bpos.reportBadBlocks(
> block, volume.getStorageID(), volume.getStorageType());
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10512) VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks

2016-06-28 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353913#comment-15353913
 ] 

Wei-Chiu Chuang commented on HDFS-10512:


HDFS-10587 was filed for the root cause of the bad block that I observed.

In addition, HDFS-6937 will be obsolete if we can get a good fix for this bug.

> VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks
> --
>
> Key: HDFS-10512
> URL: https://issues.apache.org/jira/browse/HDFS-10512
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Yiqun Lin
> Attachments: HDFS-10512.001.patch, HDFS-10512.002.patch
>
>
> VolumeScanner may terminate due to unexpected NullPointerException thrown in 
> {{DataNode.reportBadBlocks()}}. This is different from HDFS-8850/HDFS-9190
> I observed this bug in a production CDH 5.5.1 cluster and the same bug still 
> persist in upstream trunk.
> {noformat}
> 2016-04-07 20:30:53,830 WARN 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad 
> BP-1800173197-10.204.68.5-125156296:blk_1170134484_96468685 on /dfs/dn
> 2016-04-07 20:30:53,831 ERROR 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting because of exception
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:1018)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner$ScanResultHandler.handle(VolumeScanner.java:287)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:443)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:547)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:621)
> 2016-04-07 20:30:53,832 INFO 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting.
> {noformat}
> I think the NPE comes from the volume variable in the following code snippet. 
> Somehow the volume scanner know the volume, but the datanode can not lookup 
> the volume using the block.
> {code}
> public void reportBadBlocks(ExtendedBlock block) throws IOException{
> BPOfferService bpos = getBPOSForBlock(block);
> FsVolumeSpi volume = getFSDataset().getVolume(block);
> bpos.reportBadBlocks(
> block, volume.getStorageID(), volume.getStorageType());
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10512) VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks

2016-06-28 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353315#comment-15353315
 ] 

Wei-Chiu Chuang commented on HDFS-10512:


Hi [~linyiqun] much appreciate your patch. The patch itself looks good to me.

However, I have been hesitate to give my non-binding +1, because when this 
method is being called, a block is corrupt. After this patch, VolumeScanner 
will not terminate prematurely, which is good, but it still won't tell NameNode 
to mark this replica corrupt. And that's still a really bad thing to have.

Any comments? [~xyao] or other watchers?
Do you think this patch should go in despite we do not know the root cause of 
NPE?

This is a really bad bug, which causes pipeline to abort, because it will never 
transmit correct block to the downstream pipeline, and pipeline can not 
construct three good replicas.

BTW, I have found the root cause of corrupt replica, and I'll file another jira 
today, but I still think it would be nice to know what causes NPE.

> VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks
> --
>
> Key: HDFS-10512
> URL: https://issues.apache.org/jira/browse/HDFS-10512
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Yiqun Lin
> Attachments: HDFS-10512.001.patch, HDFS-10512.002.patch
>
>
> VolumeScanner may terminate due to unexpected NullPointerException thrown in 
> {{DataNode.reportBadBlocks()}}. This is different from HDFS-8850/HDFS-9190
> I observed this bug in a production CDH 5.5.1 cluster and the same bug still 
> persist in upstream trunk.
> {noformat}
> 2016-04-07 20:30:53,830 WARN 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad 
> BP-1800173197-10.204.68.5-125156296:blk_1170134484_96468685 on /dfs/dn
> 2016-04-07 20:30:53,831 ERROR 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting because of exception
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:1018)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner$ScanResultHandler.handle(VolumeScanner.java:287)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:443)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:547)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:621)
> 2016-04-07 20:30:53,832 INFO 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting.
> {noformat}
> I think the NPE comes from the volume variable in the following code snippet. 
> Somehow the volume scanner know the volume, but the datanode can not lookup 
> the volume using the block.
> {code}
> public void reportBadBlocks(ExtendedBlock block) throws IOException{
> BPOfferService bpos = getBPOSForBlock(block);
> FsVolumeSpi volume = getFSDataset().getVolume(block);
> bpos.reportBadBlocks(
> block, volume.getStorageID(), volume.getStorageType());
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10512) VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks

2016-06-14 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330989#comment-15330989
 ] 

Yiqun Lin commented on HDFS-10512:
--

The failed test {{TestPendingInvalidateBlock }} was tracked by HDFS-10426, 
other failed unit tests are not related, thanks for review.

> VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks
> --
>
> Key: HDFS-10512
> URL: https://issues.apache.org/jira/browse/HDFS-10512
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Yiqun Lin
> Attachments: HDFS-10512.001.patch, HDFS-10512.002.patch
>
>
> VolumeScanner may terminate due to unexpected NullPointerException thrown in 
> {{DataNode.reportBadBlocks()}}. This is different from HDFS-8850/HDFS-9190
> I observed this bug in a production CDH 5.5.1 cluster and the same bug still 
> persist in upstream trunk.
> {noformat}
> 2016-04-07 20:30:53,830 WARN 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad 
> BP-1800173197-10.204.68.5-125156296:blk_1170134484_96468685 on /dfs/dn
> 2016-04-07 20:30:53,831 ERROR 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting because of exception
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:1018)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner$ScanResultHandler.handle(VolumeScanner.java:287)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:443)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:547)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:621)
> 2016-04-07 20:30:53,832 INFO 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting.
> {noformat}
> I think the NPE comes from the volume variable in the following code snippet. 
> Somehow the volume scanner know the volume, but the datanode can not lookup 
> the volume using the block.
> {code}
> public void reportBadBlocks(ExtendedBlock block) throws IOException{
> BPOfferService bpos = getBPOSForBlock(block);
> FsVolumeSpi volume = getFSDataset().getVolume(block);
> bpos.reportBadBlocks(
> block, volume.getStorageID(), volume.getStorageType());
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10512) VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks

2016-06-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325681#comment-15325681
 ] 

Hadoop QA commented on HDFS-10512:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 20 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 85m 48s {color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 104m 34s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
|   | hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock |
| Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:2c91fd8 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12809604/HDFS-10512.002.patch |
| JIRA Issue | HDFS-10512 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux d19930ffbfb6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 8a1dcce |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15742/artifact/patchprocess/whitespace-eol.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15742/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HDFS-Build/15742/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15742/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 

[jira] [Commented] (HDFS-10512) VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks

2016-06-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325659#comment-15325659
 ] 

Hadoop QA commented on HDFS-10512:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
56s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
57s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 20 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 66m 0s 
{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
28s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 97m 6s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:2c91fd8 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12809594/HDFS-10512.002.patch |
| JIRA Issue | HDFS-10512 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 4d77ed9c5525 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 8a1dcce |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15741/artifact/patchprocess/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15741/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15741/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks
> --
>
> Key: HDFS-10512
> URL: https://issues.apache.org/jira/browse/HDFS-10512
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu 

[jira] [Commented] (HDFS-10512) VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks

2016-06-10 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325573#comment-15325573
 ] 

Yiqun Lin commented on HDFS-10512:
--

Thanks [~jojochuang] and [~xyao] for review. Post the new patch for addressing 
the comments.

> VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks
> --
>
> Key: HDFS-10512
> URL: https://issues.apache.org/jira/browse/HDFS-10512
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Yiqun Lin
> Attachments: HDFS-10512.001.patch
>
>
> VolumeScanner may terminate due to unexpected NullPointerException thrown in 
> {{DataNode.reportBadBlocks()}}. This is different from HDFS-8850/HDFS-9190
> I observed this bug in a production CDH 5.5.1 cluster and the same bug still 
> persist in upstream trunk.
> {noformat}
> 2016-04-07 20:30:53,830 WARN 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad 
> BP-1800173197-10.204.68.5-125156296:blk_1170134484_96468685 on /dfs/dn
> 2016-04-07 20:30:53,831 ERROR 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting because of exception
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:1018)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner$ScanResultHandler.handle(VolumeScanner.java:287)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:443)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:547)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:621)
> 2016-04-07 20:30:53,832 INFO 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting.
> {noformat}
> I think the NPE comes from the volume variable in the following code snippet. 
> Somehow the volume scanner know the volume, but the datanode can not lookup 
> the volume using the block.
> {code}
> public void reportBadBlocks(ExtendedBlock block) throws IOException{
> BPOfferService bpos = getBPOSForBlock(block);
> FsVolumeSpi volume = getFSDataset().getVolume(block);
> bpos.reportBadBlocks(
> block, volume.getStorageID(), volume.getStorageType());
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10512) VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks

2016-06-10 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325503#comment-15325503
 ] 

Xiaoyu Yao commented on HDFS-10512:
---

Thanks [~jojochuang] for reporting the issue and [~linyiqun] for posting the 
patch. 
There is a similar usage {{DataNode#reportBadBlock}} that needs to check null 
volume as well. 
For both cases, I would suggest we LOG an ERROR like follows.

{code}
if (volume != null) {
  bpos.reportBadBlocks(
  block, volume.getStorageID(), volume.getStorageType());
} else {
  LOG.error("Cannot find FsVolumeSpi to report bad block id:"
  + blockBlockId()  + " bpid: " + block.getBlockPoolId());
}
{code}

> VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks
> --
>
> Key: HDFS-10512
> URL: https://issues.apache.org/jira/browse/HDFS-10512
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Yiqun Lin
> Attachments: HDFS-10512.001.patch
>
>
> VolumeScanner may terminate due to unexpected NullPointerException thrown in 
> {{DataNode.reportBadBlocks()}}. This is different from HDFS-8850/HDFS-9190
> I observed this bug in a production CDH 5.5.1 cluster and the same bug still 
> persist in upstream trunk.
> {noformat}
> 2016-04-07 20:30:53,830 WARN 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad 
> BP-1800173197-10.204.68.5-125156296:blk_1170134484_96468685 on /dfs/dn
> 2016-04-07 20:30:53,831 ERROR 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting because of exception
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:1018)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner$ScanResultHandler.handle(VolumeScanner.java:287)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:443)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:547)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:621)
> 2016-04-07 20:30:53,832 INFO 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting.
> {noformat}
> I think the NPE comes from the volume variable in the following code snippet. 
> Somehow the volume scanner know the volume, but the datanode can not lookup 
> the volume using the block.
> {code}
> public void reportBadBlocks(ExtendedBlock block) throws IOException{
> BPOfferService bpos = getBPOSForBlock(block);
> FsVolumeSpi volume = getFSDataset().getVolume(block);
> bpos.reportBadBlocks(
> block, volume.getStorageID(), volume.getStorageType());
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10512) VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks

2016-06-10 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15324552#comment-15324552
 ] 

Wei-Chiu Chuang commented on HDFS-10512:


Thanks [~linyiqun] The check in the patch looks good to me.
I think that if the volume is null for some reason and can't report the bad 
block to the NN, it should throw an IOException so that this not ignored. At 
this point, I am not sure if it's some race condition in a bug somewhere.

> VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks
> --
>
> Key: HDFS-10512
> URL: https://issues.apache.org/jira/browse/HDFS-10512
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Yiqun Lin
> Attachments: HDFS-10512.001.patch
>
>
> VolumeScanner may terminate due to unexpected NullPointerException thrown in 
> {{DataNode.reportBadBlocks()}}. This is different from HDFS-8850/HDFS-9190
> I observed this bug in a production CDH 5.5.1 cluster and the same bug still 
> persist in upstream trunk.
> {noformat}
> 2016-04-07 20:30:53,830 WARN 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad 
> BP-1800173197-10.204.68.5-125156296:blk_1170134484_96468685 on /dfs/dn
> 2016-04-07 20:30:53,831 ERROR 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting because of exception
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:1018)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner$ScanResultHandler.handle(VolumeScanner.java:287)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:443)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:547)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:621)
> 2016-04-07 20:30:53,832 INFO 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting.
> {noformat}
> I think the NPE comes from the volume variable in the following code snippet. 
> Somehow the volume scanner know the volume, but the datanode can not lookup 
> the volume using the block.
> {code}
> public void reportBadBlocks(ExtendedBlock block) throws IOException{
> BPOfferService bpos = getBPOSForBlock(block);
> FsVolumeSpi volume = getFSDataset().getVolume(block);
> bpos.reportBadBlocks(
> block, volume.getStorageID(), volume.getStorageType());
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10512) VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks

2016-06-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323829#comment-15323829
 ] 

Hadoop QA commented on HDFS-10512:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 32s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
3s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
55s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 20 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 25s {color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 96m 10s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestEditLog |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:2c91fd8 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12809345/HDFS-10512.001.patch |
| JIRA Issue | HDFS-10512 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 40a5f2c7d65a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 9581fb7 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15729/artifact/patchprocess/whitespace-eol.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15729/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit test logs |  
https://builds.apache.org/job/PreCommit-HDFS-Build/15729/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15729/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15729/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (HDFS-10512) VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks

2016-06-09 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323755#comment-15323755
 ] 

Yiqun Lin commented on HDFS-10512:
--

The block has been checked before {{datanode.reportBadBlocks}} in 
{{ScanResultHandler#handle}}. So I suspect that the block was removed after the 
logic of checking and before {{datanode.reportBadBlocks}}. Otherwise, if the 
block was already not exist in datanode, it will just return.
{code}
public void handle(ExtendedBlock block, IOException e) {
  FsVolumeSpi volume = scanner.volume;
  if (e == null) {
LOG.trace("Successfully scanned {} on {}", block, volume.getBasePath());
return;
  }
  // If the block does not exist anymore, then it's not an error.
  if (!volume.getDataset().contains(block)) {
LOG.debug("Volume {}: block {} is no longer in the dataset.",
volume.getBasePath(), block);
return;
  }
  ...
  LOG.warn("Reporting bad {} on {}", block, volume.getBasePath());
  try {
scanner.datanode.reportBadBlocks(block);
  } catch (IOException ie) {
// This is bad, but not bad enough to shut down the scanner.
LOG.warn("Cannot report bad " + block.getBlockId(), e);
  }
}
{code}

Attach a patch for this.

> VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks
> --
>
> Key: HDFS-10512
> URL: https://issues.apache.org/jira/browse/HDFS-10512
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>
> VolumeScanner may terminate due to unexpected NullPointerException thrown in 
> {{DataNode.reportBadBlocks()}}. This is different from HDFS-8850/HDFS-9190
> I observed this bug in a production CDH 5.5.1 cluster and the same bug still 
> persist in upstream trunk.
> {noformat}
> 2016-04-07 20:30:53,830 WARN 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad 
> BP-1800173197-10.204.68.5-125156296:blk_1170134484_96468685 on /dfs/dn
> 2016-04-07 20:30:53,831 ERROR 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting because of exception
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:1018)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner$ScanResultHandler.handle(VolumeScanner.java:287)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:443)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:547)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:621)
> 2016-04-07 20:30:53,832 INFO 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting.
> {noformat}
> I think the NPE comes from the volume variable in the following code snippet. 
> Somehow the volume scanner know the volume, but the datanode can not lookup 
> the volume using the block.
> {code}
> public void reportBadBlocks(ExtendedBlock block) throws IOException{
> BPOfferService bpos = getBPOSForBlock(block);
> FsVolumeSpi volume = getFSDataset().getVolume(block);
> bpos.reportBadBlocks(
> block, volume.getStorageID(), volume.getStorageType());
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10512) VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks

2016-06-09 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15322619#comment-15322619
 ] 

Wei-Chiu Chuang commented on HDFS-10512:


This is different from HDFS-8850/HDFS-9190.

> VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks
> --
>
> Key: HDFS-10512
> URL: https://issues.apache.org/jira/browse/HDFS-10512
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>
> VolumeScanner may terminate due to unexpected NullPointerException thrown in 
> {{DataNode.reportBadBlocks()}}. This is different from HDFS-8850/HDFS-9190
> I observed this bug in a production CDH 5.5.1 cluster and the same bug still 
> persist in upstream trunk.
> {noformat}
> 2016-04-07 20:30:53,830 WARN 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad 
> BP-1800173197-10.204.68.5-125156296:blk_1170134484_96468685 on /dfs/dn
> 2016-04-07 20:30:53,831 ERROR 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting because of exception
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:1018)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner$ScanResultHandler.handle(VolumeScanner.java:287)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:443)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:547)
> at 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:621)
> 2016-04-07 20:30:53,832 INFO 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, 
> DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting.
> {noformat}
> I think the NPE comes from the volume variable in the following code snippet. 
> Somehow the volume scanner know the volume, but the datanode can not lookup 
> the volume using the block.
> {code}
> public void reportBadBlocks(ExtendedBlock block) throws IOException{
> BPOfferService bpos = getBPOSForBlock(block);
> FsVolumeSpi volume = getFSDataset().getVolume(block);
> bpos.reportBadBlocks(
> block, volume.getStorageID(), volume.getStorageType());
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org