[jira] [Updated] (HDFS-9833) Erasure coding: recomputing block checksum on the fly by reconstructing the missed/corrupt block data
[ https://issues.apache.org/jira/browse/HDFS-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-9833: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0-alpha1 Status: Resolved (was: Patch Available) Committed to trunk. Thanks [~rakeshr] for the great contribution! > Erasure coding: recomputing block checksum on the fly by reconstructing the > missed/corrupt block data > - > > Key: HDFS-9833 > URL: https://issues.apache.org/jira/browse/HDFS-9833 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rakesh R > Labels: hdfs-ec-3.0-must-do > Fix For: 3.0.0-alpha1 > > Attachments: HDFS-9833-00-draft.patch, HDFS-9833-01.patch, > HDFS-9833-02.patch, HDFS-9833-03.patch, HDFS-9833-04.patch, > HDFS-9833-05.patch, HDFS-9833-06.patch, HDFS-9833-07.patch, HDFS-9833-08.patch > > > As discussed in HDFS-8430 and HDFS-9694, to compute a striped file checksum > even when some of the striped blocks are missing, we need to consider recomputing the > block checksum on the fly for the missed/corrupt blocks. To recompute the block > checksum, the block data needs to be reconstructed by erasure decoding, and > most of the code needed for the block reconstruction could be borrowed from > HDFS-9719, the refactoring of the existing {{ErasureCodingWorker}}. In the EC > worker, reconstructed blocks need to be written out to target datanodes, but > in this case the remote write isn't necessary, as the reconstructed > block data is only used to recompute the checksum. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
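The mechanism described above — decode the lost block in memory and feed it straight into the checksum computation, skipping the remote write that an ordinary EC reconstruction performs — can be sketched with a toy single-parity (XOR) code. HDFS striping actually uses Reed-Solomon through the erasure codec framework; all class and method names below are illustrative, not the Hadoop API:

```java
import java.util.zip.CRC32;

public class ReconstructChecksum {
    // Reconstruct the missing unit by XOR-ing the surviving data units
    // with the parity unit (single-parity toy code, not Reed-Solomon).
    static byte[] reconstruct(byte[][] surviving, byte[] parity) {
        byte[] out = parity.clone();
        for (byte[] unit : surviving) {
            for (int i = 0; i < out.length; i++) {
                out[i] ^= unit[i];
            }
        }
        return out;
    }

    static long checksum(byte[] block) {
        CRC32 crc = new CRC32();
        crc.update(block);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] d0 = {1, 2, 3, 4};
        byte[] d1 = {5, 6, 7, 8};
        byte[] d2 = {9, 10, 11, 12};  // pretend this unit is lost
        // parity = d0 ^ d1 ^ d2
        byte[] parity = new byte[4];
        for (int i = 0; i < 4; i++) {
            parity[i] = (byte) (d0[i] ^ d1[i] ^ d2[i]);
        }
        byte[] rebuilt = reconstruct(new byte[][] {d0, d1}, parity);
        // The checksum of the rebuilt block matches the original, so no
        // write to a target datanode is needed just to answer a checksum.
        System.out.println(checksum(rebuilt) == checksum(d2));
    }
}
```

The design point of the JIRA is exactly the last step: the decoded bytes stay local and feed only the checksum computation.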
[jira] [Commented] (HDFS-10471) DFSAdmin#SetQuotaCommand's help msg is not correct
[ https://issues.apache.org/jira/browse/HDFS-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309201#comment-15309201 ] Hadoop QA commented on HDFS-10471: --
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 4s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 26s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 226 unchanged - 2 fixed = 228 total (was 228) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 14s {color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 89m 0s {color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestReconstructStripedBlocks |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:2c91fd8 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12807304/HDFS-10471.002.patch |
| JIRA Issue | HDFS-10471 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml |
| uname | Linux 08db182fa162 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8ceb06e |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/15618/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/15618/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-HDFS-Build/15618/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15618/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output |
[jira] [Updated] (HDFS-10471) DFSAdmin#SetQuotaCommand's help msg is not correct
[ https://issues.apache.org/jira/browse/HDFS-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-10471: - Attachment: HDFS-10471.002.patch Sorry, the failed test {{TestHDFSCLI }} is related. Attach a new patch to fix this. > DFSAdmin#SetQuotaCommand's help msg is not correct > -- > > Key: HDFS-10471 > URL: https://issues.apache.org/jira/browse/HDFS-10471 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Attachments: HDFS-10471.001.patch, HDFS-10471.002.patch > > > The help message of the SetQuota-related command is not shown correctly. In the > message, the name {{quota}} is shown as {{N}}, but {{N}} is never introduced > beforehand. > {noformat} > -setQuota <quota> <dirname>...<dirname>: Set the quota <quota> for each > directory <dirname>. > The directory quota is a long integer that puts a hard limit > on the number of names in the directory tree > For each directory, attempt to set the quota. An error will be > reported if > 1. N is not a positive integer, or > 2. User is not an administrator, or > 3. The directory does not exist or is a file. > Note: A quota of 1 would force the directory to remain empty. > {noformat} > The command {{-setSpaceQuota}} has a similar problem.
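For illustration, a consistent help template would name the placeholder in the usage line and reuse that same token in the error conditions. The wording below is only a sketch of the idea, not the text committed in the patch:

```java
public class SetQuotaHelp {
    // Illustrative rewrite: the token introduced in the usage line
    // (<quota>) is the one referenced later, instead of an undefined N.
    static final String HELP =
        "-setQuota <quota> <dirname>...<dirname>: "
        + "Set the quota <quota> for each directory <dirname>.\n"
        + "\t\tThe directory quota is a long integer that puts a hard limit\n"
        + "\t\ton the number of names in the directory tree.\n"
        + "\t\tAn error will be reported if\n"
        + "\t\t1. <quota> is not a positive integer, or\n"
        + "\t\t2. User is not an administrator, or\n"
        + "\t\t3. The directory does not exist or is a file.";

    public static void main(String[] args) {
        // The usage line and the error conditions now name the same token,
        // and no bare "N" remains anywhere in the message.
        System.out.println(HELP.contains("<quota>")
            && !HELP.matches("(?s).*\\bN\\b.*"));
    }
}
```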
[jira] [Comment Edited] (HDFS-10471) DFSAdmin#SetQuotaCommand's help msg is not correct
[ https://issues.apache.org/jira/browse/HDFS-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309068#comment-15309068 ] Yiqun Lin edited comment on HDFS-10471 at 6/1/16 1:59 AM: -- Sorry, the failed test {{TestHDFSCLI}} is related. Attach a new patch to fix this. was (Author: linyiqun): Sorry, the failed test {{TestHDFSCLI }} is related. Attach a new patch to fix this.
[jira] [Commented] (HDFS-9833) Erasure coding: recomputing block checksum on the fly by reconstructing the missed/corrupt block data
[ https://issues.apache.org/jira/browse/HDFS-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309065#comment-15309065 ] Rakesh R commented on HDFS-9833: Thanks a lot [~drankye] for the good support!
[jira] [Updated] (HDFS-10440) Improve DataNode web UI
[ https://issues.apache.org/jira/browse/HDFS-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-10440: --- Attachment: datanode_html.001.jpg > Improve DataNode web UI > --- > > Key: HDFS-10440 > URL: https://issues.apache.org/jira/browse/HDFS-10440 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: HDFS-10440.001.patch, datanode_html.001.jpg, > datanode_utilities.001.jpg, dn_web_ui_mockup.jpg > > > At present, datanode web UI doesn't have much information except for node > name and port. Propose to add more information similar to namenode UI, > including, > * Static info (version, block pool and cluster ID) > * Block pools info (BP IDs, namenode address, actor states) > * Storage info (Volumes, capacity used, reserved, left) > * Utilities (logs)
[jira] [Updated] (HDFS-10440) Improve DataNode web UI
[ https://issues.apache.org/jira/browse/HDFS-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-10440: --- Attachment: (was: datanode_html.001.jpg)
[jira] [Updated] (HDFS-10440) Improve DataNode web UI
[ https://issues.apache.org/jira/browse/HDFS-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-10440: --- Status: In Progress (was: Patch Available)
[jira] [Updated] (HDFS-10440) Improve DataNode web UI
[ https://issues.apache.org/jira/browse/HDFS-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-10440: --- Description: At present, datanode web UI doesn't have much information except for node name and port. Propose to add more information similar to namenode UI, including, * Static info (version, block pool and cluster ID) * Block pools info (BP IDs, namenode address, actor states) * Storage info (Volumes, capacity used, reserved, left) * Utilities (logs) was: At present, datanode web UI doesn't have much information except for node name and port. Propose to add more information similar to namenode UI, including, * Static info (version, block pool and cluster ID) * Running state (active, decommissioning, decommissioned or lost etc) * Summary (blocks, capacity, storage etc) * Utilities (logs)
[jira] [Commented] (HDFS-10415) TestDistributedFileSystem#MyDistributedFileSystem attempts to set up statistics before initialize() is called
[ https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309023#comment-15309023 ] Mingliang Liu commented on HDFS-10415: -- Thank you [~cmccabe] for your review and commit! > TestDistributedFileSystem#MyDistributedFileSystem attempts to set up > statistics before initialize() is called > - > > Key: HDFS-10415 > URL: https://issues.apache.org/jira/browse/HDFS-10415 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: jenkins >Reporter: Sangjin Lee >Assignee: Mingliang Liu > Fix For: 2.8.0 > > Attachments: HDFS-10415-branch-2.000.patch, > HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch > > > {noformat} > Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem > testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem) Time > elapsed: 0.045 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790) > at > org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417) > at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084) > at > org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187) > at > org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217) > {noformat} > This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other > combinations.
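The failure mode in this issue — an overridden method running during {{close()}} and dereferencing state that only {{initialize()}} sets up — can be reproduced outside Hadoop with a minimal sketch (all class and field names here are made up for illustration, not the actual FileSystem API):

```java
public class InitOrderDemo {
    static class Base {
        String stats;                  // set up by initialize(), not the constructor
        void initialize() { stats = "ready"; }
        void close() { delete(); }     // close() calls into overridable behavior
        void delete() { }
    }

    static class Sub extends Base {
        @Override
        void delete() {
            // NPE if close() runs before initialize(): stats is still null.
            System.out.println(stats.length());
        }
    }

    public static void main(String[] args) {
        Sub fs = new Sub();            // mimics a test-only FileSystem subclass
        try {
            fs.close();                // delete() dereferences uninitialized state
        } catch (NullPointerException e) {
            System.out.println("NPE before initialize()");
        }
        fs.initialize();
        fs.close();                    // fine once initialization has run
        System.out.println("ok");
    }
}
```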
[jira] [Commented] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309005#comment-15309005 ] Hudson commented on HDFS-9466: -- SUCCESS: Integrated in Hadoop-trunk-Commit #9891 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9891/]) HDFS-9466. TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure (cmccabe: rev c7921c9bddb79c9db5059b6c3f7a3a586a3cd95b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ShortCircuitRegistry.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitCache.java > TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky > > > Key: HDFS-9466 > URL: https://issues.apache.org/jira/browse/HDFS-9466 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs, hdfs-client >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-9466.001.patch, HDFS-9466.002.patch, > org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache-output.txt > > > This test is flaky and fails quite frequently in trunk. 
> Error Message > expected:<1> but was:<2> > Stacktrace > {noformat} > java.lang.AssertionError: expected:<1> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache$17.accept(TestShortCircuitCache.java:636) > at > org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.visit(ShortCircuitRegistry.java:395) > at > org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.checkNumberOfSegmentsAndSlots(TestShortCircuitCache.java:631) > at > org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.testDataXceiverCleansUpSlotsOnFailure(TestShortCircuitCache.java:684) > {noformat} > Thanks to [~xiaochen] for identifying the issue.
[jira] [Commented] (HDFS-10301) BlockReport retransmissions may lead to storages falsely being declared zombie if storage report processing happens out of order
[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308997#comment-15308997 ] Colin Patrick McCabe commented on HDFS-10301: - bq. Vinitha's patch adds one RPC only in the case when block reports are sent in multiple RPCs. The case where block reports are sent in multiple RPCs is exactly the case where scalability is the most important, since it indicates that we have a large number of blocks. My patch adds no new RPCs. If we are going to take an alternate approach, it should not involve a performance regression. bq. Could you please review the patch. I did review the patch. I suggested adding an optional field in an existing RPC rather than adding a new RPC, and stated that I was -1 on adding new RPC load to the NN. > BlockReport retransmissions may lead to storages falsely being declared > zombie if storage report processing happens out of order > > > Key: HDFS-10301 > URL: https://issues.apache.org/jira/browse/HDFS-10301 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.1 >Reporter: Konstantin Shvachko >Assignee: Colin Patrick McCabe >Priority: Critical > Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, > HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.01.patch, > HDFS-10301.sample.patch, zombieStorageLogs.rtf > > > When NameNode is busy a DataNode can timeout sending a block report. Then it > sends the block report again. Then NameNode while process these two reports > at the same time can interleave processing storages from different reports. > This screws up the blockReportId field, which makes NameNode think that some > storages are zombie. Replicas from zombie storages are immediately removed, > causing missing blocks.
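The bug being debated can be modeled in a few lines: if the NameNode stamps each storage with the id of the block report that last touched it and treats any storage not carrying the latest id as a zombie, then a retransmitted report whose RPCs interleave with the original's makes a live storage look dead. This is a toy model of the race, not the NameNode code; all names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

public class ZombieStorageDemo {
    // Maps storage id -> id of the last block report that touched it.
    static Map<String, Long> lastReportId = new HashMap<>();

    static void processStorageReport(String storage, long reportId) {
        lastReportId.put(storage, reportId);
    }

    public static void main(String[] args) {
        // The DataNode sends report #1, times out, and retransmits as #2.
        // The NameNode interleaves processing of the two copies.
        processStorageReport("s1", 1);  // from the first copy
        processStorageReport("s2", 2);  // from the retransmission
        processStorageReport("s1", 1);  // late RPC from the first copy again
        long latest = 2;
        // Zombie detection: any storage not stamped with the latest id.
        for (Map.Entry<String, Long> e : lastReportId.entrySet()) {
            if (e.getValue() != latest) {
                System.out.println(e.getKey() + " falsely declared zombie");
            }
        }
    }
}
```

Storage s1 is alive and fully reported, yet the stale stamp gets its replicas removed; both proposed fixes amount to giving the NameNode a reliable way to tie storage reports to one logical report rather than inferring it from interleaved ids.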
[jira] [Commented] (HDFS-10458) getFileEncryptionInfo should return quickly for non-encrypted cluster
[ https://issues.apache.org/jira/browse/HDFS-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308994#comment-15308994 ] Konstantin Shvachko commented on HDFS-10458: The patch looks good. I would add a similar condition for empty {{encryptionZones}} at the start of {{EncryptionZoneManager.getEncryptionZoneForPath()}}. This method is used many times, both for write operations like {{startFile()}} and for reads like {{getFileInfo()}}. Even though this will still be under the lock, returning and releasing the lock quickly should be beneficial. > getFileEncryptionInfo should return quickly for non-encrypted cluster > - > > Key: HDFS-10458 > URL: https://issues.apache.org/jira/browse/HDFS-10458 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, namenode >Affects Versions: 2.6.0 >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Attachments: HDFS-10458.00.patch > > > {{FSDirectory#getFileEncryptionInfo}} always acquires {{readLock}} and checks > if the path belongs to an EZ. For a busy system with potentially many listing > operations, this could cause locking contention. > I think we should add a call {{EncryptionZoneManager#hasEncryptionZone()}} to > return whether the system has any EZ. If no EZ at all, > {{getFileEncryptionInfo}} should return null without {{readLock}}. > If {{hasEncryptionZone}} is only used in the above scenario, maybe it doesn't > itself need a {{readLock}} -- if the system doesn't have any EZ when > {{getFileEncryptionInfo}} is called on a path, it means the path cannot be > encrypted.
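The shape of the proposed fast path — consult a lock-free "any encryption zone at all?" check before taking the read lock — might look like this standalone sketch. Class and method names are illustrative stand-ins, not the actual {{FSDirectory}}/{{EncryptionZoneManager}} code, and a plain monitor stands in for the namesystem read lock:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class EzFastPath {
    // inode id -> zone path; a concurrent map can answer isEmpty()
    // safely without the coarse namesystem lock.
    private final Map<Long, String> encryptionZones = new ConcurrentHashMap<>();

    boolean hasEncryptionZone() {
        return !encryptionZones.isEmpty();   // lock-free check
    }

    String getFileEncryptionInfo(String path) {
        if (!hasEncryptionZone()) {
            return null;                     // fast path: nothing can be encrypted
        }
        synchronized (this) {                // stand-in for the read lock
            return encryptionZones.containsValue(path) ? "ezInfo" : null;
        }
    }

    public static void main(String[] args) {
        EzFastPath dir = new EzFastPath();
        // No zones registered: the call returns without ever locking.
        System.out.println(dir.getFileEncryptionInfo("/user/a") == null);
    }
}
```

The same guard at the top of the zone lookup, as suggested in the comment, shortens the held-lock window even when the lock cannot be skipped entirely.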
[jira] [Comment Edited] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308980#comment-15308980 ] Colin Patrick McCabe edited comment on HDFS-9466 at 6/1/16 12:52 AM: - Thanks for the explanation. It sounds like the race condition is that the ShortCircuitRegistry on the DN needs to be informed about the client's decision that short-circuit is not working for the block, and this RPC takes time to arrive. That background process races with completing the TCP read successfully and checking the number of slots in the unit test. {code} public static interface Visitor { -void accept(HashMap<ShmId, RegisteredShm> segments, +boolean accept(HashMap<ShmId, RegisteredShm> segments, HashMultimap<ExtendedBlockId, Slot> slots); } {code} I don't think it makes sense to change the return type of the visitor. While you might find a boolean convenient, some other potential users of the interface might not find it useful. Instead, just have your closure modify a {{final MutableBoolean}} declared nearby. {code} +}, 100, 1); {code} It seems like we could lower the latency here (perhaps check every 10 ms) and lengthen the timeout. Since the test timeouts are generally 60s, I don't think it makes sense to make this timeout shorter than that. +1 once that's addressed. Thanks, [~jojochuang]. Sorry for the delay in reviews. was (Author: cmccabe): Thanks for the explanation. It sounds like the race condition is that the ShortCircuitRegistry on the DN needs to be informed about the client's decision that short-circuit is not working for the block, and this RPC takes time to arrive. That background process races with completing the TCP read successfully and checking the number of slots in the unit test. {code} public static interface Visitor { -void accept(HashMap<ShmId, RegisteredShm> segments, +boolean accept(HashMap<ShmId, RegisteredShm> segments, HashMultimap<ExtendedBlockId, Slot> slots); } {code} I don't think it makes sense to change the return type of the visitor.
While you might find a boolean convenient, some other potential users of the interface would have no use for it. Instead, just have your closure modify a {{final MutableBoolean}} declared nearby. {code} +}, 100, 1); {code} No reason to make this shorter than the test limit, surely? +1 once that's addressed. Thanks, [~jojochuang]. Sorry for the delay in reviews.
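The reviewer's suggestion — keep {{void accept(...)}} and have the closure record its result in a mutable holder, then poll with a short interval and a generous deadline — can be sketched as follows. This is a self-contained stand-in: the actual patch would use Commons Lang's {{MutableBoolean}} against the real registry maps, whereas this sketch uses the JDK's {{AtomicBoolean}} and dummy integer arguments:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class VisitorHolderDemo {
    interface Visitor {
        void accept(int segments, int slots);  // stands in for the real maps
    }

    static void visit(Visitor v) {
        v.accept(1, 1);  // the registry invokes the visitor under its lock
    }

    static boolean checkNumberOfSegmentsAndSlots(int expSegments, int expSlots) {
        // The visitor stays void; the closure writes into a holder instead.
        final AtomicBoolean matches = new AtomicBoolean(false);
        visit(new Visitor() {
            @Override
            public void accept(int segments, int slots) {
                matches.set(segments == expSegments && slots == expSlots);
            }
        });
        return matches.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Poll with a short interval and a long deadline, per the review
        // (e.g. every 10 ms, up to the usual 60s test timeout).
        long deadline = System.currentTimeMillis() + 60_000;
        while (!checkNumberOfSegmentsAndSlots(1, 1)
                && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);
        }
        System.out.println(checkNumberOfSegmentsAndSlots(1, 1));
    }
}
```

This keeps the {{Visitor}} interface unchanged for other callers while still letting the flaky test wait out the late slot-release RPC instead of asserting on a single racy snapshot.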
[jira] [Commented] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308980#comment-15308980 ] Colin Patrick McCabe commented on HDFS-9466: Thanks for the explanation. It sounds like the race condition is that the ShortCircuitRegistry on the DN needs to be informed about the client's decision that short-circuit is not working for the block, and this RPC takes time to arrive. That background process races with completing the TCP read successfully and checking the number of slots in the unit test. {code} public static interface Visitor { -void accept(HashMap<ShmId, RegisteredShm> segments, +boolean accept(HashMap<ShmId, RegisteredShm> segments, HashMultimap<ExtendedBlockId, Slot> slots); } {code} I don't think it makes sense to change the return type of the visitor. While you might find a boolean convenient, some other potential users of the interface would have no use for it. Instead, just have your closure modify a {{final MutableBoolean}} declared nearby. {code} +}, 100, 1); {code} No reason to make this shorter than the test limit, surely? +1 once that's addressed. Thanks, [~jojochuang]. Sorry for the delay in reviews.
[jira] [Commented] (HDFS-10415) TestDistributedFileSystem#MyDistributedFileSystem attempts to set up statistics before initialize() is called
[ https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308964#comment-15308964 ] Hudson commented on HDFS-10415: --- SUCCESS: Integrated in Hadoop-trunk-Commit #9890 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9890/]) HDFS-10415. TestDistributedFileSystem#MyDistributedFileSystem attempts (cmccabe: rev 29d6cadc52e411990c8237fd2fa71257cea60d9a) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java > TestDistributedFileSystem#MyDistributedFileSystem attempts to set up > statistics before initialize() is called > - > > Key: HDFS-10415 > URL: https://issues.apache.org/jira/browse/HDFS-10415 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: jenkins >Reporter: Sangjin Lee >Assignee: Mingliang Liu > Fix For: 2.8.0 > > Attachments: HDFS-10415-branch-2.000.patch, > HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch > > > {noformat} > Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem > testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem) Time > elapsed: 0.045 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790) > at > org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417) > at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084) > at > org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187) > at > org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217) > {noformat} > This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other > combinations. 
[jira] [Updated] (HDFS-10415) TestDistributedFileSystem#MyDistributedFileSystem attempts to set up statistics before initialize() is called
[ https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-10415: Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Committed to 2.8. > TestDistributedFileSystem#MyDistributedFileSystem attempts to set up > statistics before initialize() is called > - > > Key: HDFS-10415 > URL: https://issues.apache.org/jira/browse/HDFS-10415 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: jenkins >Reporter: Sangjin Lee >Assignee: Mingliang Liu > Fix For: 2.8.0 > > Attachments: HDFS-10415-branch-2.000.patch, > HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch > > > {noformat} > Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem > testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem) Time > elapsed: 0.045 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790) > at > org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417) > at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084) > at > org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187) > at > org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217) > {noformat} > This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other > combinations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10415) TestDistributedFileSystem#MyDistributedFileSystem attempts to set up statistics before initialize() is called
[ https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308928#comment-15308928 ] Colin Patrick McCabe commented on HDFS-10415: - The subclass can change the configuration that gets passed to the superclass.
{code}
class SuperClass {
  SuperClass(Configuration conf) {
    ... initialize superclass part of the object ...
  }
}

class SubClass extends SuperClass {
  SubClass(Configuration conf) {
    super(changeConf(conf));
    ... initialize my part of the object ...
  }

  private static Configuration changeConf(Configuration conf) {
    Configuration nconf = new Configuration(conf);
    nconf.set("foo", "bar");
    return nconf;
  }
}
{code}
Having a separate init() method is a well-known antipattern. Initialization belongs in the constructor. The only time a separate init method is really necessary is if you're using a dialect of C++ that doesn't support exceptions. > TestDistributedFileSystem#MyDistributedFileSystem attempts to set up > statistics before initialize() is called > - > > Key: HDFS-10415 > URL: https://issues.apache.org/jira/browse/HDFS-10415 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: jenkins >Reporter: Sangjin Lee >Assignee: Mingliang Liu > Attachments: HDFS-10415-branch-2.000.patch, > HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch > > > {noformat} > Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem > testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem) Time > elapsed: 0.045 sec <<< ERROR! 
> java.lang.NullPointerException: null > at > org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790) > at > org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417) > at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084) > at > org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187) > at > org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217) > {noformat} > This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other > combinations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-10415) TestDistributedFileSystem#MyDistributedFileSystem attempts to set up statistics before initialize() is called
[ https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308928#comment-15308928 ] Colin Patrick McCabe edited comment on HDFS-10415 at 6/1/16 12:09 AM: -- The subclass can change the configuration that gets passed to the superclass. {code} class SuperClass { SuperClass(Configuration conf) { ... initialize superclass part of the object ... } } class SubClass extends SuperClass { SubClass(Configuration conf) { super(changeConf(conf)); ... initialize my part of the object ... } private static Configuration changeConf(Configuration conf) { Configuration nconf = new Configuration(conf); nconf.set("foo", "bar"); return nconf; } } {code} Having a separate init() method is a well-known antipattern. Initialization belongs in the constructor. The only time a separate init method is really necessary is if you're using a dialect of C++ that doesn't support exceptions. was (Author: cmccabe): The subclass can change the configuration that gets passed to the superclass. class SuperClass { SuperClass(Configuration conf) { ... initialize superclass part of the object ... } } class SubClass extends SuperClass { SubClass(Configuration conf) { super(changeConf(conf)); ... initialize my part of the object ... } private static Configuration changeConf(Configuration conf) { Configuration nconf = new Configuration(conf); nconf.set("foo", "bar"); return nconf; } } Having a separate init() method is a well-known antipattern. Initialization belongs in the constructor. The only time a separate init method is really necessary is if you're using a dialect of C++ that doesn't support exceptions. 
> TestDistributedFileSystem#MyDistributedFileSystem attempts to set up > statistics before initialize() is called > - > > Key: HDFS-10415 > URL: https://issues.apache.org/jira/browse/HDFS-10415 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: jenkins >Reporter: Sangjin Lee >Assignee: Mingliang Liu > Attachments: HDFS-10415-branch-2.000.patch, > HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch > > > {noformat} > Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem > testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem) Time > elapsed: 0.045 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790) > at > org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417) > at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084) > at > org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187) > at > org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217) > {noformat} > This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other > combinations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10342) BlockManager#createLocatedBlocks should not check corrupt replicas if none are corrupt
[ https://issues.apache.org/jira/browse/HDFS-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308913#comment-15308913 ] Kuhu Shukla commented on HDFS-10342: Thank you, [~xiaobingo]. Sorry about the delay; I have been occupied with some non-HDFS work lately. I will work on it later this week. Hope that works! Let me know if you have any comments on this. Thanks! > BlockManager#createLocatedBlocks should not check corrupt replicas if none > are corrupt > -- > > Key: HDFS-10342 > URL: https://issues.apache.org/jira/browse/HDFS-10342 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 2.7.0 >Reporter: Daryn Sharp >Assignee: Kuhu Shukla > > {{corruptReplicas#isReplicaCorrupt(block, node)}} is called for every node > while populating the machines array. There's no need to invoke the method if > {{corruptReplicas#numCorruptReplicas(block)}} returned 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10415) TestDistributedFileSystem#MyDistributedFileSystem attempts to set up statistics before initialize() is called
[ https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-10415: Summary: TestDistributedFileSystem#MyDistributedFileSystem attempts to set up statistics before initialize() is called (was: TestDistributedFileSystem#testDFSCloseOrdering() fails on branch-2) > TestDistributedFileSystem#MyDistributedFileSystem attempts to set up > statistics before initialize() is called > - > > Key: HDFS-10415 > URL: https://issues.apache.org/jira/browse/HDFS-10415 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: jenkins >Reporter: Sangjin Lee >Assignee: Mingliang Liu > Attachments: HDFS-10415-branch-2.000.patch, > HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch > > > {noformat} > Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem > testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem) Time > elapsed: 0.045 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790) > at > org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417) > at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084) > at > org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187) > at > org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217) > {noformat} > This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other > combinations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10342) BlockManager#createLocatedBlocks should not check corrupt replicas if none are corrupt
[ https://issues.apache.org/jira/browse/HDFS-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308898#comment-15308898 ] Xiaobing Zhou commented on HDFS-10342: -- [~kshukla] would you like to post a patch for it? Thanks. > BlockManager#createLocatedBlocks should not check corrupt replicas if none > are corrupt > -- > > Key: HDFS-10342 > URL: https://issues.apache.org/jira/browse/HDFS-10342 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 2.7.0 >Reporter: Daryn Sharp >Assignee: Kuhu Shukla > > {{corruptReplicas#isReplicaCorrupt(block, node)}} is called for every node > while populating the machines array. There's no need to invoke the method if > {{corruptReplicas#numCorruptReplicas(block)}} returned 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10415) TestDistributedFileSystem#testDFSCloseOrdering() fails on branch-2
[ https://issues.apache.org/jira/browse/HDFS-10415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308894#comment-15308894 ] Colin Patrick McCabe commented on HDFS-10415: - It sounds like there are no strong objections to HDFS-10415.000.patch and HDFS-10415-branch-2.001.patch. Let's fix this unit test! We can improve this in a follow-on JIRA (personally, I like the idea of adding the initialization to the {{init}} method). But it's not worth blocking the unit test fix. +1. > TestDistributedFileSystem#testDFSCloseOrdering() fails on branch-2 > -- > > Key: HDFS-10415 > URL: https://issues.apache.org/jira/browse/HDFS-10415 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 > Environment: jenkins >Reporter: Sangjin Lee >Assignee: Mingliang Liu > Attachments: HDFS-10415-branch-2.000.patch, > HDFS-10415-branch-2.001.patch, HDFS-10415.000.patch > > > {noformat} > Tests run: 24, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 51.096 sec > <<< FAILURE! - in org.apache.hadoop.hdfs.TestDistributedFileSystem > testDFSCloseOrdering(org.apache.hadoop.hdfs.TestDistributedFileSystem) Time > elapsed: 0.045 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:790) > at > org.apache.hadoop.fs.FileSystem.processDeleteOnExit(FileSystem.java:1417) > at org.apache.hadoop.fs.FileSystem.close(FileSystem.java:2084) > at > org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:1187) > at > org.apache.hadoop.hdfs.TestDistributedFileSystem.testDFSCloseOrdering(TestDistributedFileSystem.java:217) > {noformat} > This is with Java 8 on Mac. It passes fine on trunk. I haven't tried other > combinations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9476) TestDFSUpgradeFromImage#testUpgradeFromRel1BBWImage occasionally fail
[ https://issues.apache.org/jira/browse/HDFS-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308893#comment-15308893 ] Xiaobing Zhou commented on HDFS-9476: - The new patch looks good, +1. > TestDFSUpgradeFromImage#testUpgradeFromRel1BBWImage occasionally fail > - > > Key: HDFS-9476 > URL: https://issues.apache.org/jira/browse/HDFS-9476 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Wei-Chiu Chuang >Assignee: Akira AJISAKA > Attachments: HDFS-9476.002.patch, HDFS-9476.01.patch > > > This test occasionally fail. For example, the most recent one is: > https://builds.apache.org/job/Hadoop-Hdfs-trunk/2587/ > Error Message > {noformat} > Cannot obtain block length for > LocatedBlock{BP-1371507683-67.195.81.153-1448798439809:blk_7162739548153522810_1020; > getBlockSize()=1024; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:33080,DS-c5eaf2b4-2ee6-419d-a8a0-44a5df5ef9a1,DISK]]} > {noformat} > Stacktrace > {noformat} > java.io.IOException: Cannot obtain block length for > LocatedBlock{BP-1371507683-67.195.81.153-1448798439809:blk_7162739548153522810_1020; > getBlockSize()=1024; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:33080,DS-c5eaf2b4-2ee6-419d-a8a0-44a5df5ef9a1,DISK]]} > at > org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:399) > at > org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:343) > at > org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:275) > at org.apache.hadoop.hdfs.DFSInputStream.(DFSInputStream.java:265) > at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1046) > at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1011) > at > org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.dfsOpenFileWithRetries(TestDFSUpgradeFromImage.java:177) > at > org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.verifyDir(TestDFSUpgradeFromImage.java:213) > at > 
org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.verifyFileSystem(TestDFSUpgradeFromImage.java:228) > at > org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.upgradeAndVerify(TestDFSUpgradeFromImage.java:600) > at > org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.testUpgradeFromRel1BBWImage(TestDFSUpgradeFromImage.java:622) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10465) libhdfs++: Implement GetBlockLocations
[ https://issues.apache.org/jira/browse/HDFS-10465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308891#comment-15308891 ] Hadoop QA commented on HDFS-10465: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 16s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 37s {color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 58s {color} | {color:green} HDFS-8707 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 2s {color} | {color:green} HDFS-8707 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s {color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s {color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 22s {color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 22s {color} | {color:green} the patch passed {color} | | 
{color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 20s {color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 23s {color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK v1.8.0_91. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 16s {color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK v1.7.0_101. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 60m 33s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0cf5e66 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12807277/HDFS-10465.HDFS-8707.000.patch | | JIRA Issue | HDFS-10465 | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux c55c1642173c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-8707 / f0ef898 | | Default Java | 1.7.0_101 | | Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_91 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101 | | JDK v1.7.0_101 Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15617/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: hadoop-hdfs-project/hadoop-hdfs-native-client | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15617/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > libhdfs++: Implement GetBlockLocations > -- > > Key: HDFS-10465 > URL: https://issues.apache.org/jira/browse/HDFS-10465 > Project: Hadoop HDFS > Issue Type:
[jira] [Updated] (HDFS-10433) Make retry also works well for Async DFS
[ https://issues.apache.org/jira/browse/HDFS-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-10433: --- Issue Type: New Feature (was: Sub-task) Parent: (was: HDFS-9924) > Make retry also works well for Async DFS > > > Key: HDFS-10433 > URL: https://issues.apache.org/jira/browse/HDFS-10433 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs >Reporter: Xiaobing Zhou >Assignee: Tsz Wo Nicholas Sze > Attachments: h10433_20160524.patch, h10433_20160525.patch, > h10433_20160525b.patch, h10433_20160527.patch, h10433_20160528.patch, > h10433_20160528c.patch > > > In current Async DFS implementation, file system calls are invoked and > returns Future immediately to clients. Clients call Future#get to retrieve > final results. Future#get internally invokes a chain of callbacks residing in > ClientNamenodeProtocolTranslatorPB, ProtobufRpcEngine and ipc.Client. The > callback path bypasses the original retry layer/logic designed for > synchronous DFS. This proposes refactoring to make retry also works for Async > DFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10433) Make retry also works well for Async DFS
[ https://issues.apache.org/jira/browse/HDFS-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308833#comment-15308833 ] Tsz Wo Nicholas Sze commented on HDFS-10433: Thanks, Jing. I will commit this shortly and file a follow-up JIRA. > Make retry also works well for Async DFS > > > Key: HDFS-10433 > URL: https://issues.apache.org/jira/browse/HDFS-10433 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Xiaobing Zhou >Assignee: Tsz Wo Nicholas Sze > Attachments: h10433_20160524.patch, h10433_20160525.patch, > h10433_20160525b.patch, h10433_20160527.patch, h10433_20160528.patch, > h10433_20160528c.patch > > > In current Async DFS implementation, file system calls are invoked and > returns Future immediately to clients. Clients call Future#get to retrieve > final results. Future#get internally invokes a chain of callbacks residing in > ClientNamenodeProtocolTranslatorPB, ProtobufRpcEngine and ipc.Client. The > callback path bypasses the original retry layer/logic designed for > synchronous DFS. This proposes refactoring to make retry also works for Async > DFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10341) Add a metric to expose the timeout number of pending replication blocks
[ https://issues.apache.org/jira/browse/HDFS-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308830#comment-15308830 ] Xiaobing Zhou commented on HDFS-10341: -- [~ajisakaa] thank you for the work. Would you like to post a new patch to address [~arpitagarwal]'s comments? > Add a metric to expose the timeout number of pending replication blocks > --- > > Key: HDFS-10341 > URL: https://issues.apache.org/jira/browse/HDFS-10341 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Attachments: HDFS-10341.01.patch, HDFS-10341.02.patch, > HDFS-10341.03.patch > > > Per HDFS-6682, recording the timeout number of pending replication blocks is > useful to get the cluster health. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10465) libhdfs++: Implement GetBlockLocations
[ https://issues.apache.org/jira/browse/HDFS-10465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bob Hansen updated HDFS-10465: -- Assignee: Bob Hansen Status: Patch Available (was: Open) > libhdfs++: Implement GetBlockLocations > -- > > Key: HDFS-10465 > URL: https://issues.apache.org/jira/browse/HDFS-10465 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Bob Hansen >Assignee: Bob Hansen > Attachments: HDFS-10465.HDFS-8707.000.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10465) libhdfs++: Implement GetBlockLocations
[ https://issues.apache.org/jira/browse/HDFS-10465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bob Hansen updated HDFS-10465: -- Attachment: HDFS-10465.HDFS-8707.000.patch Introduces new function in hdfs_ext: hdfsGetBlockLocations > libhdfs++: Implement GetBlockLocations > -- > > Key: HDFS-10465 > URL: https://issues.apache.org/jira/browse/HDFS-10465 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Bob Hansen > Attachments: HDFS-10465.HDFS-8707.000.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10464) libhdfs++: Implement GetPathInfo and ListDirectory
[ https://issues.apache.org/jira/browse/HDFS-10464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bob Hansen updated HDFS-10464: -- Assignee: Bob Hansen > libhdfs++: Implement GetPathInfo and ListDirectory > -- > > Key: HDFS-10464 > URL: https://issues.apache.org/jira/browse/HDFS-10464 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Bob Hansen >Assignee: Bob Hansen > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10464) libhdfs++: Implement GetPathInfo and ListDirectory
[ https://issues.apache.org/jira/browse/HDFS-10464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308786#comment-15308786 ] Anatoli Shein commented on HDFS-10464: -- I can fix this! > libhdfs++: Implement GetPathInfo and ListDirectory > -- > > Key: HDFS-10464 > URL: https://issues.apache.org/jira/browse/HDFS-10464 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Bob Hansen > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10433) Make retry also works well for Async DFS
[ https://issues.apache.org/jira/browse/HDFS-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-10433: - Hadoop Flags: Reviewed > Make retry also works well for Async DFS > > > Key: HDFS-10433 > URL: https://issues.apache.org/jira/browse/HDFS-10433 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Xiaobing Zhou >Assignee: Tsz Wo Nicholas Sze > Attachments: h10433_20160524.patch, h10433_20160525.patch, > h10433_20160525b.patch, h10433_20160527.patch, h10433_20160528.patch, > h10433_20160528c.patch > > > In current Async DFS implementation, file system calls are invoked and > returns Future immediately to clients. Clients call Future#get to retrieve > final results. Future#get internally invokes a chain of callbacks residing in > ClientNamenodeProtocolTranslatorPB, ProtobufRpcEngine and ipc.Client. The > callback path bypasses the original retry layer/logic designed for > synchronous DFS. This proposes refactoring to make retry also works for Async > DFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10433) Make retry also works well for Async DFS
[ https://issues.apache.org/jira/browse/HDFS-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308630#comment-15308630 ] Jing Zhao commented on HDFS-10433: -- Thanks for updating the patch, [~szetszwo]! The latest patch looks good to me overall. Two questions:
# The interval between two retries is realized by {{Thread.sleep}}, which makes the background thread in {{AsyncCallHandler}} sleep. Because all the client-side {{Future.get}} calls need to wait on the background thread for the final result, this sleep may delay all the pending requests.
# The current background thread does a sleep inside of the loop, which may delay all the RPC requests. Ideally we want this thread to wait for a response notification from the RPC client.
# Minor: {{Counters}} can be created inside the Call's constructor instead of being passed as a parameter.
Looks like #1 and #2 need some extra work. Considering the current patch is already complicated, we can address them in a separate JIRA. I will give +1 for committing the current patch first. > Make retry also works well for Async DFS > > > Key: HDFS-10433 > URL: https://issues.apache.org/jira/browse/HDFS-10433 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Xiaobing Zhou >Assignee: Tsz Wo Nicholas Sze > Attachments: h10433_20160524.patch, h10433_20160525.patch, > h10433_20160525b.patch, h10433_20160527.patch, h10433_20160528.patch, > h10433_20160528c.patch > > > In current Async DFS implementation, file system calls are invoked and > returns Future immediately to clients. Clients call Future#get to retrieve > final results. Future#get internally invokes a chain of callbacks residing in > ClientNamenodeProtocolTranslatorPB, ProtobufRpcEngine and ipc.Client. The > callback path bypasses the original retry layer/logic designed for > synchronous DFS. This proposes refactoring to make retry also works for Async > DFS. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
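Points #1 and #2 in the comment above both stem from blocking the shared background thread with {{Thread.sleep}}. The alternative suggested there, waiting on a response notification with a bounded timeout instead of sleeping unconditionally, might be sketched as follows (class and method names are hypothetical illustrations, not taken from the patch):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the background thread waits on a response queue, so
// pending calls are processed as soon as a response arrives instead of being
// delayed by a fixed Thread.sleep between retries.
public class AsyncResponsePoller {
  private final BlockingQueue<String> responses = new LinkedBlockingQueue<>();

  /** Called by the RPC client when a response arrives. */
  public void notifyResponse(String response) {
    responses.offer(response);
  }

  /**
   * Called by the background thread: blocks until a response arrives or the
   * retry interval elapses, whichever comes first. Unlike
   * Thread.sleep(intervalMs), an early response wakes the thread immediately,
   * so other pending requests are not delayed by one call's retry interval.
   */
  public String pollOrTimeout(long intervalMs) throws InterruptedException {
    return responses.poll(intervalMs, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    AsyncResponsePoller poller = new AsyncResponsePoller();
    poller.notifyResponse("ok");
    // Returns immediately even though the timeout is long.
    System.out.println(poller.pollOrTimeout(60_000));
  }
}
```

The same wake-up-on-notification behavior could also be built with {{Object.wait}}/{{notify}}; a {{BlockingQueue}} is simply the more idiomatic form of it.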
[jira] [Updated] (HDFS-10466) DistributedFileSystem.listLocatedStatus() should return HdfsBlockLocation instead of BlockLocation
[ https://issues.apache.org/jira/browse/HDFS-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juan Yu updated HDFS-10466: --- Resolution: Won't Fix Status: Resolved (was: Patch Available) Thanks [~andrew.wang] for discussion. I just need a unique ID for DN storage. With https://issues.apache.org/jira/browse/HDFS-8887, BlockLocation already contains those information. no need to add LocatedBlock. Close it. > DistributedFileSystem.listLocatedStatus() should return HdfsBlockLocation > instead of BlockLocation > -- > > Key: HDFS-10466 > URL: https://issues.apache.org/jira/browse/HDFS-10466 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Juan Yu >Assignee: Juan Yu >Priority: Minor > Attachments: HDFS-10466.001.patch, HDFS-10466.patch > > > https://issues.apache.org/jira/browse/HDFS-202 added a new API > listLocatedStatus() to get all files' status with block locations for a > directory. This is great that we don't need to call > FileSystem.getFileBlockLocations() for each file. it's much faster (about > 8-10 times). > However, the returned LocatedFileStatus only contains basic BlockLocation > instead of HdfsBlockLocation, the LocatedBlock details are stripped out. > It should do the similar as DFSClient.getBlockLocations(), return > HdfsBlockLocation which provide full block location details. > The implementation of DistributedFileSystem. listLocatedStatus() retrieves > HdfsLocatedFileStatus which contains all information, but when convert it to > LocatedFileStatus, it doesn't keep LocatedBlock data. It's a simple (and > compatible) change to make to keep the LocatedBlock details. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10341) Add a metric to expose the timeout number of pending replication blocks
[ https://issues.apache.org/jira/browse/HDFS-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308268#comment-15308268 ] Arpit Agarwal commented on HDFS-10341: -- Hi [~ajisakaa], that makes sense. So IIUC the metric may count the same block multiple times as timed out blocks are reinserted into the needed replications queue. If so perhaps we should rename the metric to {{NumTimedOutPendingReconstructions}} and update the documentation to state that it counts the number of timed out reconstructions and not the number of unique blocks that timed out? +1 with those updates. > Add a metric to expose the timeout number of pending replication blocks > --- > > Key: HDFS-10341 > URL: https://issues.apache.org/jira/browse/HDFS-10341 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Attachments: HDFS-10341.01.patch, HDFS-10341.02.patch, > HDFS-10341.03.patch > > > Per HDFS-6682, recording the timeout number of pending replication blocks is > useful to get the cluster health. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
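The distinction drawn in the comment above, counting timeout *events* rather than unique blocks, can be illustrated with a minimal sketch (the class and method names here are hypothetical, not the metric as committed):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a metric counting timeout events. A block that times
// out, is reinserted into the needed-reconstruction queue, and times out
// again is counted twice -- there is deliberately no de-duplication by block.
public class PendingReconstructionMetrics {
  private final AtomicLong numTimedOutPendingReconstructions = new AtomicLong();

  public void onReconstructionTimeout(String blockId) {
    // No de-duplication by blockId: every timeout increments the counter.
    numTimedOutPendingReconstructions.incrementAndGet();
  }

  public long getNumTimedOutPendingReconstructions() {
    return numTimedOutPendingReconstructions.get();
  }
}
```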
[jira] [Updated] (HDFS-10211) Add more info to DelegationTokenIdentifier#toString for better supportability
[ https://issues.apache.org/jira/browse/HDFS-10211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-10211: - Description: Base class {{AbstractDelegationTokenIdentifier}} has the following implementation of {{toString()}} method {code} @Override public String toString() { StringBuilder buffer = new StringBuilder(); buffer .append("owner=" + owner + ", renewer=" + renewer + ", realUser=" + realUser + ", issueDate=" + issueDate + ", maxDate=" + maxDate + ", sequenceNumber=" + sequenceNumber + ", masterKeyId=" + masterKeyId); return buffer.toString(); } {code} However, derived class {{org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier}} has the following implementation that overrides the base class above: {code} @Override public String toString() { return getKind() + " token " + getSequenceNumber() + " for " + getUser().getShortUserName(); } {code} And when exception is thrown because of token expiration or other reason (in {{AbstractDelegationTokenSecretManager#checkToken}}): {code} if (info.getRenewDate() < Time.now()) { throw new InvalidToken("token (" + identifier.toString() + ") is expired"); } {code} The exception doesn't show the detailed information about the token, like the base class' toString() method returns. Creating this jira to change the {{org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier}} implementation to include all the info about the token, as included by the base class. This change would help supportability, at the expense of printing a little more information to the log. I hope no code really depends on the output string. 
was: Base class {{AbstractDelegationTokenIdentifier}} has the following implementation of {{toString()}} method {code} @Override public String toString() { StringBuilder buffer = new StringBuilder(); buffer .append("owner=" + owner + ", renewer=" + renewer + ", realUser=" + realUser + ", issueDate=" + issueDate + ", maxDate=" + maxDate + ", sequenceNumber=" + sequenceNumber + ", masterKeyId=" + masterKeyId); return buffer.toString(); } {code} However, derived class {{org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier}} has the following implementation that overrides the base class above: {code} @Override public String toString() { return getKind() + " token " + getSequenceNumber() + " for " + getUser().getShortUserName(); } {code} And when exception is thrown because of token expiration or other reason: {code} if (info.getRenewDate() < Time.now()) { throw new InvalidToken("token (" + identifier.toString() + ") is expired"); } {code} The exception doesn't show the detailed information about the token, like the base class' toString() method returns. Creating this jira to change the {{org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier}} implementation to include all the info about the token, as included by the base class. This change would help supportability, at the expense of printing a little more information to the log. I hope no code really depends on the output string. 
> Add more info to DelegationTokenIdentifier#toString for better supportability > - > > Key: HDFS-10211 > URL: https://issues.apache.org/jira/browse/HDFS-10211 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > > Base class {{AbstractDelegationTokenIdentifier}} has the following > implementation of {{toString()}} method > {code} > @Override > public String toString() { > StringBuilder buffer = new StringBuilder(); > buffer > .append("owner=" + owner + ", renewer=" + renewer + ", realUser=" > + realUser + ", issueDate=" + issueDate + ", maxDate=" + maxDate > + ", sequenceNumber=" + sequenceNumber + ", masterKeyId=" > + masterKeyId); > return buffer.toString(); > } > {code} > However, derived class > {{org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier}} > has the following implementation that overrides the base class above: > {code} > @Override > public String toString() { > return getKind() + " token " + getSequenceNumber() > + " for " + getUser().getShortUserName(); > } > {code} > And when exception is thrown because of token expiration or other reason (in > {{AbstractDelegationTokenSecretManager#checkToken}}): > {code} > if (info.getRenewDate() < Time.now()) { > throw new InvalidToken("token (" + identifier.toString() + ") is > expired"); > } > {code} > The exception
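The change proposed above, making the HDFS {{DelegationTokenIdentifier}}'s {{toString()}} as informative as the base class, could look roughly like the following. This is a standalone sketch, not the committed patch: the fields are stand-ins for what {{AbstractDelegationTokenIdentifier}} actually holds, and the real class gets the short user name via {{getUser().getShortUserName()}}.

```java
// Hypothetical sketch: keep the concise "kind token N for user" summary, but
// append the detailed fields the base class already prints, so an expired-
// token exception carries issueDate/maxDate/masterKeyId context.
public class TokenToStringSketch {
  // Stand-in fields; in HDFS these live in AbstractDelegationTokenIdentifier.
  String kind = "HDFS_DELEGATION_TOKEN";
  String owner = "alice", renewer = "yarn", realUser = "";
  long issueDate = 1464700000000L, maxDate = 1465304800000L;
  int sequenceNumber = 42, masterKeyId = 7;

  @Override
  public String toString() {
    return kind + " token " + sequenceNumber + " for " + owner
        + " (owner=" + owner + ", renewer=" + renewer
        + ", realUser=" + realUser + ", issueDate=" + issueDate
        + ", maxDate=" + maxDate + ", sequenceNumber=" + sequenceNumber
        + ", masterKeyId=" + masterKeyId + ")";
  }
}
```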
[jira] [Updated] (HDFS-10441) libhdfs++: HA namenode support
[ https://issues.apache.org/jira/browse/HDFS-10441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer updated HDFS-10441: --- Attachment: HDFS-8707.HDFS-10441.001.patch Rebased patch on top of SASL work. I've been a little busier than expected so I haven't had a chance to address all of Bob's comments yet. Really posted in case [~bobhansen] wants to check out the merge; otherwise the rest isn't worth looking at too closely yet. > libhdfs++: HA namenode support > -- > > Key: HDFS-10441 > URL: https://issues.apache.org/jira/browse/HDFS-10441 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer > Attachments: HDFS-10441.HDFS-8707.000.patch, > HDFS-8707.HDFS-10441.001.patch > > > If a cluster is HA enabled then do proper failover. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10370) Allow DataNode to be started with numactl
[ https://issues.apache.org/jira/browse/HDFS-10370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308009#comment-15308009 ] Dave Marion commented on HDFS-10370: bq. Secure mode daemons do not have the necessary code here I made a small mention in the description about this; I'm not sure how it would work with jsvc. Regarding some of the other points, I'm not familiar with the coding rules that are in place for the scripts. I don't believe I have the necessary karma to move this issue. > Allow DataNode to be started with numactl > - > > Key: HDFS-10370 > URL: https://issues.apache.org/jira/browse/HDFS-10370 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Dave Marion >Assignee: Dave Marion > Attachments: HDFS-10370-1.patch, HDFS-10370-2.patch, > HDFS-10370-3.patch, HDFS-10370-branch-2.004.patch, HDFS-10370.004.patch > > > Allow numactl constraints to be applied to the datanode process. The > implementation I have in mind involves two environment variables (enable and > parameters) in the datanode startup process. Basically, if enabled and > numactl exists on the system, then start the java process using it. Provide a > default set of parameters, and allow the user to override the default. Wiring > this up for the non-jsvc use case seems straightforward. Not sure how this > can be supported using jsvc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-8872) Reporting of missing blocks is different in fsck and namenode ui/metasave
[ https://issues.apache.org/jira/browse/HDFS-8872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307937#comment-15307937 ] Rushabh S Shah commented on HDFS-8872: -- [~mingma]: any thoughts ? > Reporting of missing blocks is different in fsck and namenode ui/metasave > - > > Key: HDFS-8872 > URL: https://issues.apache.org/jira/browse/HDFS-8872 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > > Namenode ui and metasave will not report a block as missing if the only > replica is on decommissioning/decomissioned node while fsck will show it as > MISSING. > Since decommissioned node can be formatted/removed anytime, we can actually > lose the block. > Its better to alert on namenode ui if the only copy is on > decomissioned/decommissioning node. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-10440) Improve DataNode web UI
[ https://issues.apache.org/jira/browse/HDFS-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307905#comment-15307905 ] Kihwal Lee edited comment on HDFS-10440 at 5/31/16 3:27 PM: Thanks for the patch. I think it is showing the available information very well. Having said that, we can take this opportunity to expose more on the block pool via jmx. The namenode addresses are useful, but showing the service actor state will be even better. Sometimes datanodes have trouble talking to some namenodes, but not all. Verifying it usually involves looking at the log. Exposing individual BP service actor state through jmx and showing them through UI will be very helpful. For the storage section, {{VolumeInfo}} in trunk/2.9/2.8 already contains {{reservedSpaceForReplicas}} (HDFS-6955) and {{numBlocks}} (HDFS-9425). Please verify (screenshot?) they appear on the web ui. was (Author: kihwal): Thanks for the patch. I think it is showing the available information very well. Having said that, we can take this opportunity to expose more on the block pool via jmx. The namenode addressed are useful, but showing the service actor state will be even better. Sometimes datanodes have trouble talking to some namenode, but not all. Verifying it usually involves looking at the log. Exposing individual BP service actor state through jmx and showing them through UI will be very helpful. For the storage section, {{VolumeInfo}} in trunk/2.9/2.8 already contains {{reservedSpaceForReplicas}} (HDFS-6955) and {{numBlocks}} (HDFS-9425). Please verify (screenshot?) they appear on the web ui. 
> Improve DataNode web UI > --- > > Key: HDFS-10440 > URL: https://issues.apache.org/jira/browse/HDFS-10440 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: HDFS-10440.001.patch, datanode_html.001.jpg, > datanode_utilities.001.jpg, dn_web_ui_mockup.jpg > > > At present, datanode web UI doesn't have much information except for node > name and port. Propose to add more information similar to namenode UI, > including, > * Static info (version, block pool and cluster ID) > * Running state (active, decommissioning, decommissioned or lost etc) > * Summary (blocks, capacity, storage etc) > * Utilities (logs) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-9912) Make HDFS Federation docs up to date
[ https://issues.apache.org/jira/browse/HDFS-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang resolved HDFS-9912. --- Resolution: Not A Bug While dfsclusterhealth.jsp is removed, there is a patch to add it back: HDFS-8976. Federation doc does not mention how to co-exist with HA, but the HA doc does mention the configuration needed to work with federation. So this jira is not needed. > Make HDFS Federation docs up to date > > > Key: HDFS-9912 > URL: https://issues.apache.org/jira/browse/HDFS-9912 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > > _HDFS Federation_ documentation has a few places that are out-dated: > * dfsclusterhealth.jsp is already removed > * It should mention how to configure Federation with High availability, > because the configuration appears incompatible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10440) Improve DataNode web UI
[ https://issues.apache.org/jira/browse/HDFS-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-10440: -- Assignee: Weiwei Yang (was: WEIWEI YANG) > Improve DataNode web UI > --- > > Key: HDFS-10440 > URL: https://issues.apache.org/jira/browse/HDFS-10440 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Weiwei Yang >Assignee: Weiwei Yang > Attachments: HDFS-10440.001.patch, datanode_html.001.jpg, > datanode_utilities.001.jpg, dn_web_ui_mockup.jpg > > > At present, datanode web UI doesn't have much information except for node > name and port. Propose to add more information similar to namenode UI, > including, > * Static info (version, block pool and cluster ID) > * Running state (active, decommissioning, decommissioned or lost etc) > * Summary (blocks, capacity, storage etc) > * Utilities (logs) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10440) Improve DataNode web UI
[ https://issues.apache.org/jira/browse/HDFS-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-10440: -- Assignee: WEIWEI YANG > Improve DataNode web UI > --- > > Key: HDFS-10440 > URL: https://issues.apache.org/jira/browse/HDFS-10440 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Weiwei Yang >Assignee: WEIWEI YANG > Attachments: HDFS-10440.001.patch, datanode_html.001.jpg, > datanode_utilities.001.jpg, dn_web_ui_mockup.jpg > > > At present, datanode web UI doesn't have much information except for node > name and port. Propose to add more information similar to namenode UI, > including, > * Static info (version, block pool and cluster ID) > * Running state (active, decommissioning, decommissioned or lost etc) > * Summary (blocks, capacity, storage etc) > * Utilities (logs) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10440) Improve DataNode web UI
[ https://issues.apache.org/jira/browse/HDFS-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307905#comment-15307905 ] Kihwal Lee commented on HDFS-10440: --- Thanks for the patch. I think it is showing the available information very well. Having said that, we can take this opportunity to expose more on the block pool via jmx. The namenode addressed are useful, but showing the service actor state will be even better. Sometimes datanodes have trouble talking to some namenode, but not all. Verifying it usually involves looking at the log. Exposing individual BP service actor state through jmx and showing them through UI will be very helpful. For the storage section, {{VolumeInfo}} in trunk/2.9/2.8 already contains {{reservedSpaceForReplicas}} (HDFS-6955) and {{numBlocks}} (HDFS-9425). Please verify (screenshot?) they appear on the web ui. > Improve DataNode web UI > --- > > Key: HDFS-10440 > URL: https://issues.apache.org/jira/browse/HDFS-10440 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Weiwei Yang > Attachments: HDFS-10440.001.patch, datanode_html.001.jpg, > datanode_utilities.001.jpg, dn_web_ui_mockup.jpg > > > At present, datanode web UI doesn't have much information except for node > name and port. Propose to add more information similar to namenode UI, > including, > * Static info (version, block pool and cluster ID) > * Running state (active, decommissioning, decommissioned or lost etc) > * Summary (blocks, capacity, storage etc) > * Utilities (logs) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10470) HDFS HA with Kerberos: Specified version of key is not available
[ https://issues.apache.org/jira/browse/HDFS-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-10470: -- Component/s: ha > HDFS HA with kerberose Specified version of key is not available > > > Key: HDFS-10470 > URL: https://issues.apache.org/jira/browse/HDFS-10470 > Project: Hadoop HDFS > Issue Type: New Feature > Components: ha, journal-node >Affects Versions: 2.6.0 > Environment: java version "1.7.0_79" > Java(TM) SE Runtime Environment (build 1.7.0_79-b15) > Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode) >Reporter: deng > > When I enable kerberose with HDFS HA, the journalnode always throw the below > exception,but the hdfs works well. > 2016-05-30 10:54:37,877 WARN > org.apache.hadoop.security.authentication.server.AuthenticationFilter: > Authentication exception: GSSException: Failure unspecified at GSS-A > PI level (Mechanism level: Specified version of key is not available (44)) > org.apache.hadoop.security.authentication.client.AuthenticationException: > GSSException: Failure unspecified at GSS-API level (Mechanism level: > Specified version of key > is not available (44)) > at > org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:399) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:517) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1279) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.java:326) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) > at > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) > Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism > level: Specified version of key is not available (44)) > at > sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:788) > at > sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342) > at > sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) > at > sun.security.jgss.spnego.SpNegoContext.GSS_acceptSecContext(SpNegoContext.java:875) > at > sun.security.jgss.spnego.SpNegoContext.acceptSecContext(SpNegoContext.java:548) > at > sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342) > at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) > at > 
org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler$2.run(KerberosAuthenticationHandler.java:366) > at > org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler$2.run(KerberosAuthenticationHandler.java:348) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:348) > ...
[jira] [Updated] (HDFS-9908) Datanode should tolerate disk scan failure during NN handshake
[ https://issues.apache.org/jira/browse/HDFS-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-9908: -- Resolution: Won't Fix Status: Resolved (was: Patch Available) This patch no longer applies after the DU refactoring. > Datanode should tolerate disk scan failure during NN handshake > -- > > Key: HDFS-9908 > URL: https://issues.apache.org/jira/browse/HDFS-9908 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.5.0 > Environment: CDH5.3.3 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-9908.001.patch, HDFS-9908.002.patch, > HDFS-9908.003.patch, HDFS-9908.004.patch, HDFS-9908.005.patch, > HDFS-9908.006.patch, HDFS-9908.007.patch > > > DN may treat a disk scan failure exception as an NN handshake exception, and > this can prevent a DN to join a cluster even if most of its disks are healthy. > During NN handshake, DN initializes block pools. It will create a lock files > per disk, and then scan the volumes. However, if the scanning throws > exceptions due to disk failure, DN will think it's an exception because NN is > inconsistent with the local storage (see {{DataNode#initBlockPool}}. As a > result, it will attempt to reconnect to NN again. > However, at this point, DN has not deleted its lock files on the disks. If it > reconnects to NN again, it will think the same disks are already being used, > and then it will fail handshake again because all disks can not be used (due > to locking), and repeatedly. This will happen even if the DN has multiple > disks, and only one of them fails. The DN will not be able to connect to NN > despite just one failing disk. Note that it is possible to successfully > create a lock file on a disk, and then has error scanning the disk. > We saw this on a CDH 5.3.3 cluster (which is based on Apache Hadoop 2.5.0, > and we still see the same bug in 3.0.0 trunk branch). 
The root cause is that > DN treats an internal error (single disk failure) as an external one (NN > handshake failure) and we should fix it. > {code:title=DataNode.java} > /** >* One of the Block Pools has successfully connected to its NN. >* This initializes the local storage for that block pool, >* checks consistency of the NN's cluster ID, etc. >* >* If this is the first block pool to register, this also initializes >* the datanode-scoped storage. >* >* @param bpos Block pool offer service >* @throws IOException if the NN is inconsistent with the local storage. >*/ > void initBlockPool(BPOfferService bpos) throws IOException { > NamespaceInfo nsInfo = bpos.getNamespaceInfo(); > if (nsInfo == null) { > throw new IOException("NamespaceInfo not found: Block pool " + bpos > + " should have retrieved namespace info before initBlockPool."); > } > > setClusterId(nsInfo.clusterID, nsInfo.getBlockPoolID()); > // Register the new block pool with the BP manager. > blockPoolManager.addBlockPool(bpos); > > // In the case that this is the first block pool to connect, initialize > // the dataset, block scanners, etc. > initStorage(nsInfo); > // Exclude failed disks before initializing the block pools to avoid > startup > // failures. > checkDiskError(); > data.addBlockPool(nsInfo.getBlockPoolID(), conf); <- this line > throws disk error exception > blockScanner.enableBlockPoolId(bpos.getBlockPoolId()); > initDirectoryScanner(conf); > } > {code} > {{FsVolumeList#addBlockPool}} is the source of exception. 
> {code:title=FsVolumeList.java} > void addBlockPool(final String bpid, final Configuration conf) throws > IOException { > long totalStartTime = Time.monotonicNow(); > > final List exceptions = Collections.synchronizedList( > new ArrayList()); > List blockPoolAddingThreads = new ArrayList(); > for (final FsVolumeImpl v : volumes) { > Thread t = new Thread() { > public void run() { > try (FsVolumeReference ref = v.obtainReference()) { > FsDatasetImpl.LOG.info("Scanning block pool " + bpid + > " on volume " + v + "..."); > long startTime = Time.monotonicNow(); > v.addBlockPool(bpid, conf); > long timeTaken = Time.monotonicNow() - startTime; > FsDatasetImpl.LOG.info("Time taken to scan block pool " + bpid + > " on " + v + ": " + timeTaken + "ms"); > } catch (ClosedChannelException e) { > // ignore. > } catch (IOException ioe) { > FsDatasetImpl.LOG.info("Caught exception while scanning " + v + > ". Will throw later.", ioe); >
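The fix direction described in HDFS-9908 above, treating a single-volume scan failure as an internal error to tolerate rather than an NN-handshake failure, can be sketched as follows. The types and method are simplified stand-ins for {{FsVolumeList#addBlockPool}}, not the actual patch:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: scan each volume, collect per-volume failures, and
// only abort block-pool initialization if *every* volume failed. A single
// bad disk is excluded instead of surfacing as an NN-handshake error that
// keeps the whole DN out of the cluster.
public class BlockPoolScanSketch {
  public static List<String> addBlockPool(
      List<String> volumes, Consumer<String> scan) throws IOException {
    List<String> healthy = new ArrayList<>();
    List<String> failed = new ArrayList<>();
    for (String v : volumes) {
      try {
        scan.accept(v);        // may throw on disk error
        healthy.add(v);
      } catch (RuntimeException diskError) {
        failed.add(v);         // exclude this volume, keep going
      }
    }
    if (healthy.isEmpty()) {
      throw new IOException("All volumes failed: " + failed);
    }
    return healthy;            // proceed with the surviving volumes
  }
}
```

In the real code the tolerated-failure count would also be checked against {{dfs.datanode.failed.volumes.tolerated}}; that threshold is omitted here for brevity.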
[jira] [Updated] (HDFS-10470) HDFS HA with Kerberos: Specified version of key is not available
[ https://issues.apache.org/jira/browse/HDFS-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-10470: -- Issue Type: Bug (was: New Feature) > HDFS HA with kerberose Specified version of key is not available > > > Key: HDFS-10470 > URL: https://issues.apache.org/jira/browse/HDFS-10470 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, journal-node >Affects Versions: 2.6.0 > Environment: java version "1.7.0_79" > Java(TM) SE Runtime Environment (build 1.7.0_79-b15) > Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode) >Reporter: deng > > When I enable kerberose with HDFS HA, the journalnode always throw the below > exception,but the hdfs works well. > 2016-05-30 10:54:37,877 WARN > org.apache.hadoop.security.authentication.server.AuthenticationFilter: > Authentication exception: GSSException: Failure unspecified at GSS-A > PI level (Mechanism level: Specified version of key is not available (44)) > org.apache.hadoop.security.authentication.client.AuthenticationException: > GSSException: Failure unspecified at GSS-API level (Mechanism level: > Specified version of key > is not available (44)) > at > org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:399) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:517) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1279) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.java:326) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) > at > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) > Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism > level: Specified version of key is not available (44)) > at > sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:788) > at > sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342) > at > sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) > at > sun.security.jgss.spnego.SpNegoContext.GSS_acceptSecContext(SpNegoContext.java:875) > at > sun.security.jgss.spnego.SpNegoContext.acceptSecContext(SpNegoContext.java:548) > at > sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342) > at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) > at > 
org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler$2.run(KerberosAuthenticationHandler.java:366) > at > org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler$2.run(KerberosAuthenticationHandler.java:348) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:348) >
[jira] [Updated] (HDFS-10470) HDFS HA with kerberose Specified version of key is not available
[ https://issues.apache.org/jira/browse/HDFS-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-10470: -- Component/s: (was: datanode) journal-node > HDFS HA with kerberose Specified version of key is not available > > > Key: HDFS-10470 > URL: https://issues.apache.org/jira/browse/HDFS-10470 > Project: Hadoop HDFS > Issue Type: New Feature > Components: ha, journal-node >Affects Versions: 2.6.0 > Environment: java version "1.7.0_79" > Java(TM) SE Runtime Environment (build 1.7.0_79-b15) > Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode) >Reporter: deng > > When I enable kerberose with HDFS HA, the journalnode always throw the below > exception,but the hdfs works well. > 2016-05-30 10:54:37,877 WARN > org.apache.hadoop.security.authentication.server.AuthenticationFilter: > Authentication exception: GSSException: Failure unspecified at GSS-A > PI level (Mechanism level: Specified version of key is not available (44)) > org.apache.hadoop.security.authentication.client.AuthenticationException: > GSSException: Failure unspecified at GSS-API level (Mechanism level: > Specified version of key > is not available (44)) > at > org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:399) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:517) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1279) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.java:326) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) > at > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) > Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism > level: Specified version of key is not available (44)) > at > sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:788) > at > sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342) > at > sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) > at > sun.security.jgss.spnego.SpNegoContext.GSS_acceptSecContext(SpNegoContext.java:875) > at > sun.security.jgss.spnego.SpNegoContext.acceptSecContext(SpNegoContext.java:548) > at > sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342) > at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) > at > 
org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler$2.run(KerberosAuthenticationHandler.java:366) > at > org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler$2.run(KerberosAuthenticationHandler.java:348) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at >
[jira] [Commented] (HDFS-10471) DFSAdmin#SetQuotaCommand's help msg is not correct
[ https://issues.apache.org/jira/browse/HDFS-10471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307887#comment-15307887 ] Rushabh S Shah commented on HDFS-10471: --- +1 lgtm (non-binding) > DFSAdmin#SetQuotaCommand's help msg is not correct > -- > > Key: HDFS-10471 > URL: https://issues.apache.org/jira/browse/HDFS-10471 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Attachments: HDFS-10471.001.patch > > > The help message of the command related to SetQuota is not shown > correctly. In the message, the name {{quota}} is shown as {{N}}, but {{N}} is > never introduced. > {noformat} > -setQuota ...: Set the quota for each > directory . > The directory quota is a long integer that puts a hard limit > on the number of names in the directory tree > For each directory, attempt to set the quota. An error will be > reported if > 1. N is not a positive integer, or > 2. User is not an administrator, or > 3. The directory does not exist or is a file. > Note: A quota of 1 would force the directory to remain empty. > {noformat} > The command {{-setSpaceQuota}} has a similar problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10367) TestDFSShell.testMoveWithTargetPortEmpty fails with Address bind exception.
[ https://issues.apache.org/jira/browse/HDFS-10367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-10367: Attachment: HDFS-10367-005.patch Uploaded the patch to address the above comment.. > TestDFSShell.testMoveWithTargetPortEmpty fails with Address bind exception. > --- > > Key: HDFS-10367 > URL: https://issues.apache.org/jira/browse/HDFS-10367 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Attachments: HDFS-10367-002.patch, HDFS-10367-003.patch, > HDFS-10367-004.patch, HDFS-10367-005.patch, HDFS-10367.patch > > > {noformat} > Problem binding to [localhost:9820] java.net.BindException: Address already > in use; For more details see: http://wiki.apache.org/hadoop/BindException > Stack Trace: > java.net.BindException: Problem binding to [localhost:9820] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:444) > at sun.nio.ch.Net.bind(Net.java:436) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at org.apache.hadoop.ipc.Server.bind(Server.java:530) > at org.apache.hadoop.ipc.Server$Listener.(Server.java:793) > at org.apache.hadoop.ipc.Server.(Server.java:2592) > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:958) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:563) > at > org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:538) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:800) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.(NameNodeRpcServer.java:426) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:783) > at > 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:710) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:924) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:903) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1620) > at > org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1247) > at > org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1016) > at > org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:891) > at > org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:823) > at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:482) > at > org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:441) > at > org.apache.hadoop.hdfs.TestDFSShell.testMoveWithTargetPortEmpty(TestDFSShell.java:567) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
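Bind failures like the one above typically come from tests hard-coding a port (here 9820). A common remedy, shown below as a generic sketch rather than the actual HDFS-10367 patch, is to bind to port 0 so the OS assigns a free ephemeral port:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch: let the OS pick a free port instead of hard-coding one,
// which avoids "Address already in use" races between concurrent tests.
public class EphemeralPort {
  static int freePort() throws IOException {
    // Binding to port 0 asks the kernel for any free ephemeral port.
    try (ServerSocket s = new ServerSocket(0)) {
      return s.getLocalPort();
    }
  }

  public static void main(String[] args) throws IOException {
    int port = freePort();
    // A real test would pass this port to MiniDFSCluster instead of 9820.
    System.out.println(port > 0); // prints "true"
  }
}
```

The returned port is guaranteed free at bind time, so two test runs on the same host no longer race for a fixed address.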
[jira] [Commented] (HDFS-9833) Erasure coding: recomputing block checksum on the fly by reconstructing the missed/corrupt block data
[ https://issues.apache.org/jira/browse/HDFS-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307519#comment-15307519 ] Kai Zheng commented on HDFS-9833: - The latest patch LGTM and +1. Will commit it tomorrow. > Erasure coding: recomputing block checksum on the fly by reconstructing the > missed/corrupt block data > - > > Key: HDFS-9833 > URL: https://issues.apache.org/jira/browse/HDFS-9833 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rakesh R > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-9833-00-draft.patch, HDFS-9833-01.patch, > HDFS-9833-02.patch, HDFS-9833-03.patch, HDFS-9833-04.patch, > HDFS-9833-05.patch, HDFS-9833-06.patch, HDFS-9833-07.patch, HDFS-9833-08.patch > > > As discussed in HDFS-8430 and HDFS-9694, to compute striped file checksum > even some of striped blocks are missed, we need to consider recomputing block > checksum on the fly for the missed/corrupt blocks. To recompute the block > checksum, the block data needs to be reconstructed by erasure decoding, and > the main needed codes for the block reconstruction could be borrowed from > HDFS-9719, the refactoring of the existing {{ErasureCodingWorker}}. In EC > worker, reconstructed blocks need to be written out to target datanodes, but > here in this case, the remote writing isn't necessary, as the reconstructed > block data is only used to recompute the checksum. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
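The idea described in this issue can be sketched outside HDFS. The snippet below is a minimal illustration, not the HDFS-9833 implementation: it uses a single XOR parity cell (instead of Reed-Solomon decoding) to reconstruct a "lost" block and then recomputes its CRC32 checksum locally, mirroring the point that the reconstructed data only feeds the checksum and is never written to a remote datanode.

```java
import java.util.zip.CRC32;

// Hypothetical sketch: reconstruct a missing block from parity, then
// recompute its checksum locally (no remote write of the rebuilt data).
public class ChecksumRecompute {
  // XOR the surviving blocks with the parity block to recover the lost
  // one (single-parity code, for illustration only).
  static byte[] reconstruct(byte[][] surviving, byte[] parity) {
    byte[] out = parity.clone();
    for (byte[] b : surviving) {
      for (int i = 0; i < out.length; i++) {
        out[i] ^= b[i];
      }
    }
    return out;
  }

  static long crc32(byte[] data) {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    return crc.getValue();
  }

  public static void main(String[] args) {
    byte[] b0 = {1, 2, 3, 4};
    byte[] b1 = {5, 6, 7, 8};          // pretend this block is lost
    byte[] parity = new byte[4];
    for (int i = 0; i < 4; i++) {
      parity[i] = (byte) (b0[i] ^ b1[i]);
    }
    byte[] rebuilt = reconstruct(new byte[][] {b0}, parity);
    // The checksum of the rebuilt block matches the original block's.
    System.out.println(crc32(rebuilt) == crc32(b1)); // prints "true"
  }
}
```

In HDFS the decoding step is real erasure decoding borrowed from the ErasureCodingWorker refactoring (HDFS-9719), but the shape is the same: decode, checksum, discard.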
[jira] [Commented] (HDFS-9833) Erasure coding: recomputing block checksum on the fly by reconstructing the missed/corrupt block data
[ https://issues.apache.org/jira/browse/HDFS-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307427#comment-15307427 ] Rakesh R commented on HDFS-9833: Test case failures {{TestRollingUpgrade.testRollback}} and {{TestEditLog.testBatchedSyncWithClosedLogs}} are not related to my patch, pls ignore it. > Erasure coding: recomputing block checksum on the fly by reconstructing the > missed/corrupt block data > - > > Key: HDFS-9833 > URL: https://issues.apache.org/jira/browse/HDFS-9833 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rakesh R > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-9833-00-draft.patch, HDFS-9833-01.patch, > HDFS-9833-02.patch, HDFS-9833-03.patch, HDFS-9833-04.patch, > HDFS-9833-05.patch, HDFS-9833-06.patch, HDFS-9833-07.patch, HDFS-9833-08.patch > > > As discussed in HDFS-8430 and HDFS-9694, to compute striped file checksum > even some of striped blocks are missed, we need to consider recomputing block > checksum on the fly for the missed/corrupt blocks. To recompute the block > checksum, the block data needs to be reconstructed by erasure decoding, and > the main needed codes for the block reconstruction could be borrowed from > HDFS-9719, the refactoring of the existing {{ErasureCodingWorker}}. In EC > worker, reconstructed blocks need to be written out to target datanodes, but > here in this case, the remote writing isn't necessary, as the reconstructed > block data is only used to recompute the checksum. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9833) Erasure coding: recomputing block checksum on the fly by reconstructing the missed/corrupt block data
[ https://issues.apache.org/jira/browse/HDFS-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307410#comment-15307410 ] Hadoop QA commented on HDFS-9833: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 58s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 19s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 7s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | 
{color:green} 1m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 54s {color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 56s {color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 104m 12s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.TestEditLog | | | hadoop.hdfs.TestRollingUpgrade | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:2c91fd8 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12807038/HDFS-9833-08.patch | | JIRA Issue | HDFS-9833 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc | | uname | Linux 69937bcccfc9 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 93d8a7f | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/15612/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | unit test logs | https://builds.apache.org/job/PreCommit-HDFS-Build/15612/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results |
[jira] [Commented] (HDFS-10440) Improve DataNode web UI
[ https://issues.apache.org/jira/browse/HDFS-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307405#comment-15307405 ] Hadoop QA commented on HDFS-10440: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 0m 53s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:2c91fd8 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12807072/HDFS-10440.001.patch | | JIRA Issue | HDFS-10440 | | Optional Tests | asflicense | | uname | Linux 6cce37dd588b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 93d8a7f | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15614/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. 
> Improve DataNode web UI > --- > > Key: HDFS-10440 > URL: https://issues.apache.org/jira/browse/HDFS-10440 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Weiwei Yang > Attachments: HDFS-10440.001.patch, datanode_html.001.jpg, > datanode_utilities.001.jpg, dn_web_ui_mockup.jpg > > > At present, datanode web UI doesn't have much information except for node > name and port. Propose to add more information similar to namenode UI, > including, > * Static info (version, block pool and cluster ID) > * Running state (active, decommissioning, decommissioned or lost etc) > * Summary (blocks, capacity, storage etc) > * Utilities (logs) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10440) Improve DataNode web UI
[ https://issues.apache.org/jira/browse/HDFS-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-10440: --- Attachment: HDFS-10440.001.patch > Improve DataNode web UI > --- > > Key: HDFS-10440 > URL: https://issues.apache.org/jira/browse/HDFS-10440 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Weiwei Yang > Attachments: HDFS-10440.001.patch, datanode_html.001.jpg, > datanode_utilities.001.jpg, dn_web_ui_mockup.jpg > > > At present, datanode web UI doesn't have much information except for node > name and port. Propose to add more information similar to namenode UI, > including, > * Static info (version, block pool and cluster ID) > * Running state (active, decommissioning, decommissioned or lost etc) > * Summary (blocks, capacity, storage etc) > * Utilities (logs) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10440) Improve DataNode web UI
[ https://issues.apache.org/jira/browse/HDFS-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-10440: --- Status: Patch Available (was: Open) > Improve DataNode web UI > --- > > Key: HDFS-10440 > URL: https://issues.apache.org/jira/browse/HDFS-10440 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.0, 2.6.0, 2.5.0 >Reporter: Weiwei Yang > Attachments: HDFS-10440.001.patch, datanode_html.001.jpg, > datanode_utilities.001.jpg, dn_web_ui_mockup.jpg > > > At present, datanode web UI doesn't have much information except for node > name and port. Propose to add more information similar to namenode UI, > including, > * Static info (version, block pool and cluster ID) > * Running state (active, decommissioning, decommissioned or lost etc) > * Summary (blocks, capacity, storage etc) > * Utilities (logs) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10440) Improve DataNode web UI
[ https://issues.apache.org/jira/browse/HDFS-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307402#comment-15307402 ] Weiwei Yang commented on HDFS-10440: I have a patch ready to add datanode UI with basic information, including block pools and storage. Please check [#datanode_html.001.jpg] and [#datanode_utilities.001.jpg]. The patch can be applied to both trunk and branch-2. This patch is created based on existing datanode JMX, I think it's a good place to start. > Improve DataNode web UI > --- > > Key: HDFS-10440 > URL: https://issues.apache.org/jira/browse/HDFS-10440 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Weiwei Yang > Attachments: datanode_html.001.jpg, datanode_utilities.001.jpg, > dn_web_ui_mockup.jpg > > > At present, datanode web UI doesn't have much information except for node > name and port. Propose to add more information similar to namenode UI, > including, > * Static info (version, block pool and cluster ID) > * Running state (active, decommissioning, decommissioned or lost etc) > * Summary (blocks, capacity, storage etc) > * Utilities (logs) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
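The comment above builds the new UI on top of the DataNode's existing JMX beans. As a generic, self-contained illustration of reading metrics through JMX (this queries a standard JVM bean of the local process, not the actual DataNode bean names, which live under a different domain):

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Sketch: read an MBean attribute the same way a JMX-backed web page
// would. A DataNode page would target the DataNode's own beans instead
// of the JVM memory bean used here for self-containment.
public class JmxRead {
  static long heapUsed() throws Exception {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    ObjectName memory = new ObjectName("java.lang:type=Memory");
    CompositeData heap =
        (CompositeData) server.getAttribute(memory, "HeapMemoryUsage");
    return (Long) heap.get("used");
  }

  public static void main(String[] args) throws Exception {
    System.out.println(heapUsed() > 0); // prints "true"
  }
}
```

Building the page on metrics that are already published this way means no new server-side instrumentation is needed, which is why starting from the existing DataNode JMX is attractive.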
[jira] [Updated] (HDFS-10440) Improve DataNode web UI
[ https://issues.apache.org/jira/browse/HDFS-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-10440: --- Attachment: datanode_utilities.001.jpg datanode_html.001.jpg > Improve DataNode web UI > --- > > Key: HDFS-10440 > URL: https://issues.apache.org/jira/browse/HDFS-10440 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Weiwei Yang > Attachments: datanode_html.001.jpg, datanode_utilities.001.jpg, > dn_web_ui_mockup.jpg > > > At present, datanode web UI doesn't have much information except for node > name and port. Propose to add more information similar to namenode UI, > including, > * Static info (version, block pool and cluster ID) > * Running state (active, decommissioning, decommissioned or lost etc) > * Summary (blocks, capacity, storage etc) > * Utilities (logs) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10440) Improve DataNode web UI
[ https://issues.apache.org/jira/browse/HDFS-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-10440: --- Attachment: (was: dn_UI_logs.jpg) > Improve DataNode web UI > --- > > Key: HDFS-10440 > URL: https://issues.apache.org/jira/browse/HDFS-10440 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Weiwei Yang > Attachments: datanode_html.001.jpg, datanode_utilities.001.jpg, > dn_web_ui_mockup.jpg > > > At present, datanode web UI doesn't have much information except for node > name and port. Propose to add more information similar to namenode UI, > including, > * Static info (version, block pool and cluster ID) > * Running state (active, decommissioning, decommissioned or lost etc) > * Summary (blocks, capacity, storage etc) > * Utilities (logs) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-10472) NameNode Rpc Reader Thread crash, and cluster hang.
[ https://issues.apache.org/jira/browse/HDFS-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307335#comment-15307335 ] Vinayakumar B edited comment on HDFS-10472 at 5/31/16 7:24 AM: --- Hi [~chenfolin], good find. 1. As there is no special logic handled for IOE, You can replace existing {{ catch (IOException ex) {}} itself with {{ catch (Throwable ex) {}}. Also note that, Hadoop uses 2 spaces as indentation, instead of 4. was (Author: vinayrpet): Hi [~chenfolin], good find. 1. As there is no special login handled for IOE, You can replace existing {{ catch (IOException ex) {}} itself with {{ catch (Throwable ex) {}}. Also note that, Hadoop uses 2 spaces as indentation, instead of 4. > NameNode Rpc Reader Thread crash, and cluster hang. > --- > > Key: HDFS-10472 > URL: https://issues.apache.org/jira/browse/HDFS-10472 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Affects Versions: 2.5.0, 2.6.0, 2.8.0, 2.7.2, 2.6.2, 2.6.4 >Reporter: ChenFolin > Labels: patch > Attachments: HDFS-10472.patch > > > My Cluster hang yesterday . > Becuase the rpc server Reader threads crash. So all rpc request timeout, > include datanode hearbeat &. 
> We can see , the method doRunLoop just catch InterruptedException and > IOException: > while (running) { > SelectionKey key = null; > try { > // consume as many connections as currently queued to avoid > // unbridled acceptance of connections that starves the select > int size = pendingConnections.size(); > for (int i=size; i>0; i--) { > Connection conn = pendingConnections.take(); > conn.channel.register(readSelector, SelectionKey.OP_READ, conn); > } > readSelector.select(); > Iterator iter = > readSelector.selectedKeys().iterator(); > while (iter.hasNext()) { > key = iter.next(); > iter.remove(); > if (key.isValid()) { > if (key.isReadable()) { > doRead(key); > } > } > key = null; > } > } catch (InterruptedException e) { > if (running) { // unexpected -- log it > LOG.info(Thread.currentThread().getName() + " unexpectedly > interrupted", e); > } > } catch (IOException ex) { > LOG.error("Error in Reader", ex); > } > } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-10472) NameNode Rpc Reader Thread crash, and cluster hang.
[ https://issues.apache.org/jira/browse/HDFS-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307335#comment-15307335 ] Vinayakumar B edited comment on HDFS-10472 at 5/31/16 7:24 AM: --- Hi [~chenfolin], good find. 1. As there is no special logic handled for IOE, You can replace existing {{catch (IOException ex) {}} itself with {{catch (Throwable ex) {}}. Also note that, Hadoop uses 2 spaces as indentation, instead of 4. was (Author: vinayrpet): Hi [~chenfolin], good find. 1. As there is no special logic handled for IOE, You can replace existing {{ catch (IOException ex) {}} itself with {{ catch (Throwable ex) {}}. Also note that, Hadoop uses 2 spaces as indentation, instead of 4. > NameNode Rpc Reader Thread crash, and cluster hang. > --- > > Key: HDFS-10472 > URL: https://issues.apache.org/jira/browse/HDFS-10472 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Affects Versions: 2.5.0, 2.6.0, 2.8.0, 2.7.2, 2.6.2, 2.6.4 >Reporter: ChenFolin > Labels: patch > Attachments: HDFS-10472.patch > > > My Cluster hang yesterday . > Becuase the rpc server Reader threads crash. So all rpc request timeout, > include datanode hearbeat &. 
> We can see , the method doRunLoop just catch InterruptedException and > IOException: > while (running) { > SelectionKey key = null; > try { > // consume as many connections as currently queued to avoid > // unbridled acceptance of connections that starves the select > int size = pendingConnections.size(); > for (int i=size; i>0; i--) { > Connection conn = pendingConnections.take(); > conn.channel.register(readSelector, SelectionKey.OP_READ, conn); > } > readSelector.select(); > Iterator iter = > readSelector.selectedKeys().iterator(); > while (iter.hasNext()) { > key = iter.next(); > iter.remove(); > if (key.isValid()) { > if (key.isReadable()) { > doRead(key); > } > } > key = null; > } > } catch (InterruptedException e) { > if (running) { // unexpected -- log it > LOG.info(Thread.currentThread().getName() + " unexpectedly > interrupted", e); > } > } catch (IOException ex) { > LOG.error("Error in Reader", ex); > } > } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10472) NameNode Rpc Reader Thread crash, and cluster hang.
[ https://issues.apache.org/jira/browse/HDFS-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307335#comment-15307335 ] Vinayakumar B commented on HDFS-10472: -- Hi [~chenfolin], good find. 1. As there is no special logic handled for IOE, You can replace existing {{ catch (IOException ex) {}} itself with {{ catch (Throwable ex) {}}. Also note that, Hadoop uses 2 spaces as indentation, instead of 4. > NameNode Rpc Reader Thread crash, and cluster hang. > --- > > Key: HDFS-10472 > URL: https://issues.apache.org/jira/browse/HDFS-10472 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Affects Versions: 2.5.0, 2.6.0, 2.8.0, 2.7.2, 2.6.2, 2.6.4 >Reporter: ChenFolin > Labels: patch > Attachments: HDFS-10472.patch > > > My cluster hung yesterday. > Because the RPC server Reader threads crashed, all RPC requests timed out, > including datanode heartbeats &. > We can see that the method doRunLoop only catches InterruptedException and > IOException: > while (running) { > SelectionKey key = null; > try { > // consume as many connections as currently queued to avoid > // unbridled acceptance of connections that starves the select > int size = pendingConnections.size(); > for (int i=size; i>0; i--) { > Connection conn = pendingConnections.take(); > conn.channel.register(readSelector, SelectionKey.OP_READ, conn); > } > readSelector.select(); > Iterator iter = > readSelector.selectedKeys().iterator(); > while (iter.hasNext()) { > key = iter.next(); > iter.remove(); > if (key.isValid()) { > if (key.isReadable()) { > doRead(key); > } > } > key = null; > } > } catch (InterruptedException e) { > if (running) { // unexpected -- log it > LOG.info(Thread.currentThread().getName() + " unexpectedly > interrupted", e); > } > } catch (IOException ex) { > LOG.error("Error in Reader", ex); > } > } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: 
hdfs-issues-h...@hadoop.apache.org
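The fix discussed above, widening the final catch clause so the Reader thread survives any unchecked exception instead of only IOException, can be sketched as follows. This is an illustrative standalone loop, not the actual Hadoop Server.Reader code: the class name, the task queue, and the demo() method are invented for the example, with the selector work replaced by a Runnable task for brevity.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative reader loop (hypothetical class, not Hadoop's Server.Reader):
// the worker thread keeps running even when a task throws an unchecked
// exception, because the final catch clause is Throwable, not IOException.
public class ResilientReader {
    private final BlockingQueue<Runnable> pending = new LinkedBlockingQueue<>();
    private volatile boolean running = true;
    private volatile int processed = 0;

    void doRunLoop() {
        while (running) {
            try {
                Runnable task = pending.poll(100, TimeUnit.MILLISECONDS);
                if (task != null) {
                    task.run();   // may throw any RuntimeException
                    processed++;
                }
            } catch (InterruptedException e) {
                if (running) {    // unexpected -- log it
                    System.out.println(Thread.currentThread().getName()
                        + " unexpectedly interrupted: " + e);
                }
            } catch (Throwable t) {
                // Before the fix this was "catch (IOException ex)": any other
                // Throwable escaped the loop and silently killed the thread.
                System.out.println("Error in Reader: " + t);
            }
        }
    }

    // Submits one crashing task and one normal task, then reports how many
    // tasks completed; returns 1 because the thread survives the first crash.
    int demo() throws InterruptedException {
        Thread reader = new Thread(this::doRunLoop, "Reader");
        reader.start();
        pending.put(() -> { throw new RuntimeException("boom"); });
        pending.put(() -> { });   // a normal task, processed after the crash
        Thread.sleep(500);        // let both tasks run
        running = false;
        reader.join();
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("tasks processed after the crashing task: "
            + new ResilientReader().demo());
    }
}
```

With only `catch (IOException ex)`, the RuntimeException from the first task would propagate out of doRunLoop and the thread would die, leaving later tasks unserved; this mirrors how a dead Reader thread makes every RPC on its connections time out.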
[jira] [Updated] (HDFS-10472) NameNode Rpc Reader Thread crash, and cluster hang.
[ https://issues.apache.org/jira/browse/HDFS-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ChenFolin updated HDFS-10472:
-
Attachment: HDFS-10472.patch
add catch throwable
[jira] [Updated] (HDFS-10472) NameNode Rpc Reader Thread crash, and cluster hang.
[ https://issues.apache.org/jira/browse/HDFS-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ChenFolin updated HDFS-10472:
-
Labels: patch (was: )
Release Note: catch throwable
Status: Patch Available (was: Open)
add catch throwable
[jira] [Created] (HDFS-10472) NameNode Rpc Reader Thread crash, and cluster hang.
ChenFolin created HDFS-10472:
Summary: NameNode Rpc Reader Thread crash, and cluster hang.
Key: HDFS-10472
URL: https://issues.apache.org/jira/browse/HDFS-10472
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs, namenode
Affects Versions: 2.6.4, 2.6.2, 2.7.2, 2.6.0, 2.5.0, 2.8.0
Reporter: ChenFolin