[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819361#comment-16819361 ]

Wei-Chiu Chuang commented on HDFS-10477:

Failures don't reproduce for me. Will commit now.

> Stop decommission a rack of DataNodes caused NameNode fail over to standby
> --------------------------------------------------------------------------
>
> Key: HDFS-10477
> URL: https://issues.apache.org/jira/browse/HDFS-10477
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.2
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-10477.002.patch, HDFS-10477.003.patch,
> HDFS-10477.004.patch, HDFS-10477.005.patch, HDFS-10477.006.patch,
> HDFS-10477.007.patch, HDFS-10477.branch-2.8.patch, HDFS-10477.branch-2.patch,
> HDFS-10477.patch
>
>
> In our cluster, when we stopped decommissioning a rack that has 46 DataNodes,
> it locked the Namesystem for about 7 minutes, as the log below shows:
> {code}
> 2016-05-26 20:11:41,697 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.27:1004
> 2016-05-26 20:11:51,171 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 285258 over-replicated blocks on 10.142.27.27:1004 during recommissioning
> 2016-05-26 20:11:51,171 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.118:1004
> 2016-05-26 20:11:59,972 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 279923 over-replicated blocks on 10.142.27.118:1004 during recommissioning
> 2016-05-26 20:11:59,972 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.113:1004
> 2016-05-26 20:12:09,007 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 294307 over-replicated blocks on 10.142.27.113:1004 during recommissioning
> 2016-05-26 20:12:09,008 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.117:1004
> 2016-05-26 20:12:18,055 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 314381 over-replicated blocks on 10.142.27.117:1004 during recommissioning
> 2016-05-26 20:12:18,056 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.130:1004
> 2016-05-26 20:12:25,938 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 272779 over-replicated blocks on 10.142.27.130:1004 during recommissioning
> 2016-05-26 20:12:25,939 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.121:1004
> 2016-05-26 20:12:34,134 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 287248 over-replicated blocks on 10.142.27.121:1004 during recommissioning
> 2016-05-26 20:12:34,134 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.33:1004
> 2016-05-26 20:12:43,020 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 299868 over-replicated blocks on 10.142.27.33:1004 during recommissioning
> 2016-05-26 20:12:43,020 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.137:1004
> 2016-05-26 20:12:52,220 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 303914 over-replicated blocks on 10.142.27.137:1004 during recommissioning
> 2016-05-26 20:12:52,220 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.51:1004
> 2016-05-26 20:13:00,362 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 281175 over-replicated blocks on 10.142.27.51:1004 during recommissioning
> 2016-05-26 20:13:00,362 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.12:1004
> 2016-05-26 20:13:08,756 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 274880 over-replicated blocks on 10.142.27.12:1004 during recommissioning
> 2016-05-26 20:13:08,757 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.15:1004
> 2016-05-26 20:13:17,185 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 286334 over-replicated blocks on 10.142.27.15:1004 during recommissioning
> 2016-05-26 20:13:17,185 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.14:1004
> 2016-05-26 20:13:25,369 INFO
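The per-node cost in the log excerpt above can be checked with a little arithmetic. The following is a minimal Python sketch, not part of the issue or of any attached patch: it pairs each `Stop Decommissioning` line with the matching `Invalidated ... during recommissioning` line, computes the elapsed time per DataNode, and extrapolates to the 46 DataNodes mentioned in the description. The names `LOG`, `per_node_seconds`, and `RACK_SIZE` are hypothetical, only the first three DataNodes from the log are included, and the extrapolation assumes the remaining nodes take a similar time.

```python
import re
from datetime import datetime

# First three DataNodes from the NameNode log quoted in the issue description.
LOG = """\
2016-05-26 20:11:41,697 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.27:1004
2016-05-26 20:11:51,171 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 285258 over-replicated blocks on 10.142.27.27:1004 during recommissioning
2016-05-26 20:11:51,171 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.118:1004
2016-05-26 20:11:59,972 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 279923 over-replicated blocks on 10.142.27.118:1004 during recommissioning
2016-05-26 20:11:59,972 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.113:1004
2016-05-26 20:12:09,007 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 294307 over-replicated blocks on 10.142.27.113:1004 during recommissioning
"""

STAMP = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
START = re.compile(STAMP + r" INFO \S+DatanodeManager: Stop Decommissioning (\S+)")
END = re.compile(
    STAMP + r" INFO \S+BlockManager: Invalidated (\d+) over-replicated blocks"
    r" on (\S+) during recommissioning"
)


def parse_ts(text):
    """Parse a log4j-style timestamp such as '2016-05-26 20:11:41,697'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def per_node_seconds(log_text):
    """Return {datanode: (elapsed_seconds, invalidated_blocks)} for completed nodes."""
    started = {}
    result = {}
    for line in log_text.splitlines():
        m = START.match(line)
        if m:
            started[m.group(2)] = parse_ts(m.group(1))
            continue
        m = END.match(line)
        if m and m.group(3) in started:
            elapsed = parse_ts(m.group(1)) - started.pop(m.group(3))
            result[m.group(3)] = (elapsed.total_seconds(), int(m.group(2)))
    return result


RACK_SIZE = 46  # number of DataNodes in the rack, taken from the issue description

durations = per_node_seconds(LOG)
mean_seconds = sum(seconds for seconds, _ in durations.values()) / len(durations)
estimate_minutes = mean_seconds * RACK_SIZE / 60  # assumes similar cost per node

for node, (seconds, blocks) in durations.items():
    print(f"{node}: {seconds:.3f} s for {blocks} blocks")
print(f"mean {mean_seconds:.2f} s/node, about {estimate_minutes:.1f} min for {RACK_SIZE} nodes")
```

On this excerpt the mean comes out at roughly 9 seconds per DataNode, which extrapolates to about 7 minutes for 46 nodes, in line with the duration reported in the description.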
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819240#comment-16819240 ]

Hadoop QA commented on HDFS-10477:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 16m 43s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || branch-2.8 Compile Tests ||
| +1 | mvninstall | 13m 13s | branch-2.8 passed |
| +1 | compile | 0m 46s | branch-2.8 passed |
| +1 | checkstyle | 0m 25s | branch-2.8 passed |
| +1 | mvnsite | 0m 56s | branch-2.8 passed |
| +1 | javadoc | 1m 4s | branch-2.8 passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 47s | the patch passed |
| +1 | compile | 0m 43s | the patch passed |
| +1 | javac | 0m 43s | the patch passed |
| +1 | checkstyle | 0m 20s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 145 unchanged - 1 fixed = 145 total (was 146) |
| +1 | mvnsite | 0m 53s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | javadoc | 0m 59s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 198m 52s | hadoop-hdfs in the patch failed. |
| -1 | asflicense | 2m 32s | The patch generated 197 ASF License warnings. |
| | | 239m 57s | |

|| Reason || Tests ||
| Unreaped Processes | hadoop-hdfs:50 |
| Failed junit tests | hadoop.hdfs.TestEncryptedTransfer |
| | hadoop.hdfs.server.namenode.TestNameNodeRecovery |
| | hadoop.hdfs.server.namenode.TestNNThroughputBenchmark |
| | hadoop.hdfs.server.namenode.TestNameNodeRpcServer |
| | hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup |
| | hadoop.hdfs.TestSetrepDecreasing |
| | hadoop.hdfs.server.namenode.TestFSNamesystem |
| | hadoop.hdfs.server.namenode.TestBlockPlacementPolicyRackFaultTolerant |
| | hadoop.hdfs.TestFSInputChecker |
| | hadoop.hdfs.server.namenode.TestTransferFsImage |
| | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
| | hadoop.hdfs.server.namenode.TestINodeAttributeProvider |
| | hadoop.hdfs.server.namenode.TestFSImageWithSnapshot |
| | hadoop.hdfs.TestFsShellPermission |
| | hadoop.hdfs.server.namenode.TestFSNamesystemLock |
| | hadoop.hdfs.TestEncryptionZonesWithHA |
| | hadoop.hdfs.TestDFSRename |
| Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 |
| | org.apache.hadoop.hdfs.TestFileCreationEmpty |
| | org.apache.hadoop.hdfs.TestDatanodeRegistration |
| | org.apache.hadoop.hdfs.server.namenode.TestFileContextAcl |
| | org.apache.hadoop.hdfs.TestBlocksScheduledCounter |
| | org.apache.hadoop.hdfs.TestDFSClientFailover |
| | org.apache.hadoop.hdfs.TestSetrepIncreasing |
| | org.apache.hadoop.hdfs.server.namenode.TestINodeFile |
| | org.apache.hadoop.hdfs.TestDatanodeDeath |
| | org.apache.hadoop.hdfs.TestDFSClientRetries |
| |
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818962#comment-16818962 ]

Wei-Chiu Chuang commented on HDFS-10477:

Set Jira state to patch available to kick off precommit. +1 pending Jenkins.
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16817908#comment-16817908 ]

star commented on HDFS-10477:

[~jojochuang], [^HDFS-10477.branch-2.8.patch] is for branch 2.8. Anything else should I do to make the patch committed?
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815013#comment-16815013 ]

star commented on HDFS-10477:

[~jojochuang], uploaded patch for branch-2.8.
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814887#comment-16814887 ]

Wei-Chiu Chuang commented on HDFS-10477:

Pushed up to branch-2 and branch-2.9.

> Stop decommission a rack of DataNodes caused NameNode fail over to standby
> --------------------------------------------------------------------------
>
> Key: HDFS-10477
> URL: https://issues.apache.org/jira/browse/HDFS-10477
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.2
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-10477.002.patch, HDFS-10477.003.patch,
> HDFS-10477.004.patch, HDFS-10477.005.patch, HDFS-10477.006.patch,
> HDFS-10477.007.patch, HDFS-10477.branch-2.patch, HDFS-10477.patch
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814669#comment-16814669 ]

Wei-Chiu Chuang commented on HDFS-10477:

Failed tests not reproducible. I will commit the branch-2 patch now

> Stop decommission a rack of DataNodes caused NameNode fail over to standby
> --------------------------------------------------------------------------
>
> Key: HDFS-10477
> URL: https://issues.apache.org/jira/browse/HDFS-10477
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.2
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-10477.002.patch, HDFS-10477.003.patch,
> HDFS-10477.004.patch, HDFS-10477.005.patch, HDFS-10477.006.patch,
> HDFS-10477.007.patch, HDFS-10477.branch-2.patch, HDFS-10477.patch
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809801#comment-16809801 ]

Wei-Chiu Chuang commented on HDFS-10477:

The Findbugs warning is unrelated; it was caused by a previous commit. I will file a Jira to clean it up.

> Stop decommission a rack of DataNodes caused NameNode fail over to standby
> --
>
> Key: HDFS-10477
> URL: https://issues.apache.org/jira/browse/HDFS-10477
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.2
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3
> Attachments: HDFS-10477.002.patch, HDFS-10477.003.patch, HDFS-10477.004.patch, HDFS-10477.005.patch, HDFS-10477.006.patch, HDFS-10477.007.patch, HDFS-10477.branch-2.patch, HDFS-10477.patch
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809343#comment-16809343 ]

Hadoop QA commented on HDFS-10477:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 38s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 10s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} branch-2 passed with JDK v1.8.0_191 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s{color} | {color:green} branch-2 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 15s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in branch-2 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 52s{color} | {color:green} branch-2 passed with JDK v1.8.0_191 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} the patch passed with JDK v1.8.0_191 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} the patch passed with JDK v1.8.0_191 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 26s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}122m 15s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby |
| | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |
| | hadoop.hdfs.web.TestWebHdfsTimeouts |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:da67579 |
| JIRA Issue | HDFS-10477 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964773/HDFS-10477.branch-2.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux d2d8955079d5 4.4.0-138-generic
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809235#comment-16809235 ]

Wei-Chiu Chuang commented on HDFS-10477:

Here's the branch-2 patch. It partially includes HDFS-13027, and I think it makes sense to backport HDFS-13027 to branch-2 also.

> Stop decommission a rack of DataNodes caused NameNode fail over to standby
> --
>
> Key: HDFS-10477
> URL: https://issues.apache.org/jira/browse/HDFS-10477
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.2
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3
> Attachments: HDFS-10477.002.patch, HDFS-10477.003.patch, HDFS-10477.004.patch, HDFS-10477.005.patch, HDFS-10477.006.patch, HDFS-10477.007.patch, HDFS-10477.branch-2.patch, HDFS-10477.patch
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809153#comment-16809153 ]

Wei-Chiu Chuang commented on HDFS-10477:

Pushed to trunk, branch-3.2, branch-3.1 and branch-3.0. There are some conflicts for a branch-2 commit so I'm working on that.

> Stop decommission a rack of DataNodes caused NameNode fail over to standby
> --
>
> Key: HDFS-10477
> URL: https://issues.apache.org/jira/browse/HDFS-10477
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.2
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3
> Attachments: HDFS-10477.002.patch, HDFS-10477.003.patch, HDFS-10477.004.patch, HDFS-10477.005.patch, HDFS-10477.006.patch, HDFS-10477.007.patch, HDFS-10477.patch
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808982#comment-16808982 ]

Wei-Chiu Chuang commented on HDFS-10477:

Committing 007 soon.

[~arpitagarwal] I think that approach will require some study. Full block reports are generated only every few hours, and our users may complain about excess replicas not going down. On the other hand, there is some merit in delaying block invalidation: I imagine a NN could be overwhelmed by the number of IBRs after block invalidation.

> Stop decommission a rack of DataNodes caused NameNode fail over to standby
> --
>
> Key: HDFS-10477
> URL: https://issues.apache.org/jira/browse/HDFS-10477
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.2
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Priority: Major
> Attachments: HDFS-10477.002.patch, HDFS-10477.003.patch, HDFS-10477.004.patch, HDFS-10477.005.patch, HDFS-10477.006.patch, HDFS-10477.007.patch, HDFS-10477.patch
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807311#comment-16807311 ]

Konstantin Shvachko commented on HDFS-10477:

Let's target this for 2.10 at the minimum.

> Stop decommission a rack of DataNodes caused NameNode fail over to standby
> --
>
> Key: HDFS-10477
> URL: https://issues.apache.org/jira/browse/HDFS-10477
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.2
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Priority: Major
> Attachments: HDFS-10477.002.patch, HDFS-10477.003.patch, HDFS-10477.004.patch, HDFS-10477.005.patch, HDFS-10477.006.patch, HDFS-10477.007.patch, HDFS-10477.patch
>
>
> In our cluster, when we stop decommissioning a rack which has 46 DataNodes, it locked the Namesystem for about 7 minutes, as the log below shows:
> {code}
> 2016-05-26 20:11:41,697 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.27:1004
> 2016-05-26 20:11:51,171 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 285258 over-replicated blocks on 10.142.27.27:1004 during recommissioning
> 2016-05-26 20:11:51,171 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.118:1004
> 2016-05-26 20:11:59,972 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 279923 over-replicated blocks on 10.142.27.118:1004 during recommissioning
> 2016-05-26 20:11:59,972 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.113:1004
> 2016-05-26 20:12:09,007 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 294307 over-replicated blocks on 10.142.27.113:1004 during recommissioning
> 2016-05-26 20:12:09,008 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.117:1004
> 2016-05-26 20:12:18,055 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 314381 over-replicated blocks on 10.142.27.117:1004 during recommissioning
> 2016-05-26 20:12:18,056 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.130:1004
> 2016-05-26 20:12:25,938 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 272779 over-replicated blocks on 10.142.27.130:1004 during recommissioning
> 2016-05-26 20:12:25,939 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.121:1004
> 2016-05-26 20:12:34,134 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 287248 over-replicated blocks on 10.142.27.121:1004 during recommissioning
> 2016-05-26 20:12:34,134 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.33:1004
> 2016-05-26 20:12:43,020 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 299868 over-replicated blocks on 10.142.27.33:1004 during recommissioning
> 2016-05-26 20:12:43,020 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.137:1004
> 2016-05-26 20:12:52,220 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 303914 over-replicated blocks on 10.142.27.137:1004 during recommissioning
> 2016-05-26 20:12:52,220 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.51:1004
> 2016-05-26 20:13:00,362 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 281175 over-replicated blocks on 10.142.27.51:1004 during recommissioning
> 2016-05-26 20:13:00,362 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.12:1004
> 2016-05-26 20:13:08,756 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 274880 over-replicated blocks on 10.142.27.12:1004 during recommissioning
> 2016-05-26 20:13:08,757 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.15:1004
> 2016-05-26 20:13:17,185 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 286334 over-replicated blocks on 10.142.27.15:1004 during recommissioning
> 2016-05-26 20:13:17,185 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.14:1004
> 2016-05-26 20:13:25,369 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 280219 over-replicated blocks on 10.142.27.14:1004 during recommissioning
>
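As a rough cross-check of the "about 7 minutes" figure in the quoted description, the timestamps in the log excerpt can be extrapolated to the whole rack. The sketch below is not part of the issue or the patch; the function and constant names are made up for illustration, and it assumes the 34 DataNodes missing from the excerpt took about as long as the 12 that are shown.

```python
from datetime import datetime

# Timestamps copied from the quoted NameNode log: the first
# "Stop Decommissioning" entry (10.142.27.27) and the last
# "Invalidated ... blocks" entry shown (10.142.27.14).
FIRST_STOP = "2016-05-26 20:11:41,697"
LAST_INVALIDATED = "2016-05-26 20:13:25,369"
NODES_IN_EXCERPT = 12  # DataNodes fully covered by the excerpt
NODES_IN_RACK = 46     # rack size given in the issue description


def parse(timestamp):
    """Parse a log4j-style timestamp with comma-separated milliseconds."""
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S,%f")


def estimate_lock_seconds(first, last, nodes_seen, nodes_total):
    """Return (seconds per node, extrapolated seconds for all nodes)."""
    elapsed = (parse(last) - parse(first)).total_seconds()
    per_node = elapsed / nodes_seen
    return per_node, per_node * nodes_total


per_node, total = estimate_lock_seconds(
    FIRST_STOP, LAST_INVALIDATED, NODES_IN_EXCERPT, NODES_IN_RACK
)
print(f"{per_node:.1f} s per DataNode, about {total / 60:.1f} min for the rack")
```

With the excerpt's timestamps this comes to roughly 8.6 s per DataNode and about 6.6 minutes for 46 DataNodes, which is consistent with the reported lock time.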
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807289#comment-16807289 ]

Arpit Agarwal commented on HDFS-10477:
--

Just a thought - it may be okay to skip this function altogether, and let the block invalidations be handled by the full block reports. That will also stagger the invalidation work.

> Stop decommission a rack of DataNodes caused NameNode fail over to standby
> --
>
> Key: HDFS-10477
> URL: https://issues.apache.org/jira/browse/HDFS-10477
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.2
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Priority: Major
> Attachments: HDFS-10477.002.patch, HDFS-10477.003.patch, HDFS-10477.004.patch, HDFS-10477.005.patch, HDFS-10477.006.patch, HDFS-10477.007.patch, HDFS-10477.patch
>
>
> In our cluster, when we stop decommissioning a rack which has 46 DataNodes,
> it locked Namesystem for about 7 minutes, as the log below shows:
> {code}
> 2016-05-26 20:11:41,697 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.27:1004
> 2016-05-26 20:11:51,171 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 285258 over-replicated blocks on 10.142.27.27:1004 during recommissioning
> 2016-05-26 20:11:51,171 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.118:1004
> 2016-05-26 20:11:59,972 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 279923 over-replicated blocks on 10.142.27.118:1004 during recommissioning
> 2016-05-26 20:11:59,972 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.113:1004
> 2016-05-26 20:12:09,007 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 294307 over-replicated blocks on 10.142.27.113:1004 during recommissioning
> 2016-05-26 20:12:09,008 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.117:1004
> 2016-05-26 20:12:18,055 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 314381 over-replicated blocks on 10.142.27.117:1004 during recommissioning
> 2016-05-26 20:12:18,056 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.130:1004
> 2016-05-26 20:12:25,938 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 272779 over-replicated blocks on 10.142.27.130:1004 during recommissioning
> 2016-05-26 20:12:25,939 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.121:1004
> 2016-05-26 20:12:34,134 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 287248 over-replicated blocks on 10.142.27.121:1004 during recommissioning
> 2016-05-26 20:12:34,134 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.33:1004
> 2016-05-26 20:12:43,020 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 299868 over-replicated blocks on 10.142.27.33:1004 during recommissioning
> 2016-05-26 20:12:43,020 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.137:1004
> 2016-05-26 20:12:52,220 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 303914 over-replicated blocks on 10.142.27.137:1004 during recommissioning
> 2016-05-26 20:12:52,220 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.51:1004
> 2016-05-26 20:13:00,362 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 281175 over-replicated blocks on 10.142.27.51:1004 during recommissioning
> 2016-05-26 20:13:00,362 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.12:1004
> 2016-05-26 20:13:08,756 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 274880 over-replicated blocks on 10.142.27.12:1004 during recommissioning
> 2016-05-26 20:13:08,757 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.15:1004
> 2016-05-26 20:13:17,185 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 286334 over-replicated blocks on 10.142.27.15:1004 during recommissioning
> 2016-05-26 20:13:17,185 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.14:1004
> 2016-05-26 20:13:25,369 INFO
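As a rough cross-check of the "about 7 minutes" figure in the issue description, the sketch below takes the first three "Stop Decommissioning" / "Invalidated" timestamp pairs from the quoted log, computes how long each node took, and extrapolates to 46 DataNodes. The class and method names (`LockHoldEstimate`, `millisBetween`, `estimateTotalMillis`) are hypothetical, and the extrapolation assumes every node in the rack takes about as long as the sampled ones, which the log only partly confirms.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Hadoop: estimates how long the Namesystem
// lock was held from per-node timestamps taken from the quoted NameNode log.
final class LockHoldEstimate {
    // Time-of-day format used in the log: comma before the milliseconds.
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    // Milliseconds elapsed between two log timestamps on the same day.
    static long millisBetween(String start, String end) {
        LocalTime from = LocalTime.parse(start, LOG_TIME);
        LocalTime to = LocalTime.parse(end, LOG_TIME);
        return Duration.between(from, to).toMillis();
    }

    // Average the sampled per-node durations (integer division) and scale
    // the average to the whole rack. Assumes all nodes behave alike.
    static long estimateTotalMillis(long[] perNodeMillis, int nodeCount) {
        long sum = 0;
        for (long millis : perNodeMillis) {
            sum += millis;
        }
        return (sum / perNodeMillis.length) * nodeCount;
    }

    public static void main(String[] args) {
        long[] samples = {
            millisBetween("20:11:41,697", "20:11:51,171"), // 10.142.27.27
            millisBetween("20:11:51,171", "20:11:59,972"), // 10.142.27.118
            millisBetween("20:11:59,972", "20:12:09,007"), // 10.142.27.113
        };
        long totalMillis = estimateTotalMillis(samples, 46);
        System.out.printf("estimated lock hold: %.1f minutes%n",
                totalMillis / 60000.0);
    }
}
```

Each sampled node takes roughly nine seconds, so a rack of 46 nodes processed back to back lands close to the seven minutes reported.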
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807280#comment-16807280 ]

Arpit Agarwal commented on HDFS-10477:
--

+1 the patch lgtm. Thanks for fixing this [~jojochuang].
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807219#comment-16807219 ]

Wei-Chiu Chuang commented on HDFS-10477:

Test failures do not reproduce locally.
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807109#comment-16807109 ]

Hadoop QA commented on HDFS-10477:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 23s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 18m 26s | trunk passed |
| +1 | compile | 0m 54s | trunk passed |
| +1 | checkstyle | 0m 47s | trunk passed |
| +1 | mvnsite | 1m 0s | trunk passed |
| +1 | shadedclient | 11m 25s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 58s | trunk passed |
| +1 | javadoc | 0m 44s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 57s | the patch passed |
| +1 | compile | 0m 51s | the patch passed |
| +1 | javac | 0m 51s | the patch passed |
| +1 | checkstyle | 0m 40s | the patch passed |
| +1 | mvnsite | 0m 57s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 38s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 6s | the patch passed |
| +1 | javadoc | 0m 50s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 78m 36s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 29s | The patch does not generate ASF License warnings. |
| | | 131m 32s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
| | hadoop.hdfs.web.TestWebHdfsTimeouts |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-10477 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964465/HDFS-10477.007.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 31c93d06a7f5 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 856cbf6 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/26558/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/26558/testReport/ |
| Max. process+thread count | 4776 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U:
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803998#comment-16803998 ]

Hadoop QA commented on HDFS-10477:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 12s | HDFS-10477 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-10477 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964031/HDFS-10477.006.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/26539/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803914#comment-16803914 ]

Wei-Chiu Chuang commented on HDFS-10477:

The conflicts come from HDFS-13027 and HDFS-9390, but they are fairly simple. Attached [^HDFS-10477.006.patch] to resolve the conflicts and add the extra check that I mentioned above.
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803199#comment-16803199 ]

yunjiong zhao commented on HDFS-10477:
--

[~jojochuang], I don't mind, please go ahead. Thank you.
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803169#comment-16803169 ]

Wei-Chiu Chuang commented on HDFS-10477:

I took some time to review the patch.

bq. I don't recall if the BlockIterator is guaranteed to be consistent if the FsNameSystem lock is released.

The lock is held when {{getBlockIterator()}} is called, so that's not a problem. I think the thread-safety concern should instead be about this line:
{code}
for (DatanodeStorageInfo datanodeStorageInfo : srcNode.getStorageInfos()) {
{code}
Because getStorageInfos() returns an array copy of the internal storageMap, this iteration is thread-safe. However, to guard against the rare case where a storage goes bad and gets pruned, the loop must double-check that the storage is still valid. That is, make sure {{srcNode.getStorageInfo(datanodeStorageInfo.getStorageID())}} is not null inside the for loop.

bq. It would have been better for the refreshNodes RPC to start the refresh on a worker thread and complete the RPC call immediately. I assume we cannot change the RPC behavior now for backwards compatibility.

This can be addressed by adding a new refreshNodes() RPC, deprecating the old one, and updating DFSAdmin to use the new RPC.

We should just update this patch to make it applicable to trunk, and commit it. [~zhaoyunjiong], would you still like to work on this, or would you mind letting me do it? Thanks.
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16293196#comment-16293196 ]

genericqa commented on HDFS-10477:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} HDFS-10477 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-10477 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12817992/HDFS-10477.005.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/22428/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104213#comment-16104213 ]

Hadoop QA commented on HDFS-10477:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} HDFS-10477 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-10477 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12817992/HDFS-10477.005.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/20454/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423486#comment-15423486 ]

Arpit Agarwal commented on HDFS-10477:
--

Hi [~zhaoyunjiong], [~benoyantony], I've been thinking about this some more. I think the change will be fine, although hard-coding a 1ms sleep still feels kludgy.

Is the log snippet in the description complete? ~10 seconds to process 285258 blocks sounds extremely high. If this is easily reproducible, perhaps profiling will reveal inefficiencies in {{processExtraRedundancyBlocksOnReCommission}} or its callees that can be fixed instead.
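The throughput behind the "~10 seconds for 285258 blocks" remark can be worked out from the first pair of timestamps in the description's log. The sketch below is illustrative only: the class and method names are made up for this example, and the only inputs taken from the issue are the two timestamps and the block count.

```java
import java.time.Duration;
import java.time.LocalTime;

public class RecommissionRate {
    /** Blocks processed per second between two wall-clock times on the same day. */
    static double blocksPerSecond(LocalTime start, LocalTime end, long blocks) {
        double seconds = Duration.between(start, end).toMillis() / 1000.0;
        return blocks / seconds;
    }

    public static void main(String[] args) {
        // "Stop Decommissioning 10.142.27.27:1004" was logged at 20:11:41,697 and
        // "Invalidated 285258 over-replicated blocks" at 20:11:51,171.
        LocalTime start = LocalTime.parse("20:11:41.697");
        LocalTime end = LocalTime.parse("20:11:51.171");
        double rate = blocksPerSecond(start, end, 285258L);
        System.out.printf("%.0f blocks/s, %.1f microseconds per block%n", rate, 1_000_000.0 / rate);
    }
}
```

For this node the interval is 9.474 seconds, which works out to roughly 30,000 blocks per second, or about 33 microseconds per block. At around 9 seconds per node, 46 nodes add up to about 7 minutes, which matches the lock hold time reported in the description.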
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404918#comment-15404918 ]

Benoy Antony commented on HDFS-10477:
-

I didn't see Arpit's previous comments. If the recommended approach is to set {{dfs.namenode.fslock.fair}} to false, then it may be good to sleep for a millisecond between lock release and re-acquisition. The patch looks good to me, +1.
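The release, pause and re-acquire pattern discussed here can be sketched with a plain {{ReentrantReadWriteLock}}. This is not the patch itself: the class name, the batch size parameter and the per-item placeholder are assumptions made for illustration. With a non-fair lock the releasing thread may win the lock back immediately, so a short pause is one way to give waiting threads a chance; with a fair lock the pause can be set to zero because re-acquisition queues behind the threads already waiting.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockYieldSketch {
    /**
     * Processes totalItems units of work under the write lock, releasing the
     * lock after every batchSize items so other threads can make progress.
     * Returns how many times the lock was yielded.
     */
    static int processInBatches(ReentrantReadWriteLock lock, int totalItems,
                                int batchSize, long pauseMillis) throws InterruptedException {
        int yields = 0;
        int processed = 0;
        lock.writeLock().lock();
        try {
            while (processed < totalItems) {
                processed++; // placeholder for the real per-item work
                if (processed % batchSize == 0 && processed < totalItems) {
                    lock.writeLock().unlock();
                    try {
                        if (pauseMillis > 0) {
                            Thread.sleep(pauseMillis); // let waiters in when the lock is non-fair
                        }
                    } finally {
                        // Always re-acquire, so the outer finally unlocks a lock we hold.
                        lock.writeLock().lock();
                    }
                    yields++;
                }
            }
        } finally {
            lock.writeLock().unlock();
        }
        return yields;
    }

    public static void main(String[] args) throws InterruptedException {
        ReentrantReadWriteLock nonFair = new ReentrantReadWriteLock(false);
        System.out.println(processInBatches(nonFair, 10, 3, 1));
    }
}
```

The inner {{finally}} keeps lock and unlock calls balanced even if the sleep is interrupted, which avoids unlocking a lock the thread no longer holds.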
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15404912#comment-15404912 ]

Benoy Antony commented on HDFS-10477:
-

The patch looks good. There is no need to sleep, as the admin can control the lock fairness via {{dfs.namenode.fslock.fair}}. The default value is true, which means that threads that requested the lock earlier will acquire it first.
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378343#comment-15378343 ]

Hadoop QA commented on HDFS-10477:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 75m 0s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 97m 11s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12817992/HDFS-10477.005.patch |
| JIRA Issue | HDFS-10477 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux ec06a4fc50cb 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6cf0175 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16059/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16059/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376238#comment-15376238 ]

Rakesh R commented on HDFS-10477:

It looks like the test case is failing because of the lock release; please check. Secondly, when catching and swallowing {{InterruptedException}}, should we call {{Thread.currentThread().interrupt()}} afterward so that the interrupt status isn't lost?

{code}
java.lang.IllegalMonitorStateException: null
    at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryRelease(ReentrantReadWriteLock.java:371)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
    at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.unlock(ReentrantReadWriteLock.java:1131)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1533)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processExtraRedundancyBlocksOnReCommission(BlockManager.java:3861)
    at org.apache.hadoop.hdfs.server.blockmanagement.DecommissionManager.stopDecommission(DecommissionManager.java:221)
    at org.apache.hadoop.hdfs.server.namenode.TestDefaultBlockPlacementPolicy.testPlacementWithLocalRackNodesDecommissioned(TestDefaultBlockPlacementPolicy.java:117)
{code}

> Stop decommission a rack of DataNodes caused NameNode fail over to standby
>
> Key: HDFS-10477
> URL: https://issues.apache.org/jira/browse/HDFS-10477
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.2
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Attachments: HDFS-10477.002.patch, HDFS-10477.003.patch, HDFS-10477.004.patch, HDFS-10477.patch
>
> In our cluster, when we stop decommissioning a rack which have 46 DataNodes, it locked Namesystem for about 7 minutes as below log shows:
> {code}
> 2016-05-26 20:11:41,697 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.27:1004
> 2016-05-26 20:11:51,171 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 285258 over-replicated blocks on 10.142.27.27:1004 during recommissioning
> 2016-05-26 20:11:51,171 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.118:1004
> 2016-05-26 20:11:59,972 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 279923 over-replicated blocks on 10.142.27.118:1004 during recommissioning
> 2016-05-26 20:11:59,972 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.113:1004
> 2016-05-26 20:12:09,007 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 294307 over-replicated blocks on 10.142.27.113:1004 during recommissioning
> 2016-05-26 20:12:09,008 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.117:1004
> 2016-05-26 20:12:18,055 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 314381 over-replicated blocks on 10.142.27.117:1004 during recommissioning
> 2016-05-26 20:12:18,056 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.130:1004
> 2016-05-26 20:12:25,938 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 272779 over-replicated blocks on 10.142.27.130:1004 during recommissioning
> 2016-05-26 20:12:25,939 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.121:1004
> 2016-05-26 20:12:34,134 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 287248 over-replicated blocks on 10.142.27.121:1004 during recommissioning
> 2016-05-26 20:12:34,134 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.33:1004
> 2016-05-26 20:12:43,020 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 299868 over-replicated blocks on 10.142.27.33:1004 during recommissioning
> 2016-05-26 20:12:43,020 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.137:1004
> 2016-05-26 20:12:52,220 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 303914 over-replicated blocks on 10.142.27.137:1004 during recommissioning
> 2016-05-26 20:12:52,220 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop Decommissioning 10.142.27.51:1004
> 2016-05-26 20:13:00,362 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated 281175 over-replicated blocks on
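The two points in Rakesh R's comment can be illustrated with a small standalone sketch. It is not the HDFS-10477 patch and does not use Hadoop classes: the class `YieldingLockSketch` and the method `yieldWriteLock` are hypothetical names, and a plain `ReentrantReadWriteLock` stands in for the namesystem lock. The sketch assumes the `IllegalMonitorStateException` in the trace comes from calling `unlock()` on a write lock the current thread does not hold, which is what `ReentrantReadWriteLock` does in that situation.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a namesystem-style lock holder; not Hadoop code.
class YieldingLockSketch {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /**
     * Briefly gives up the write lock so that other threads can run.
     * Returns true only if the lock was actually released and reacquired.
     */
    boolean yieldWriteLock(long sleepMillis) {
        // unlock() throws IllegalMonitorStateException when the caller does
        // not hold the write lock, so check ownership first.
        if (!lock.isWriteLockedByCurrentThread()) {
            return false;
        }
        lock.writeLock().unlock();
        try {
            Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
            // Restore the interrupt status instead of silently swallowing it.
            Thread.currentThread().interrupt();
        } finally {
            // lock() is not interruptible, so this succeeds even when the
            // interrupt flag has just been restored.
            lock.writeLock().lock();
        }
        return true;
    }

    public static void main(String[] args) {
        YieldingLockSketch sketch = new YieldingLockSketch();
        System.out.println("yielded without holding lock: " + sketch.yieldWriteLock(1));
        sketch.lock.writeLock().lock();
        try {
            System.out.println("yielded while holding lock: " + sketch.yieldWriteLock(1));
        } finally {
            sketch.lock.writeLock().unlock();
        }
    }
}
```

A caller such as the failing test, which reaches the code path without holding the write lock, would get `false` back here rather than an exception. Whether that is the right behavior for the real patch is a design question for the reviewers.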
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376069#comment-15376069 ]

Hadoop QA commented on HDFS-10477:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 8m 26s | trunk passed |
| +1 | compile | 0m 52s | trunk passed |
| +1 | checkstyle | 0m 31s | trunk passed |
| +1 | mvnsite | 0m 54s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 1m 42s | trunk passed |
| +1 | javadoc | 1m 3s | trunk passed |
| +1 | mvninstall | 0m 58s | the patch passed |
| +1 | compile | 0m 51s | the patch passed |
| +1 | javac | 0m 51s | the patch passed |
| +1 | checkstyle | 0m 28s | the patch passed |
| +1 | mvnsite | 0m 59s | the patch passed |
| +1 | mvneclipse | 0m 12s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 2m 0s | the patch passed |
| +1 | javadoc | 0m 54s | the patch passed |
| -1 | unit | 62m 21s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 84m 22s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
| | hadoop.hdfs.server.namenode.TestDefaultBlockPlacementPolicy |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12817821/HDFS-10477.004.patch |
| JIRA Issue | HDFS-10477 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux b7d6870f248a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d180505 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16052/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16052/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16052/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373817#comment-15373817 ]

Arpit Agarwal commented on HDFS-10477:

I second [~kihwal]'s concern about releasing and reacquiring the lock. We have been recommending that _dfs.namenode.fslock.fair_ be set to false, as it gives better overall RPC throughput.

I am also not sure about releasing the lock during iteration (in _processExtraRedundancyBlocksOnReCommission_). I don't recall whether the BlockIterator is guaranteed to be consistent if the FSNamesystem lock is released.

It would have been better for the refreshNodes RPC to start the refresh on a worker thread and complete the RPC call immediately. I assume we cannot change the RPC behavior now, for backwards compatibility.
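Arpit Agarwal's last point, starting the refresh on a worker thread and completing the RPC call immediately, can be sketched with the standard `java.util.concurrent` executor API. This is only an illustration of the pattern under stated assumptions: `AsyncRefreshSketch`, `refreshNodesAsync`, and `recommission` are hypothetical names, and nothing here reflects how the NameNode's refreshNodes RPC is actually implemented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of "hand the work to a worker thread, return at once".
class AsyncRefreshSketch {
    // A single worker keeps refreshes ordered and off the RPC handler thread.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Queues the per-node work and returns immediately with a handle to it. */
    Future<Integer> refreshNodesAsync(List<String> nodes) {
        return worker.submit(() -> {
            int handled = 0;
            for (String node : nodes) {
                recommission(node);
                handled++;
            }
            return handled;
        });
    }

    // Placeholder for the expensive per-node processing.
    private void recommission(String node) {
    }

    void shutdown() {
        worker.shutdown();
    }

    public static void main(String[] args) throws Exception {
        AsyncRefreshSketch sketch = new AsyncRefreshSketch();
        Future<Integer> pending = sketch.refreshNodesAsync(
                Arrays.asList("10.142.27.27:1004", "10.142.27.118:1004"));
        // The caller is free to return here; get() is only used to show the result.
        System.out.println("nodes handled in the background: " + pending.get());
        sketch.shutdown();
    }
}
```

The trade-off noted in the comment still applies: a caller that previously learned about completion from the RPC returning would need some other signal, which is the backwards-compatibility concern.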
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15366966#comment-15366966 ]

yunjiong zhao commented on HDFS-10477:

Those failed unit tests are not related to this patch. There is also no need to add a new unit test for this patch, since it only adds steps to release the namesystem write lock and then acquire it again.
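The approach yunjiong zhao describes, releasing the write lock and then acquiring it again partway through a long operation, might look roughly like the following. This is a sketch under assumptions rather than the patch itself: `BatchedLockSketch`, `processInBatches`, and the batch size are hypothetical, and the loop walks a private snapshot of the block IDs so that the iteration-consistency question raised elsewhere in this thread does not arise in the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of periodically yielding a write lock during a long scan.
class BatchedLockSketch {
    // Non-fair by default: after unlock(), the same thread may win the lock
    // back before any waiter does.
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    int processed = 0;

    /** Processes every block ID and returns how many times the lock was yielded. */
    int processInBatches(List<Long> blockIds, int batchSize) {
        int yields = 0;
        lock.writeLock().lock();
        try {
            // Copy while holding the lock so later changes to blockIds
            // cannot disturb the loop.
            List<Long> snapshot = new ArrayList<>(blockIds);
            int inBatch = 0;
            for (long blockId : snapshot) {
                process(blockId);
                if (++inBatch == batchSize) {
                    inBatch = 0;
                    // Give waiting readers and writers a chance to run.
                    lock.writeLock().unlock();
                    yields++;
                    lock.writeLock().lock();
                }
            }
        } finally {
            lock.writeLock().unlock();
        }
        return yields;
    }

    // Placeholder for the per-block work done under the lock.
    private void process(long blockId) {
        processed++;
    }

    public static void main(String[] args) {
        BatchedLockSketch sketch = new BatchedLockSketch();
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < 10; i++) {
            ids.add(i);
        }
        System.out.println("yields: " + sketch.processInBatches(ids, 4));
    }
}
```

With a non-fair lock, an unlock followed immediately by a lock gives waiters no guarantee of getting in, which appears to be the concern behind the _dfs.namenode.fslock.fair_ discussion in this thread.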
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365103#comment-15365103 ]

Hadoop QA commented on HDFS-10477:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 26s | Docker mode activated. |
| +1 | @author | 0m 1s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 6m 45s | trunk passed |
| +1 | compile | 0m 56s | trunk passed |
| +1 | checkstyle | 0m 30s | trunk passed |
| +1 | mvnsite | 0m 53s | trunk passed |
| +1 | mvneclipse | 0m 12s | trunk passed |
| +1 | findbugs | 1m 44s | trunk passed |
| +1 | javadoc | 0m 53s | trunk passed |
| +1 | mvninstall | 1m 3s | the patch passed |
| +1 | compile | 0m 57s | the patch passed |
| +1 | javac | 0m 57s | the patch passed |
| +1 | checkstyle | 0m 29s | the patch passed |
| +1 | mvnsite | 1m 6s | the patch passed |
| +1 | mvneclipse | 0m 12s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 2m 6s | the patch passed |
| +1 | javadoc | 0m 59s | the patch passed |
| -1 | unit | 76m 19s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
| | | 97m 18s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestEditLog |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12809559/HDFS-10477.003.patch |
| JIRA Issue | HDFS-10477 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux beecd3701f63 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 04f6ebb |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/15996/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15996/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15996/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364983#comment-15364983 ]

Hadoop QA commented on HDFS-10477:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 7m 33s | trunk passed |
| +1 | compile | 0m 49s | trunk passed |
| +1 | checkstyle | 0m 28s | trunk passed |
| +1 | mvnsite | 0m 54s | trunk passed |
| +1 | mvneclipse | 0m 12s | trunk passed |
| +1 | findbugs | 1m 47s | trunk passed |
| +1 | javadoc | 0m 56s | trunk passed |
| +1 | mvninstall | 0m 51s | the patch passed |
| +1 | compile | 0m 47s | the patch passed |
| +1 | javac | 0m 47s | the patch passed |
| +1 | checkstyle | 0m 28s | the patch passed |
| +1 | mvnsite | 0m 55s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 1s | The patch has no whitespace issues. |
| +1 | findbugs | 2m 1s | the patch passed |
| +1 | javadoc | 0m 55s | the patch passed |
| -1 | unit | 62m 47s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 23s | The patch does not generate ASF License warnings. |
| | | 83m 36s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestFsDatasetCacheRevocation |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12809559/HDFS-10477.003.patch |
| JIRA Issue | HDFS-10477 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 7519fd0317b8 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 04f6ebb |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/15994/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15994/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15994/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364955#comment-15364955 ]

Brahma Reddy Battula commented on HDFS-10477:

FYI, I triggered https://builds.apache.org/job/PreCommit-HDFS-Build/15996/
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364835#comment-15364835 ]

Benoy Antony commented on HDFS-10477:

Thanks for the info, Akira.
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364831#comment-15364831 ]

Akira Ajisaka commented on HDFS-10477:

Hi [~benoyantony], anyone who has an Apache ID can manually re-trigger the Jenkins precommit job by logging in to https://builds.apache.org/job/PreCommit-HDFS-Build/
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364813#comment-15364813 ]

Akira Ajisaka commented on HDFS-10477:

Triggered https://builds.apache.org/job/PreCommit-HDFS-Build/15992/
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364805#comment-15364805 ]

Benoy Antony commented on HDFS-10477:

Looks good. Could you please re-trigger the build to make sure the tests pass?
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325330#comment-15325330 ]

Hadoop QA commented on HDFS-10477:

| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | docker | 0m 26s | Docker failed to build yetus/hadoop:2c91fd8. |

|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12809559/HDFS-10477.003.patch |
| JIRA Issue | HDFS-10477 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15739/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325033#comment-15325033 ]

Benoy Antony commented on HDFS-10477:

[~kihwal], your comments regarding starvation make sense.
[~zhaoyunjiong], I think it's a good idea to combine these two patches.
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314451#comment-15314451 ]

yunjiong zhao commented on HDFS-10477:

[~kihwal], what's your opinion on the second patch? Any suggestions? Should I combine the two patches? With the second patch alone, if no DataNode has more blocks than DFS_BLOCK_MISREPLICATION_PROCESSING_LIMIT and there are many DataNodes, we might still have trouble.
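The concern raised in yunjiong zhao's comment can be illustrated with a small simulation. This is only a sketch, not code from either patch: it assumes the second patch releases the lock once a per-DataNode block counter reaches the limit, and that the first patch releases it between DataNodes. The class `LockHoldSimulator`, its method, and the numbers in the usage note are hypothetical.

```java
/** Hypothetical simulation; not taken from any HDFS-10477 patch. */
final class LockHoldSimulator {

  /**
   * Returns the largest number of blocks processed in one uninterrupted
   * lock hold.
   *
   * @param blocksPerNode     block count of each DataNode being recommissioned
   * @param limit             blocks to process before yielding the lock
   * @param resetPerNode      true if the block counter restarts for every node
   * @param yieldBetweenNodes true if the lock is also released after each node
   */
  static long longestHold(long[] blocksPerNode, long limit,
                          boolean resetPerNode, boolean yieldBetweenNodes) {
    long longest = 0;
    long held = 0;     // blocks processed since the lock was last released
    long counter = 0;  // blocks counted toward the limit
    for (long blocks : blocksPerNode) {
      if (resetPerNode) {
        counter = 0;
      }
      for (long b = 0; b < blocks; b++) {
        held++;
        counter++;
        if (counter >= limit) {        // limit reached: yield the lock
          longest = Math.max(longest, held);
          held = 0;
          counter = 0;
        }
      }
      if (yieldBetweenNodes) {         // per-node yield
        longest = Math.max(longest, held);
        held = 0;
      }
    }
    return Math.max(longest, held);
  }
}
```

With 46 hypothetical DataNodes of 5,000 blocks each and an illustrative limit of 10,000, the per-node counter alone never triggers a release, so all 230,000 blocks are processed in one hold. Adding the per-node release bounds each hold at 5,000 blocks, which is the argument for combining the two approaches.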
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314253#comment-15314253 ]

Kihwal Lee commented on HDFS-10477:

If the lock is simply released and re-acquired right away per datanode, the maximum number of requests that will be processed between the 10-second locks will be limited to something close to the number of RPC handlers, since that is the number of threads waiting for the lock. The lock fairness setting will affect this, but the number will still be limited even if it is set to unfair, since the lock won't be completely unfair, in order to avoid starvation.

The HA monitoring request from ZKFC will likely be served in one or two locking cycles if it is being handled by a separate RPC server (and thus has a mostly empty call queue), so the initial patch may indeed prevent a failover. If the goal is simply to avoid a failover, then the patch may be sufficient. But most RPC requests will time out for many minutes. Is the HA monitoring correctly assessing the service status then?

IMO, in order to avoid a failover and maintain service availability, we need to make the locking finer grained or insert a delay between locks.
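The two remedies named in the comment above (finer-grained locking and a delay between locks) can be sketched generically as follows. This is a minimal illustration using a plain `java.util.concurrent` lock, not the NameNode's namesystem lock or the code of any attached patch. The class `YieldingBatchProcessor`, the batch size, and the pause length are all assumptions made for the example.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

/** Hypothetical helper; not part of Hadoop. */
final class YieldingBatchProcessor {
  private final ReentrantReadWriteLock lock;
  private final int batchSize;     // items handled per lock hold (finer granularity)
  private final long pauseMillis;  // delay inserted between lock holds
  private int lockCycles;

  YieldingBatchProcessor(ReentrantReadWriteLock lock, int batchSize, long pauseMillis) {
    if (batchSize <= 0 || pauseMillis < 0) {
      throw new IllegalArgumentException("batchSize must be > 0 and pauseMillis >= 0");
    }
    this.lock = lock;
    this.batchSize = batchSize;
    this.pauseMillis = pauseMillis;
  }

  /** Processes items in batches; returns how many items were processed. */
  <T> int processAll(List<T> items, Consumer<? super T> work) {
    int done = 0;
    while (done < items.size()) {
      int end = Math.min(done + batchSize, items.size());
      lock.writeLock().lock();
      try {
        for (T item : items.subList(done, end)) {
          work.accept(item);
        }
        lockCycles++;
      } finally {
        lock.writeLock().unlock();
      }
      done = end;
      if (done < items.size()) {
        try {
          // Pause with the lock released so waiting threads, and requests
          // that arrive during the pause, can acquire it.
          Thread.sleep(pauseMillis);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return done;  // stop early, leaving remaining items unprocessed
        }
      }
    }
    return done;
  }

  int lockCycles() {
    return lockCycles;
  }
}
```

A smaller batch shortens each lock hold, and the pause gives requests beyond those already waiting on the lock a chance to be served. The cost is a longer total running time for the batch job, so both values would need tuning against real workloads.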
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313282#comment-15313282 ] Benoy Antony commented on HDFS-10477: - If it's possible to release the lock per storage, then that's better. If not, I prefer the first version, which releases the lock per datanode without the additional processing. The logs show that each node is processed in around 10 seconds.
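The "around 10 seconds per node" figure can be checked against the timestamps quoted in the issue description. The sketch below is only an illustration: `RecommissionTiming` and its methods are hypothetical helpers, the three timestamp pairs are the first three "Stop Decommissioning" / "Invalidated" pairs from the log, and the extrapolation assumes all 46 DataNodes take about as long as those three.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for reading the log excerpt; not part of Hadoop. */
public class RecommissionTiming {
    // Time-of-day portion of the log timestamps, e.g. "20:11:41,697".
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    /** Milliseconds between two log timestamps on the same day. */
    public static long millisBetween(String start, String end) {
        LocalTime a = LocalTime.parse(start, LOG_TIME);
        LocalTime b = LocalTime.parse(end, LOG_TIME);
        return Duration.between(a, b).toMillis();
    }

    /** Extrapolates total time as (average of samples) * nodeCount. */
    public static long estimateTotalMillis(List<Long> samples, int nodeCount) {
        long sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return sum * nodeCount / samples.size();
    }

    public static void main(String[] args) {
        List<Long> perNode = new ArrayList<>();
        perNode.add(millisBetween("20:11:41,697", "20:11:51,171")); // 10.142.27.27
        perNode.add(millisBetween("20:11:51,171", "20:11:59,972")); // 10.142.27.118
        perNode.add(millisBetween("20:11:59,972", "20:12:09,007")); // 10.142.27.113
        long total = estimateTotalMillis(perNode, 46);
        System.out.println("per-node ms: " + perNode);
        System.out.println("estimated minutes for 46 nodes: " + total / 60000.0);
    }
}
```

The three samples come to roughly 9 seconds each, and extrapolating to 46 nodes gives close to 7 minutes, which is consistent with the lock duration reported in the issue.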
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312946#comment-15312946 ] Hadoop QA commented on HDFS-10477: --
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 26s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 7m 13s | trunk passed |
| +1 | compile | 0m 47s | trunk passed |
| +1 | checkstyle | 0m 27s | trunk passed |
| +1 | mvnsite | 1m 0s | trunk passed |
| +1 | mvneclipse | 0m 10s | trunk passed |
| +1 | findbugs | 1m 44s | trunk passed |
| +1 | javadoc | 1m 7s | trunk passed |
| +1 | mvninstall | 0m 57s | the patch passed |
| +1 | compile | 0m 58s | the patch passed |
| +1 | javac | 0m 58s | the patch passed |
| +1 | checkstyle | 0m 25s | the patch passed |
| +1 | mvnsite | 0m 51s | the patch passed |
| +1 | mvneclipse | 0m 9s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| +1 | findbugs | 1m 51s | the patch passed |
| +1 | javadoc | 1m 8s | the patch passed |
| -1 | unit | 62m 38s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 83m 26s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestEditLog |
| | hadoop.hdfs.TestAsyncHDFSWithHA |
| | hadoop.hdfs.server.balancer.TestBalancer |
| | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
| | hadoop.hdfs.TestEncryptionZonesWithKMS |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:2c91fd8 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12807796/HDFS-10477.002.patch |
| JIRA Issue | HDFS-10477 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux b5abad8b0a8a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ead61c4 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/15635/artifact/patchprocess/whitespace-eol.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/15635/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-HDFS-Build/15635/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15635/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U:
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312438#comment-15312438 ] Kihwal Lee commented on HDFS-10477: --- It will be better if the locking is done per storage instead of per node.
[jira] [Commented] (HDFS-10477) Stop decommission a rack of DataNodes caused NameNode fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15311545#comment-15311545 ] Hadoop QA commented on HDFS-10477: --
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 11s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 6m 22s | trunk passed |
| +1 | compile | 0m 44s | trunk passed |
| +1 | checkstyle | 0m 29s | trunk passed |
| +1 | mvnsite | 0m 55s | trunk passed |
| +1 | mvneclipse | 0m 11s | trunk passed |
| +1 | findbugs | 1m 43s | trunk passed |
| +1 | javadoc | 1m 6s | trunk passed |
| +1 | mvninstall | 0m 48s | the patch passed |
| +1 | compile | 0m 42s | the patch passed |
| +1 | javac | 0m 42s | the patch passed |
| +1 | checkstyle | 0m 22s | the patch passed |
| +1 | mvnsite | 0m 47s | the patch passed |
| +1 | mvneclipse | 0m 9s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 48s | the patch passed |
| +1 | javadoc | 1m 4s | the patch passed |
| -1 | unit | 68m 9s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 31s | The patch does not generate ASF License warnings. |
| | | 87m 19s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestEditLog |
| Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:2c91fd8 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12807561/HDFS-10477.patch |
| JIRA Issue | HDFS-10477 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 15eea8fda5c9 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 16b1cc7 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/15630/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-HDFS-Build/15630/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15630/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15630/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |

This message was automatically generated.
> Stop decommission a rack of DataNodes caused NameNode fail over to standby >