[jira] [Updated] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds
[ https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated HDFS-9434:
    Fix Version/s: 2.8.0

> Recommission a datanode with 500k blocks may pause NN for 30 seconds
>
>                 Key: HDFS-9434
>                 URL: https://issues.apache.org/jira/browse/HDFS-9434
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Tsz Wo Nicholas Sze
>             Fix For: 2.8.0, 2.7.2, 2.6.3, 3.0.0-alpha1
>
>         Attachments: h9434_20151116.patch, h9434_20151116_branch-2.6.patch
>
> In BlockManager, processOverReplicatedBlocksOnReCommission is called within
> the namespace lock. There is a (not very useful) log message printed in
> processOverReplicatedBlock. When a storage holds a large number of blocks,
> printing the log message for each block can keep the NN from processing
> any other operations. We saw it pause the NN for 30 seconds for
> a storage with 500k blocks.
> I suggest changing the log message to trace level as a quick fix.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
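The quick fix described in the issue (demoting a per-block log message to trace level) can be illustrated with a minimal sketch. This is not the committed patch and does not use the real BlockManager code: the class name RecommissionLogSketch, the method processBlocks, the ReentrantReadWriteLock standing in for the namespace lock, and the use of java.util.logging (with Level.FINEST standing in for trace) are all assumptions made so the example is self-contained.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.logging.Level;
import java.util.logging.Logger;

public class RecommissionLogSketch {
    // Hypothetical stand-in for the namespace lock; not the real NameNode lock.
    private static final ReentrantReadWriteLock NAMESPACE_LOCK =
            new ReentrantReadWriteLock();

    /**
     * Walks blockCount block ids while holding the write lock and returns
     * how many per-block log messages were actually built and emitted.
     */
    public static int processBlocks(int blockCount, Logger log, Level perBlockLevel) {
        int messagesBuilt = 0;
        NAMESPACE_LOCK.writeLock().lock();
        try {
            for (int blockId = 0; blockId < blockCount; blockId++) {
                // Guard the call so the string is only built when the level
                // is enabled; at a disabled level the loop does no log work.
                if (log.isLoggable(perBlockLevel)) {
                    log.log(perBlockLevel, "over-replicated block blk_" + blockId);
                    messagesBuilt++;
                }
            }
        } finally {
            NAMESPACE_LOCK.writeLock().unlock();
        }
        return messagesBuilt;
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("sketch");
        log.setUseParentHandlers(false); // keep the demo quiet on the console
        log.setLevel(Level.INFO);        // typical production-like level

        // Per-block message at INFO: one message per block while the lock is held.
        System.out.println(processBlocks(500_000, log, Level.INFO));
        // Per-block message at FINEST (trace stand-in): nothing is built.
        System.out.println(processBlocks(500_000, log, Level.FINEST));
    }
}
```

With the logger set to INFO, the first call builds one message per block inside the lock, while the second call builds none, which is the effect the suggested trace-level change aims for. The sketch only counts messages; it does not try to reproduce the 30-second pause reported in the issue.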
[jira] [Updated] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds
[ https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated HDFS-9434:
    Fix Version/s: 2.7.2

> Recommission a datanode with 500k blocks may pause NN for 30 seconds
>
>                 Key: HDFS-9434
>             Fix For: 2.7.2, 2.6.3
>         Attachments: h9434_20151116.patch, h9434_20151116_branch-2.6.patch
[jira] [Updated] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds
[ https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze updated HDFS-9434:
    Attachment: h9434_20151116_branch-2.6.patch

h9434_20151116_branch-2.6.patch: for 2.6.

> Recommission a datanode with 500k blocks may pause NN for 30 seconds
>
>                 Key: HDFS-9434
>             Fix For: 2.6.3
>         Attachments: h9434_20151116.patch, h9434_20151116_branch-2.6.patch
[jira] [Updated] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds
[ https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze updated HDFS-9434:
    Fix Version/s: (was: 2.7.2)
                   2.6.3

Done. Thanks for the suggestion.

> Recommission a datanode with 500k blocks may pause NN for 30 seconds
>
>                 Key: HDFS-9434
>             Fix For: 2.6.3
>         Attachments: h9434_20151116.patch
[jira] [Updated] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds
[ https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze updated HDFS-9434:
          Resolution: Fixed
        Hadoop Flags: Reviewed
       Fix Version/s: 2.7.2
    Target Version/s: (was: 2.7.3)
              Status: Resolved (was: Patch Available)

Thanks Xiaoyu and Mingliang for reviewing the patch.

I have committed this.

> Recommission a datanode with 500k blocks may pause NN for 30 seconds
>
>                 Key: HDFS-9434
>             Fix For: 2.7.2
>         Attachments: h9434_20151116.patch
[jira] [Updated] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds
[ https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HDFS-9434:
    Target Version/s: 2.7.3

> Recommission a datanode with 500k blocks may pause NN for 30 seconds
>
>                 Key: HDFS-9434
>         Attachments: h9434_20151116.patch
[jira] [Updated] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds
[ https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze updated HDFS-9434:
    Attachment: h9434_20151116.patch

h9434_20151116.patch: changes the log message to trace and makes it shorter.

> Recommission a datanode with 500k blocks may pause NN for 30 seconds
>
>                 Key: HDFS-9434
>         Attachments: h9434_20151116.patch
[jira] [Updated] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds
[ https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze updated HDFS-9434:
    Status: Patch Available (was: Open)

> Recommission a datanode with 500k blocks may pause NN for 30 seconds
>
>                 Key: HDFS-9434
>         Attachments: h9434_20151116.patch