[jira] [Updated] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds

2017-01-05 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-9434:
-
Fix Version/s: 2.8.0

> Recommission a datanode with 500k blocks may pause NN for 30 seconds
> 
>
> Key: HDFS-9434
> URL: https://issues.apache.org/jira/browse/HDFS-9434
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Tsz Wo Nicholas Sze
> Fix For: 2.8.0, 2.7.2, 2.6.3, 3.0.0-alpha1
>
> Attachments: h9434_20151116.patch, h9434_20151116_branch-2.6.patch
>
>
> In BlockManager, processOverReplicatedBlocksOnReCommission is called while 
> the namespace lock is held.  processOverReplicatedBlock prints a (not very 
> useful) log message for each block.  When a storage holds a large number of 
> blocks, printing that message for every block can keep the NN from processing 
> any other operations.  We saw it pause the NN for 30 seconds for a storage 
> with 500k blocks.
> I suggest changing the log message to trace level as a quick fix.
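
For illustration only, the quick fix described above might look roughly like the sketch below. It is not the attached patch: the class name, the lock field, and the loop are hypothetical stand-ins, and java.util.logging's FINEST level is used in place of the trace level of Hadoop's own logger so that the example is self-contained.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch, not the BlockManager code from the patch.
public class RecommissionLogSketch {
    private static final Logger LOG =
        Logger.getLogger(RecommissionLogSketch.class.getName());

    // Assumption: a plain read-write lock stands in for the namespace lock.
    private static final ReentrantReadWriteLock NAMESPACE_LOCK =
        new ReentrantReadWriteLock();

    /** Scans blockCount made-up blocks under the lock and returns how many were scanned. */
    static int processStorage(int blockCount) {
        int scanned = 0;
        NAMESPACE_LOCK.writeLock().lock();
        try {
            for (int blockId = 0; blockId < blockCount; blockId++) {
                // The per-block message is logged at trace (FINEST) level. The guard
                // also skips building the string when that level is disabled, which
                // is the default, so the lock is not held while 500k lines are written.
                if (LOG.isLoggable(Level.FINEST)) {
                    LOG.finest("scanned block " + blockId);
                }
                scanned++;
            }
        } finally {
            NAMESPACE_LOCK.writeLock().unlock();
        }
        return scanned;
    }

    public static void main(String[] args) {
        System.out.println(processStorage(500_000));
    }
}
```

With the default logger configuration the loop does no I/O inside the lock; the per-block messages only appear when the logger is explicitly set to FINEST.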



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Updated] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds

2015-12-23 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HDFS-9434:
--
Fix Version/s: 2.7.2

> Recommission a datanode with 500k blocks may pause NN for 30 seconds
> 
>
> Key: HDFS-9434
> URL: https://issues.apache.org/jira/browse/HDFS-9434
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Tsz Wo Nicholas Sze
> Fix For: 2.7.2, 2.6.3
>
> Attachments: h9434_20151116.patch, h9434_20151116_branch-2.6.patch
>
>





[jira] [Updated] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds

2015-11-24 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-9434:
--
Attachment: h9434_20151116_branch-2.6.patch

h9434_20151116_branch-2.6.patch: for 2.6.

> Recommission a datanode with 500k blocks may pause NN for 30 seconds
> 
>
> Key: HDFS-9434
> URL: https://issues.apache.org/jira/browse/HDFS-9434
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Tsz Wo Nicholas Sze
> Fix For: 2.6.3
>
> Attachments: h9434_20151116.patch, h9434_20151116_branch-2.6.patch
>
>





[jira] [Updated] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds

2015-11-24 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-9434:
--
Fix Version/s: 2.6.3  (was: 2.7.2)

Done.  Thanks for the suggestion.

> Recommission a datanode with 500k blocks may pause NN for 30 seconds
> 
>
> Key: HDFS-9434
> URL: https://issues.apache.org/jira/browse/HDFS-9434
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Tsz Wo Nicholas Sze
> Fix For: 2.6.3
>
> Attachments: h9434_20151116.patch
>
>





[jira] [Updated] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds

2015-11-19 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-9434:
--
      Resolution: Fixed
    Hadoop Flags: Reviewed
   Fix Version/s: 2.7.2
Target Version/s:   (was: 2.7.3)
          Status: Resolved  (was: Patch Available)

Thanks Xiaoyu and Mingliang for reviewing the patch.

I have committed this.

> Recommission a datanode with 500k blocks may pause NN for 30 seconds
> 
>
> Key: HDFS-9434
> URL: https://issues.apache.org/jira/browse/HDFS-9434
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Tsz Wo Nicholas Sze
> Fix For: 2.7.2
>
> Attachments: h9434_20151116.patch
>
>





[jira] [Updated] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds

2015-11-17 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-9434:
-
Target Version/s: 2.7.3

> Recommission a datanode with 500k blocks may pause NN for 30 seconds
> 
>
> Key: HDFS-9434
> URL: https://issues.apache.org/jira/browse/HDFS-9434
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Tsz Wo Nicholas Sze
> Attachments: h9434_20151116.patch
>
>





[jira] [Updated] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds

2015-11-16 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-9434:
--
Attachment: h9434_20151116.patch

h9434_20151116.patch: changes the log message to trace and makes it shorter.

> Recommission a datanode with 500k blocks may pause NN for 30 seconds
> 
>
> Key: HDFS-9434
> URL: https://issues.apache.org/jira/browse/HDFS-9434
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Tsz Wo Nicholas Sze
> Attachments: h9434_20151116.patch
>
>





[jira] [Updated] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds

2015-11-16 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-9434:
--
Status: Patch Available  (was: Open)

> Recommission a datanode with 500k blocks may pause NN for 30 seconds
> 
>
> Key: HDFS-9434
> URL: https://issues.apache.org/jira/browse/HDFS-9434
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Tsz Wo Nicholas Sze
> Attachments: h9434_20151116.patch
>
>


