[jira] [Updated] (HDFS-14624) When decommissioning a node, log remaining blocks to replicate periodically

2019-10-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14624:
---
Fix Version/s: 3.2.2, 3.1.4

> When decommissioning a node, log remaining blocks to replicate periodically
> ---
>
> Key: HDFS-14624
> URL: https://issues.apache.org/jira/browse/HDFS-14624
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.3.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14624.001.patch, HDFS-14624.002.patch, 
> HDFS-14624.003.patch
>
>
> When a node is marked for decommission, a monitor thread runs every 30
> seconds by default and checks whether the node still has pending blocks to
> be replicated before it can complete decommissioning.
> Two existing debug-level messages in the monitor thread,
> DatanodeAdminManager$Monitor.check(), already log the relevant information.
> The first is logged as the pending blocks are replicated:
> {code:java}
> LOG.debug("Node {} still has {} blocks to replicate "
>     + "before it is a candidate to finish {}.",
>     dn, blocks.size(), dn.getAdminState());
> {code}
> And then after the initial set of blocks has completed and a rescan happens:
> {code:java}
> LOG.debug("Node {} {} healthy."
>     + " It needs to replicate {} more blocks."
>     + " {} is still in progress.", dn,
>     isHealthy ? "is" : "isn't", blocks.size(), dn.getAdminState());
> {code}
> I would like to propose moving these messages to INFO level so it is easier
> to monitor decommission progress over time from the NameNode log.
> Based on the default settings, this would result in at most 1 log message per
> node being decommissioned every 30 seconds. This is an upper bound because
> the monitor thread stops after checking 500K blocks, so in practice there
> could be as few as 1 log message per 30 seconds, even if many DNs are being
> decommissioned at the same time.
> Note that the NameNode web UI does display the above information, but having
> it in the NN logs would allow progress to be tracked more easily.
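
For illustration only, here is a rough sketch of what the two messages would look like at INFO level, together with the log-volume bound described above. It is not the DatanodeAdminManager code: the class name, method names, node name, block counts, and admin-state string are all made-up placeholders, and it uses java.util.logging with String.format instead of the `{}`-style logger in the quoted snippets so that it runs without extra dependencies.

```java
import java.util.logging.Logger;

/**
 * Illustrative only: mimics the two progress messages quoted in the issue.
 * This is NOT the DatanodeAdminManager code; every name and value here is
 * a hypothetical placeholder.
 */
class DecommissionLogSketch {
  private static final Logger LOG =
      Logger.getLogger(DecommissionLogSketch.class.getName());

  /** First message: logged while the initial set of pending blocks drains. */
  static String pendingMessage(String node, int blocks, String adminState) {
    return String.format(
        "Node %s still has %d blocks to replicate "
            + "before it is a candidate to finish %s.",
        node, blocks, adminState);
  }

  /** Second message: logged after a rescan finds more blocks to replicate. */
  static String rescanMessage(String node, boolean isHealthy, int blocks,
      String adminState) {
    return String.format(
        "Node %s %s healthy. It needs to replicate %d more blocks."
            + " %s is still in progress.",
        node, isHealthy ? "is" : "isn't", blocks, adminState);
  }

  /**
   * Upper bound on messages per hour: at most one message per node per
   * monitor interval (integer division rounds the tick count down).
   */
  static long maxMessagesPerHour(int nodes, int intervalSeconds) {
    return (long) nodes * (3600 / intervalSeconds);
  }

  public static void main(String[] args) {
    // Placeholder values; a real admin-state string may differ.
    String state = "Decommission In Progress";
    LOG.info(pendingMessage("dn-1", 1200, state));
    LOG.info(rescanMessage("dn-1", true, 35, state));
    LOG.info("Upper bound for 10 nodes at a 30s interval: "
        + maxMessagesPerHour(10, 30) + " messages/hour");
  }
}
```

With the default 30-second interval, the bound works out to 120 messages per hour per decommissioning node; the 500K-block limit mentioned in the description can only lower that figure.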



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14624) When decommissioning a node, log remaining blocks to replicate periodically

2019-07-11 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/HDFS-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-14624:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

> When decommissioning a node, log remaining blocks to replicate periodically
> ---
>
> Key: HDFS-14624
> URL: https://issues.apache.org/jira/browse/HDFS-14624
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.3.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14624.001.patch, HDFS-14624.002.patch, 
> HDFS-14624.003.patch






[jira] [Updated] (HDFS-14624) When decommissioning a node, log remaining blocks to replicate periodically

2019-07-11 Thread Stephen O'Donnell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-14624:
-
Attachment: HDFS-14624.003.patch

> When decommissioning a node, log remaining blocks to replicate periodically
> ---
>
> Key: HDFS-14624
> URL: https://issues.apache.org/jira/browse/HDFS-14624
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.3.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Attachments: HDFS-14624.001.patch, HDFS-14624.002.patch, 
> HDFS-14624.003.patch






[jira] [Updated] (HDFS-14624) When decommissioning a node, log remaining blocks to replicate periodically

2019-07-04 Thread Stephen O'Donnell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-14624:
-
Attachment: HDFS-14624.002.patch

> When decommissioning a node, log remaining blocks to replicate periodically
> ---
>
> Key: HDFS-14624
> URL: https://issues.apache.org/jira/browse/HDFS-14624
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.3.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Attachments: HDFS-14624.001.patch, HDFS-14624.002.patch






[jira] [Updated] (HDFS-14624) When decommissioning a node, log remaining blocks to replicate periodically

2019-07-02 Thread Stephen O'Donnell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-14624:
-
Status: Patch Available  (was: Open)

> When decommissioning a node, log remaining blocks to replicate periodically
> ---
>
> Key: HDFS-14624
> URL: https://issues.apache.org/jira/browse/HDFS-14624
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.3.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Attachments: HDFS-14624.001.patch






[jira] [Updated] (HDFS-14624) When decommissioning a node, log remaining blocks to replicate periodically

2019-07-02 Thread Stephen O'Donnell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-14624:
-
Attachment: HDFS-14624.001.patch

> When decommissioning a node, log remaining blocks to replicate periodically
> ---
>
> Key: HDFS-14624
> URL: https://issues.apache.org/jira/browse/HDFS-14624
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.3.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Attachments: HDFS-14624.001.patch


