[jira] [Commented] (HDFS-7642) NameNode should periodically log DataNode decommissioning progress

2016-10-03 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543411#comment-15543411
 ] 

Andrew Wang commented on HDFS-7642:
-----------------------------------

Thanks for working on this, Sean. One meta comment and then some code-related 
ones:

What normally happens is that decom gets stuck at the end because of 
open-for-write files. So, as an operator, often what you want to know is:

* Is this datanode still making progress?
* If not, is it blocked on open-for-write files? What are these files? Which 
client is keeping these files open?

I'm not sure that adding more logging really helps with this. We already have 
logging in logBlockReplicationInfo that gives you similar status information. 
The remaining gaps are in understanding the rate of decommissioning (which 
might be better addressed with per-DN rate metrics) and in having a debug tool 
that dumps the open-for-write files for a DN along with the clients that hold 
the file leases (HDFS-10480 is along those lines). What do you think?
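
As a rough illustration of what a per-DN rate metric could compute, here is a 
minimal sketch. It is not HDFS code: DecomRateTracker, its method names, and 
the idea of sampling a "blocks remaining" count are all assumptions made for 
this example. It derives an average rate from the first and latest samples, 
and flags a node as stalled when the remaining count has not dropped for a 
given interval.

```java
// Hypothetical helper, not part of HDFS: tracks decommissioning progress
// for one DataNode from periodic "blocks remaining" samples.
final class DecomRateTracker {
  private long firstSampleMillis = -1;   // -1 means "no sample yet"
  private long lastSampleMillis = -1;
  private long lastProgressMillis = -1;  // last time the count went down
  private long firstRemaining;
  private long lastRemaining;

  // Record the number of blocks still waiting to be replicated.
  void sample(long nowMillis, long blocksRemaining) {
    if (firstSampleMillis < 0) {
      firstSampleMillis = nowMillis;
      firstRemaining = blocksRemaining;
      lastProgressMillis = nowMillis;
    } else if (blocksRemaining < lastRemaining) {
      lastProgressMillis = nowMillis;
    }
    lastSampleMillis = nowMillis;
    lastRemaining = blocksRemaining;
  }

  // Average blocks replicated per second since the first sample.
  double blocksPerSecond() {
    long elapsed = lastSampleMillis - firstSampleMillis;
    if (firstSampleMillis < 0 || elapsed <= 0) {
      return 0.0;
    }
    return (firstRemaining - lastRemaining) * 1000.0 / elapsed;
  }

  // True if work remains but the count has not dropped for the threshold.
  boolean isStalled(long nowMillis, long stallThresholdMillis) {
    return lastProgressMillis >= 0
        && lastRemaining > 0
        && nowMillis - lastProgressMillis >= stallThresholdMillis;
  }
}
```

A stalled node is the case where an operator would next want the list of 
open-for-write files and their lease holders, which this sketch does not 
attempt to produce.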

Code related:

* Can we make the new class static?
* We can use primitives (int) rather than objects (Integer) for better 
efficiency.
* Recommend we change this to debug logging. Decom can take hours and be done 
on tens of nodes at a time, so printing like this can be spammy.
* It would also be useful to track when this node was set to "decommissioning" 
status, so you can judge the rate of progress.
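
The four suggestions above can be sketched together as follows. This is an 
illustration only and is not taken from the patch: DecomMonitorSketch, 
DecomProgress, and every field and method name are hypothetical. The sketch 
uses java.util.logging purely to stay self-contained, with Level.FINE standing 
in for "debug".

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical outer class; stands in for whatever class the patch modifies.
class DecomMonitorSketch {
  // Level.FINE plays the role of debug-level logging in this sketch.
  private static final Logger LOG =
      Logger.getLogger(DecomMonitorSketch.class.getName());

  // Static nested class: instances hold no hidden reference to the outer object.
  static final class DecomProgress {
    private final String nodeName;
    private final long startTimeMillis;  // when the node entered "decommissioning"
    private int underReplicatedBlocks;   // primitive int, not Integer
    private int blocksInOpenFiles;

    DecomProgress(String nodeName, long startTimeMillis) {
      this.nodeName = nodeName;
      this.startTimeMillis = startTimeMillis;
    }

    void update(int underReplicatedBlocks, int blocksInOpenFiles) {
      this.underReplicatedBlocks = underReplicatedBlocks;
      this.blocksInOpenFiles = blocksInOpenFiles;
    }

    long elapsedSeconds(long nowMillis) {
      return (nowMillis - startTimeMillis) / 1000;
    }

    String describe(long nowMillis) {
      return nodeName + ": decommissioning for " + elapsedSeconds(nowMillis)
          + "s, " + underReplicatedBlocks + " under-replicated blocks, "
          + blocksInOpenFiles + " in open-for-write files";
    }
  }

  // Logs at debug level only; returns true if a message was emitted.
  static boolean logProgress(DecomProgress p, long nowMillis) {
    if (!LOG.isLoggable(Level.FINE)) {
      return false;  // skip building the string when debug output is off
    }
    LOG.fine(p.describe(nowMillis));
    return true;
  }
}
```

Keeping the start time alongside the current counts lets a reader of a single 
log line estimate the rate of progress without correlating earlier entries.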

> NameNode should periodically log DataNode decommissioning progress
> --
>
> Key: HDFS-7642
> URL: https://issues.apache.org/jira/browse/HDFS-7642
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Zhe Zhang
> Assignee: Sean Mackrory
> Priority: Minor
> Attachments: HDFS-7642.001.patch
>
>
> We've seen a case where decommissioning was stuck because some files had 
> more replicas than DNs. HDFS-5662 fixes this particular issue, but there are 
> other cases where the decommissioning process might get stuck or slow 
> down. Some monitoring / logging will help debug those issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7642) NameNode should periodically log DataNode decommissioning progress

2016-09-30 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15536954#comment-15536954
 ] 

Zhe Zhang commented on HDFS-7642:
---------------------------------

[~mackrorysd] Sure! Thanks for the interest. Unassigning myself now.

> NameNode should periodically log DataNode decommissioning progress
> --
>
> Key: HDFS-7642
> URL: https://issues.apache.org/jira/browse/HDFS-7642
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Zhe Zhang
> Assignee: Zhe Zhang
> Priority: Minor






[jira] [Commented] (HDFS-7642) NameNode should periodically log DataNode decommissioning progress

2016-09-30 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15536701#comment-15536701
 ] 

Sean Mackrory commented on HDFS-7642:
-------------------------------------

Hey [~zhz] - I'd like to work on this, if that's alright with you. Since it's 
been quite a few months, I assume nothing is actively in progress here?

I would probably implement this somewhere inside 
DecommissionManager.Monitor.run().
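
To make the idea concrete, here is a minimal sketch of periodic progress 
logging from inside a monitor's run() method. It assumes the monitor is a 
Runnable invoked on a fixed schedule; PeriodicDecomLogger, the map of pending 
block counts, and the message format are all hypothetical and do not reflect 
the actual DecommissionManager internals.

```java
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a decommission monitor; not actual HDFS code.
final class PeriodicDecomLogger implements Runnable {
  // Assumed view of the monitor's state: node name -> blocks still pending.
  private final Map<String, Integer> pendingBlocksByNode;
  private final int logEveryNTicks;     // emit progress once per N invocations
  private final Consumer<String> sink;  // where log lines go (a logger in practice)
  private int ticks;

  PeriodicDecomLogger(Map<String, Integer> pendingBlocksByNode,
                      int logEveryNTicks,
                      Consumer<String> sink) {
    if (logEveryNTicks <= 0) {
      throw new IllegalArgumentException("logEveryNTicks must be positive");
    }
    this.pendingBlocksByNode = pendingBlocksByNode;
    this.logEveryNTicks = logEveryNTicks;
    this.sink = sink;
  }

  @Override
  public void run() {
    // The real monitor would do its replication checks here.
    ticks++;
    if (ticks % logEveryNTicks != 0) {
      return;  // not a logging tick
    }
    for (Map.Entry<String, Integer> e : pendingBlocksByNode.entrySet()) {
      sink.accept("Decommissioning " + e.getKey() + ": "
          + e.getValue() + " blocks pending");
    }
  }
}
```

Such a Runnable could be driven by a ScheduledExecutorService via 
scheduleAtFixedRate. Logging only every N ticks keeps the output rate 
independent of how often the monitor itself runs.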

> NameNode should periodically log DataNode decommissioning progress
> --
>
> Key: HDFS-7642
> URL: https://issues.apache.org/jira/browse/HDFS-7642
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Zhe Zhang
> Assignee: Zhe Zhang
> Priority: Minor


