[ 
https://issues.apache.org/jira/browse/HDFS-10897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15530818#comment-15530818
 ] 

Jing Zhao edited comment on HDFS-10897 at 9/28/16 8:43 PM:
-----------------------------------------------------------

Thanks for working on this, [~anu]. The patch looks good to me overall. Some 
comments:
# My main concern is the way heartbeat times are currently tracked for 
DataNodes. Instead of using 3 String-Long maps, I think it's better to use 
{{DatanodeInfo}} (or a simplified version) to store the latest heartbeat/report 
time. Later we will still need to capture other information about DataNodes 
(current load, state, etc.), so {{DatanodeInfo}} can be the central place to 
store all the information about a DN (just like today's HDFS). This way we also 
only need to maintain a single datanode map (which is more static than the 
current 3 maps), and most of the lock protection can be pushed down to the 
{{DatanodeInfo}} level.
# With this change we can also calculate heartbeat time more fairly: for every 
heartbeat msg, we can update the corresponding datanode's latest update time 
before putting the heartbeat into the queue, so that the DN is not penalized 
for SCM's local latency.
# For node state, we may want to follow current HDFS, i.e., have AdminStates 
that include NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED, 
ENTERING_MAINTENANCE, and IN_MAINTENANCE. Stale/dead are calculated from the 
latest heartbeat time, so maybe we do not need to define them as explicit 
states (and dead nodes we may want to remove directly).
{code}
   * 4. A node can be in any of these 4 states: {HEALTHY, STALE, DEAD,
   * DECOMMISSIONED}
   * <p/>
   * HEALTHY - It is a datanode that is regularly heartbeating us.
   *
   * STALE - A datanode for which we have missed few heart beats.
   *
   * DEAD - A datanode that we have not heard from for a while.
   *
   * DECOMMISSIONED - Someone told us to remove this node from the tracking
   * list, by calling removeNode. We will throw away this nodes info soon.
{code}
# {{getNodes}}/{{getNodeCount}} can be defined in a metrics interface (like 
today's FSNamesystemMBean).
# Any reason we need a NodeManager interface?
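
To make points 1-3 above concrete, here is a rough sketch of what I have in mind. It is not taken from the patch: {{NodeRecord}} (a simplified stand-in for {{DatanodeInfo}}), {{NodeTracker}}, {{Liveness}}, and the two threshold parameters are hypothetical names, and treating an unknown node as DEAD is just an assumption for brevity. The timestamp is recorded at receipt time, and stale/dead are computed on demand rather than stored.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Admin states as listed in point 3 (operator-driven, stored explicitly).
enum AdminState {
  NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED,
  ENTERING_MAINTENANCE, IN_MAINTENANCE
}

// Liveness is derived from the last heartbeat time, never stored.
enum Liveness { HEALTHY, STALE, DEAD }

// Hypothetical, simplified stand-in for DatanodeInfo: one record per DN.
final class NodeRecord {
  private final AtomicLong lastHeartbeatMs;
  private volatile AdminState adminState = AdminState.NORMAL;

  NodeRecord(long firstSeenMs) {
    this.lastHeartbeatMs = new AtomicLong(firstSeenMs);
  }

  // Keep the newest timestamp even if heartbeats are recorded out of order.
  void touch(long receivedAtMs) {
    lastHeartbeatMs.accumulateAndGet(receivedAtMs, Math::max);
  }

  long getLastHeartbeatMs() { return lastHeartbeatMs.get(); }
  AdminState getAdminState() { return adminState; }
  void setAdminState(AdminState state) { this.adminState = state; }
}

// Hypothetical tracker: a single map replaces the three String-Long maps.
final class NodeTracker {
  private final Map<String, NodeRecord> nodes = new ConcurrentHashMap<>();
  private final long staleAfterMs;
  private final long deadAfterMs;

  NodeTracker(long staleAfterMs, long deadAfterMs) {
    this.staleAfterMs = staleAfterMs;
    this.deadAfterMs = deadAfterMs;
  }

  // Called on receipt, before the heartbeat is queued, so queueing delay
  // inside SCM does not count against the datanode.
  void onHeartbeat(String nodeId, long receivedAtMs) {
    nodes.computeIfAbsent(nodeId, id -> new NodeRecord(receivedAtMs))
        .touch(receivedAtMs);
    // The heartbeat would be handed to the processing queue here.
  }

  // Assumption: unknown (or already removed) nodes are reported as DEAD.
  Liveness liveness(String nodeId, long nowMs) {
    NodeRecord record = nodes.get(nodeId);
    if (record == null) {
      return Liveness.DEAD;
    }
    long age = nowMs - record.getLastHeartbeatMs();
    if (age >= deadAfterMs) {
      return Liveness.DEAD;
    }
    return age >= staleAfterMs ? Liveness.STALE : Liveness.HEALTHY;
  }

  // Dead nodes are dropped from the map instead of being kept in a state.
  void pruneDead(long nowMs) {
    nodes.values().removeIf(r -> nowMs - r.getLastHeartbeatMs() >= deadAfterMs);
  }

  int nodeCount() { return nodes.size(); }
}
```

With per-node fields like these, the map itself only changes when a node registers or is pruned, and any finer-grained locking can live inside the record.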







> Ozone: SCM: Add NodeManager
> ---------------------------
>
>                 Key: HDFS-10897
>                 URL: https://issues.apache.org/jira/browse/HDFS-10897
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>    Affects Versions: HDFS-7240
>            Reporter: Anu Engineer
>            Assignee: Anu Engineer
>         Attachments: HDFS-10897-HDFS-7240.001.patch, 
> HDFS-10897-HDFS-7240.002.patch, HDFS-10897-HDFS-7240.003.patch
>
>
> Add a nodeManager class that will be used by Storage Controller Manager 
> eventually.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org