[ https://issues.apache.org/jira/browse/HDFS-10897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15530818#comment-15530818 ]
Jing Zhao edited comment on HDFS-10897 at 9/28/16 8:43 PM:
-----------------------------------------------------------

Thanks for working on this, [~anu]. The patch looks good to me overall. Some comments:
# My main concern is the current way of tracking heartbeat times for DataNodes. Instead of using 3 String-Long maps, I think it's better to use {{DatanodeInfo}} (or a simplified version) to store the latest heartbeat/report time. Later we will still need to capture other information about DataNodes (current load, state, etc.), so {{DatanodeInfo}} can be the central place to store all the information about a DN (just like today's HDFS). This way we also only need to maintain a single datanode map (which is more static than the current 3 maps), and most of the lock protection can be moved to the DatanodeInfo level.
# With this change we can also calculate heartbeat time more fairly: for every heartbeat msg, we can update the corresponding datanode's latest update time before putting the heartbeat into the queue, so that a DN is not penalized for SCM's local latency.
# For node state, we may want to follow current HDFS, i.e., have AdminStates, which include NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED, ENTERING_MAINTENANCE, and IN_MAINTENANCE. Stale/dead are calculated from the latest heartbeat time, so maybe we do not need to define them as explicit states (and we may want to remove dead nodes directly).
{code}
 * 4. A node can be in any of these 4 states: {HEALTHY, STALE, DEAD,
 * DECOMMISSIONED}
 * <p/>
 * HEALTHY - It is a datanode that is regularly heartbeating us.
 *
 * STALE - A datanode for which we have missed few heart beats.
 *
 * DECOMMISSIONED and the other states are described as follows in the patch:
 * DEAD - A datanode that we have not heard from for a while.
 *
 * DECOMMISSIONED - Someone told us to remove this node from the tracking
 * list, by calling removeNode. We will throw away this nodes info soon.
{code}
# {{getNodes}}/{{getNodeCount}} can be defined in a metrics interface (like today's FSNamesystemMBean).
# Any reason we need a NodeManager interface?

> Ozone: SCM: Add NodeManager
> ---------------------------
>
>                 Key: HDFS-10897
>                 URL: https://issues.apache.org/jira/browse/HDFS-10897
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>    Affects Versions: HDFS-7240
>            Reporter: Anu Engineer
>            Assignee: Anu Engineer
>         Attachments: HDFS-10897-HDFS-7240.001.patch, HDFS-10897-HDFS-7240.002.patch, HDFS-10897-HDFS-7240.003.patch
>
>
> Add a nodeManager class that will be used by Storage Controller Manager eventually.
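
To make the suggestions in the comment above concrete, here is a minimal sketch of a single datanode map whose per-node record holds the latest heartbeat time, with stale/dead derived from that time instead of being stored as states. It is not taken from the patch: {{NodeTableSketch}}, {{NodeRecord}}, {{Liveness}}, the method names, and the two thresholds are all hypothetical, and {{NodeRecord}} only stands in for a simplified {{DatanodeInfo}}.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch only: every name in this class is hypothetical, not from the patch. */
final class NodeTableSketch {

  /** Operator-driven states, mirroring the AdminStates listed in the comment. */
  enum AdminState {
    NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED,
    ENTERING_MAINTENANCE, IN_MAINTENANCE
  }

  /** Derived from the last heartbeat time on demand; never stored. */
  enum Liveness { HEALTHY, STALE, DEAD }

  /** Simplified stand-in for DatanodeInfo: one record per datanode. */
  static final class NodeRecord {
    private final AtomicLong lastHeartbeatMs = new AtomicLong();
    private volatile AdminState adminState = AdminState.NORMAL;

    void touch(long nowMs) { lastHeartbeatMs.set(nowMs); }
    long lastHeartbeatMs() { return lastHeartbeatMs.get(); }
    AdminState adminState() { return adminState; }
    void setAdminState(AdminState state) { adminState = state; }
  }

  // A single map replaces the three String-Long maps; synchronization
  // lives in the map and in each record rather than in one global lock.
  private final Map<String, NodeRecord> nodes = new ConcurrentHashMap<>();
  private final long staleAfterMs; // assumed threshold, not from the patch
  private final long deadAfterMs;  // assumed threshold, not from the patch

  NodeTableSketch(long staleAfterMs, long deadAfterMs) {
    this.staleAfterMs = staleAfterMs;
    this.deadAfterMs = deadAfterMs;
  }

  /**
   * Meant to be called when the heartbeat arrives, before it is queued for
   * processing, so queueing delay inside SCM does not age the node.
   */
  NodeRecord onHeartbeatReceived(String nodeId, long nowMs) {
    NodeRecord record = nodes.computeIfAbsent(nodeId, id -> new NodeRecord());
    record.touch(nowMs);
    return record;
  }

  /** Computes liveness from the heartbeat age; unknown nodes count as DEAD. */
  Liveness liveness(String nodeId, long nowMs) {
    NodeRecord record = nodes.get(nodeId);
    if (record == null) {
      return Liveness.DEAD;
    }
    long ageMs = nowMs - record.lastHeartbeatMs();
    if (ageMs >= deadAfterMs) {
      return Liveness.DEAD;
    }
    return ageMs >= staleAfterMs ? Liveness.STALE : Liveness.HEALTHY;
  }

  /**
   * Drops nodes whose heartbeat age has reached the dead threshold.
   * The returned count is exact only when no other thread mutates the map.
   */
  int removeDead(long nowMs) {
    int before = nodes.size();
    nodes.values().removeIf(r -> nowMs - r.lastHeartbeatMs() >= deadAfterMs);
    return before - nodes.size();
  }

  int nodeCount() { return nodes.size(); }
}
```

With thresholds of 1000 ms (stale) and 5000 ms (dead), a node that last heartbeated at time 0 is reported as healthy at 500 ms, stale at 1000 ms, and dead at 5000 ms, at which point {{removeDead}} drops it from the map. The admin state stays a separate field because it is set by an operator, not inferred from heartbeats.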