[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load
[ https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544463#comment-16544463 ]

xiaoli commented on HDFS-13183:
-------------------------------

(y)

> Standby NameNode process getBlocks request to reduce Active load
> ----------------------------------------------------------------
>
>                 Key: HDFS-13183
>                 URL: https://issues.apache.org/jira/browse/HDFS-13183
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover, namenode
>    Affects Versions: 2.7.5, 3.1.0, 2.9.1, 2.8.4, 3.0.2
>            Reporter: He Xiaoqiao
>            Assignee: He Xiaoqiao
>            Priority: Major
>         Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch,
> HDFS-13183-trunk.003.patch
>
>
> The performance of the Active NameNode can be impacted when the {{Balancer}}
> requests #getBlocks, because querying the blocks of overly full DNs is
> currently extremely inefficient. The main reason is that
> {{NameNodeRpcServer#getBlocks}} holds the read lock for a long time. In the
> extreme case, all handlers of the Active NameNode RPC server are occupied by
> one reader, {{NameNodeRpcServer#getBlocks}}, and other write operation calls,
> so the Active NameNode enters a state of false death for a number of seconds
> or even minutes.
> Similar performance concerns about the Balancer have been reported in
> HDFS-9412, HDFS-7967, etc.
> If the Standby NameNode can shoulder the heavy #getBlocks burden, it could
> speed up balancing and reduce the performance impact on the Active NameNode.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
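The stall described in HDFS-13183 can be illustrated with a small sketch. It uses a plain java.util.concurrent ReentrantReadWriteLock as a stand-in for the NameNode's namesystem lock; the class name ReadLockStall and its method are hypothetical and are not taken from the Hadoop code base or from any attached patch. While one reader holds the read lock, a writer that waits with a timeout does not get the lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only: a generic read/write lock stands in for
// the namesystem lock; nothing here is Hadoop code.
class ReadLockStall {

    /**
     * Starts a reader thread that keeps the read lock held, then lets a
     * writer wait up to waitMs for the write lock.
     * Returns true if the writer acquired the lock within that time.
     */
    static boolean writerGetsLockWhileReaderHolds(long waitMs) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
        CountDownLatch readerHolding = new CountDownLatch(1);
        CountDownLatch releaseReader = new CountDownLatch(1);

        // Plays the role of a long-running read operation such as getBlocks.
        Thread reader = new Thread(() -> {
            lock.readLock().lock();
            try {
                readerHolding.countDown();
                releaseReader.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });
        reader.start();

        try {
            // Wait until the reader really holds the read lock.
            readerHolding.await();
            // Plays the role of a write operation arriving meanwhile.
            boolean acquired = lock.writeLock().tryLock(waitMs, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            // Let the reader finish and clean up the thread.
            releaseReader.countDown();
            try {
                reader.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

Calling ReadLockStall.writerGetsLockWhileReaderHolds(50) returns false, because the reader does not release the lock until the writer has given up. In the scenario from the issue, each such blocked writer would also occupy an RPC handler for as long as it waits.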
[jira] [Commented] (HDFS-10290) Move getBlocks calls to DataNode in Balancer
[ https://issues.apache.org/jira/browse/HDFS-10290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465002#comment-16465002 ]

xiaoli commented on HDFS-10290:
-------------------------------

try: https://issues.apache.org/jira/browse/HDFS-13183

> Move getBlocks calls to DataNode in Balancer
> --------------------------------------------
>
>                 Key: HDFS-10290
>                 URL: https://issues.apache.org/jira/browse/HDFS-10290
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: balancer & mover
>    Affects Versions: 2.6.0
>            Reporter: He Tianyi
>            Priority: Major
>
> In the current implementation, the Balancer asks the NameNode for a list of
> blocks on a specific DataNode. This makes the NameNode's workload heavier,
> and it actually caused the NameNode to flap when the average number of
> blocks on each DataNode reached 1,000,000 (NameNode heap size is 192GB,
> cpu: Xeon E5-2630 * 2).
> Recently I investigated whether the {{getBlocks}} invocation from the
> Balancer can be handled by DataNodes, and it turned out to be practical.
> The only pitfall is: since a DataNode has no information about the other
> locations of each block it possesses, some block moves may fail (since the
> target node may already have a replica of that particular block).
> I think this may be beneficial for large clusters.
> Any suggestions or comments?
> Thanks.
[jira] [Commented] (HDFS-11384) Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike
[ https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201578#comment-16201578 ]

xiaoli commented on HDFS-11384:
-------------------------------

The patch1 looks good! (/)(/)

> Add option for balancer to disperse getBlocks calls to avoid NameNode's
> rpc.CallQueueLength spike
> -----------------------------------------------------------------------
>
>                 Key: HDFS-11384
>                 URL: https://issues.apache.org/jira/browse/HDFS-11384
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: Konstantin Shvachko
>             Fix For: 2.9.0, 2.7.4, 3.0.0-alpha4, 2.8.2
>
>         Attachments: HDFS-11384-007.patch, HDFS-11384-branch-2.7.011.patch,
> HDFS-11384-branch-2.8.011.patch, HDFS-11384.001.patch, HDFS-11384.002.patch,
> HDFS-11384.003.patch, HDFS-11384.004.patch, HDFS-11384.005.patch,
> HDFS-11384.006.patch, HDFS-11384.008.patch, HDFS-11384.009.patch,
> HDFS-11384.010.patch, HDFS-11384.011.patch, balancer.day.png,
> balancer.week.png
>
>
> Running the balancer on a Hadoop cluster that has more than 3000 DataNodes
> causes the NameNode's rpc.CallQueueLength to spike. We observed that this
> situation could cause HBase cluster failure due to RegionServer WAL
> timeouts.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
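As a rough illustration of what "dispersing" getBlocks calls could mean, the sketch below spreads a batch of calls over time so that no more than a chosen number start within any one second. This is a minimal sketch under stated assumptions, not the approach taken in the attached patches: the class GetBlocksPacer, the method startOffsetsMillis, and the maxPerSecond parameter are hypothetical names introduced here for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: evenly spaces call start times.
// It does not reproduce the HDFS-11384 patch.
class GetBlocksPacer {

    /**
     * Returns, for each of the given number of calls, the delay in
     * milliseconds (relative to the first call) at which it should start,
     * so that at most maxPerSecond calls start in any one-second window.
     */
    static List<Long> startOffsetsMillis(int calls, int maxPerSecond) {
        if (calls < 0 || maxPerSecond <= 0) {
            throw new IllegalArgumentException(
                "calls must be >= 0 and maxPerSecond must be > 0");
        }
        // Round the gap up so the per-second limit is never exceeded.
        long intervalMs = (1000L + maxPerSecond - 1) / maxPerSecond;
        List<Long> offsets = new ArrayList<>(calls);
        for (int i = 0; i < calls; i++) {
            offsets.add(i * intervalMs);
        }
        return offsets;
    }
}
```

For example, GetBlocksPacer.startOffsetsMillis(3, 20) yields [0, 50, 100]: instead of all three calls reaching the NameNode at once, they arrive 50 ms apart, which keeps the burst from piling up in the RPC call queue.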