[jira] [Updated] (HDFS-15419) Router should retry communicate with NN when cluster is unavailable using configurable time interval

2020-06-18 Thread bhji123 (Jira)


 [ https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

bhji123 updated HDFS-15419:
---
Description: 
When the cluster is unavailable, router -> namenode communication is retried
only once, with no time interval between attempts, which is not reasonable.

For example, my company runs several HDFS clusters with more than 1000 nodes,
and we have encountered this problem. In some cases a cluster becomes
unavailable briefly, for about 10 to 30 seconds, and during that window almost
all RPC requests to the router fail because the router retries only once,
with no time interval.

It would be better to enhance the router retry strategy so that the router
retries communication with the NN using a configurable time interval and a
configurable maximum number of retries.
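A minimal Java sketch of the proposed behaviour, for illustration only: each call to the NN is retried up to a fixed maximum number of times, sleeping a fixed interval between attempts. `FixedIntervalRetry` and `invoke` are hypothetical names, not part of the Hadoop RBF code; treating `IOException` as the only retriable failure is an assumption, and how the Router would load the interval and retry count from configuration is not shown.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of Hadoop RBF): retries a call to a NameNode
 * with a fixed sleep between attempts and a cap on the number of retries.
 */
final class FixedIntervalRetry {

  private FixedIntervalRetry() {}

  /**
   * Runs {@code call}; on IOException it sleeps {@code intervalMs} and tries
   * again, for at most {@code maxRetries} retries (maxRetries + 1 attempts).
   * Once the retry budget is exhausted, the last IOException is rethrown.
   */
  static <T> T invoke(Callable<T> call, int maxRetries, long intervalMs)
      throws Exception {
    if (maxRetries < 0 || intervalMs < 0) {
      throw new IllegalArgumentException("maxRetries and intervalMs must be >= 0");
    }
    IOException last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return call.call();
      } catch (IOException e) {
        // Assumption: only I/O failures (e.g. NN unreachable) are retried;
        // any other exception propagates immediately.
        last = e;
        if (attempt < maxRetries) {
          TimeUnit.MILLISECONDS.sleep(intervalMs); // fixed, configurable wait
        }
      }
    }
    throw last; // non-null: every failed attempt above assigned it
  }

  public static void main(String[] args) throws Exception {
    // Simulated NN that is unavailable for the first 3 calls, then recovers.
    int[] calls = {0};
    String result = invoke(() -> {
      calls[0]++;
      if (calls[0] <= 3) {
        throw new IOException("NameNode unavailable");
      }
      return "ok";
    }, 5, 10);
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

Running `main` prints `ok after 4 attempts`. As an example of sizing the two settings (illustrative values, not defaults), `maxRetries = 6` with `intervalMs = 5000` spends 6 × 5 s = 30 s sleeping before giving up, which would span the 10 to 30 second outages described above.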


  was:
When the cluster is unavailable, router -> namenode communication is retried
only once, with no time interval between attempts, which is not reasonable.

For example, my company runs several HDFS clusters with more than 1000 nodes,
and we have encountered this problem. In some cases a cluster becomes
unavailable briefly, for about 10 to 30 seconds, and during that window almost
all RPC requests to the router fail because the router retries only once,
with no time interval.

It would be better to enhance the router retry strategy so that it retries
with a configurable time interval and a configurable maximum number of
retries.


> Router should retry communicate with NN when cluster is unavailable using 
> configurable time interval
> 
>
> Key: HDFS-15419
> URL: https://issues.apache.org/jira/browse/HDFS-15419
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: configuration, hdfs-client, rbf
> Reporter: bhji123
> Priority: Major
>
> When the cluster is unavailable, router -> namenode communication is retried
> only once, with no time interval between attempts, which is not reasonable.
> For example, my company runs several HDFS clusters with more than 1000 nodes,
> and we have encountered this problem. In some cases a cluster becomes
> unavailable briefly, for about 10 to 30 seconds, and during that window
> almost all RPC requests to the router fail because the router retries only
> once, with no time interval.
> It would be better to enhance the router retry strategy so that the router
> retries communication with the NN using a configurable time interval and a
> configurable maximum number of retries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15419) Router should retry communicate with NN when cluster is unavailable using configurable time interval

2020-06-18 Thread bhji123 (Jira)


 [ https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

bhji123 updated HDFS-15419:
---
Summary: Router should retry communicate with NN when cluster is 
unavailable using configurable time interval  (was: router retry with 
configurable time interval when cluster is unavailable)
