[jira] [Comment Edited] (HDFS-15419) RBF: Router should retry communicate with NN when cluster is unavailable using configurable time interval
[ https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140300#comment-17140300 ]

bhji123 edited comment on HDFS-15419 at 6/19/20, 7:33 AM:
----------------------------------------------------------

Yes, the router is just a proxy, but it is also a server. Clients can decide whether to wait and retry or not. But not all clients are that clever, especially when there is a variety of different clients. For the less smart clients, this PR is very useful. For the very smart clients that don't want the router to retry, that is fine too, because router retry is now configurable.

was (Author: bhji123):
Yes, the router is just a proxy, and it is also a server. Clients can decide whether to wait and retry or not. But not all clients are that clever, especially when there is a variety of different clients. For the less smart clients, this PR is very useful. For the very smart clients that don't want the router to retry, that is fine too, because router retry is now configurable.

> RBF: Router should retry communicate with NN when cluster is unavailable using configurable time interval
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15419
>                 URL: https://issues.apache.org/jira/browse/HDFS-15419
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: configuration, hdfs-client, rbf
>            Reporter: bhji123
>            Priority: Major
>
> When the cluster is unavailable, router -> namenode communication retries only once, without any time interval, which is not reasonable.
> For example, at my company, which has several HDFS clusters with more than 1000 nodes, we have encountered this problem. In some cases a cluster becomes unavailable briefly, for about 10 or 30 seconds, and during that time almost all RPC requests to the router fail because the router retries only once, without a time interval.
> It would be better to enhance the router retry strategy so that it retries communication with the NN using a configurable time interval and a maximum number of retries.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15419) RBF: Router should retry communicate with NN when cluster is unavailable using configurable time interval
[ https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140300#comment-17140300 ]

bhji123 commented on HDFS-15419:
--------------------------------

Yes, the router is just a proxy, and it is also a server. Clients can decide whether to wait and retry or not. But not all clients are that clever, especially when there is a variety of different clients. For the less smart clients, this PR is very useful. For the very smart clients that don't want the router to retry, that is fine too, because router retry is now configurable.
[jira] [Issue Comment Deleted] (HDFS-15419) RBF: Router should retry communicate with NN when cluster is unavailable using configurable time interval
[ https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

bhji123 updated HDFS-15419:
---------------------------
    Comment: was deleted

(was: Yes, but clients may not be configured appropriately. If the router can retry too, it will be more reliable.)
[jira] [Commented] (HDFS-15419) RBF: Router should retry communicate with NN when cluster is unavailable using configurable time interval
[ https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140283#comment-17140283 ]

bhji123 commented on HDFS-15419:
--------------------------------

Yes, but clients may not be configured appropriately. If the router can retry too, it will be more reliable.
[jira] [Commented] (HDFS-15419) RBF: Router should retry communicate with NN when cluster is unavailable using configurable time interval
[ https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140242#comment-17140242 ]

bhji123 commented on HDFS-15419:
--------------------------------

Hi, Yuxuan. In this case, if clients time out and the NN is still unavailable, the clients will retry. The difference is that the router will be more reliable, especially when clients are not configured appropriately.
[jira] [Updated] (HDFS-15419) Router should retry communicate with NN when cluster is unavailable using configurable time interval
[ https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

bhji123 updated HDFS-15419:
---------------------------
    Description:
When the cluster is unavailable, router -> namenode communication retries only once, without any time interval, which is not reasonable.

For example, at my company, which has several HDFS clusters with more than 1000 nodes, we have encountered this problem. In some cases a cluster becomes unavailable briefly, for about 10 or 30 seconds, and during that time almost all RPC requests to the router fail because the router retries only once, without a time interval.

It would be better to enhance the router retry strategy so that it retries communication with the NN using a configurable time interval and a maximum number of retries.

  was:
When the cluster is unavailable, router -> namenode communication retries only once, without any time interval, which is not reasonable.

For example, at my company, which has several HDFS clusters with more than 1000 nodes, we have encountered this problem. In some cases a cluster becomes unavailable briefly, for about 10 or 30 seconds, and during that time almost all RPC requests to the router fail because the router retries only once, without a time interval.

It would be better to enhance the router retry strategy so that it retries with a configurable time interval and a maximum number of retries.
[jira] [Updated] (HDFS-15419) Router should retry communicate with NN when cluster is unavailable using configurable time interval
[ https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

bhji123 updated HDFS-15419:
---------------------------
    Summary: Router should retry communicate with NN when cluster is unavailable using configurable time interval  (was: router retry with configurable time interval when cluster is unavailable)
[jira] [Commented] (HDFS-15419) router retry with configurable time interval when cluster is unavailable
[ https://issues.apache.org/jira/browse/HDFS-15419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17139085#comment-17139085 ]

bhji123 commented on HDFS-15419:
--------------------------------

[https://github.com/apache/hadoop/pull/2082]

Here is the PR that fixes this problem.
[jira] [Created] (HDFS-15419) router retry with configurable time interval when cluster is unavailable
bhji123 created HDFS-15419:
------------------------------

             Summary: router retry with configurable time interval when cluster is unavailable
                 Key: HDFS-15419
                 URL: https://issues.apache.org/jira/browse/HDFS-15419
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: configuration, hdfs-client, rbf
            Reporter: bhji123

When the cluster is unavailable, router -> namenode communication retries only once, without any time interval, which is not reasonable.

For example, at my company, which has several HDFS clusters with more than 1000 nodes, we have encountered this problem. In some cases a cluster becomes unavailable briefly, for about 10 or 30 seconds, and during that time almost all RPC requests to the router fail because the router retries only once, without a time interval.

It would be better to enhance the router retry strategy so that it retries with a configurable time interval and a maximum number of retries.
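The retry strategy requested in the description (a fixed wait between attempts plus a cap on the number of retries) might look roughly like the sketch below. It is a minimal, standalone illustration in plain Java, not the code from the pull request: the class name `FixedIntervalRetry`, the method `call`, and the parameters `maxRetries` and `intervalMs` are all hypothetical, and it assumes that an `IOException` is what signals a temporarily unavailable namenode.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper: retries an operation with a fixed wait between attempts. */
final class FixedIntervalRetry {
  private FixedIntervalRetry() {}

  /**
   * Runs op once, then retries it up to maxRetries more times, sleeping
   * intervalMs milliseconds between attempts. Only IOException is treated
   * as retriable (an assumption for this sketch); any other exception
   * propagates immediately.
   */
  static <T> T call(Callable<T> op, int maxRetries, long intervalMs) throws Exception {
    if (maxRetries < 0 || intervalMs < 0) {
      throw new IllegalArgumentException("maxRetries and intervalMs must be >= 0");
    }
    IOException last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return op.call();
      } catch (IOException e) {
        last = e;
        // Wait before the next attempt, but not after the final one.
        if (attempt < maxRetries) {
          Thread.sleep(intervalMs);
        }
      }
    }
    // The loop ran at least once and every attempt failed.
    throw last;
  }
}
```

With `maxRetries = 1` and `intervalMs = 0` this sketch behaves like the "retry once without any time interval" behavior the report describes. Covering an outage of 10 to 30 seconds would need the product of the two values to exceed the outage length, at the cost of holding the caller's request open for that long.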