[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time
[ https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990059#comment-16990059 ] Chao Sun commented on HDFS-15024: - {quote} Chao Sun I think the msync case is just a case, maybe the current problem is a common problem for Support more than 2 NameNodes? {quote} yes you are correct. This is a more general problem for multi-sbn feature but I think we could optimize {{msync}} specifically to avoid the retry backoff. Regarding patch v1, seems it only handles the first few retries and later on when {{times}} gradually increment to passes beyond {{numNameNodes - 1 }}, it will still do exponential backoff on all the SBNs. > [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a > condition of calculation of sleep time > --- > > Key: HDFS-15024 > URL: https://issues.apache.org/jira/browse/HDFS-15024 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.10.0, 3.3.0, 3.2.1 >Reporter: huhaiyang >Assignee: huhaiyang >Priority: Major > Labels: multi-sbnn > Attachments: HDFS-15024.001.patch, client_error.log > > > When we enable the ONN , there will be three NN nodes for the client > configuration, > Such as configuration > > dfs.ha.namenodes.ns1 > nn2,nn3,nn1 > > Currently, > nn2 is in standby state > nn3 is in observer state > nn1 is in active state > When the user performs an access HDFS operation > ./bin/hadoop --loglevel debug fs > -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > -mkdir /user/haiyang1/test8 > You need to request nn1 when you execute the msync method, > Actually connect nn2 first and failover is required > In connection nn3 does not meet the requirements, failover needs to be > performed, but at this time, failover operation needs to be performed during > a period of hibernation > Finally, it took a period of hibernation to connect the successful request to > nn1 > In FailoverOnNetworkExceptionRetry getFailoverOrRetrySleepTime The current > default implementation is Sleep time is calculated when more than one > failover operation is performed > I think that the Number of NameNodes as a condition of calculation of sleep > time is more reasonable > That is, in the current test, executing failover on connection nn3 does not > need to sleep time to directly connect to the next nn node > See client_error.log for details -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time
[ https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989406#comment-16989406 ] huhaiyang commented on HDFS-15024: -- {quote} 2. but what if accessing Active fails due to network exception, then maybe we keep the current behavior of sleep and retry, hoping the sleeps get around temporary network issue. {quote} [~vagarychen] current v01 patch is still in favor of this situation. all configured NN nodes need to be connected at once, If the NN node is not found later (due to network problems) then the next time (i.e. the fourth connection) sleep and retry is required. > [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a > condition of calculation of sleep time > --- > > Key: HDFS-15024 > URL: https://issues.apache.org/jira/browse/HDFS-15024 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.10.0, 3.3.0, 3.2.1 >Reporter: huhaiyang >Assignee: huhaiyang >Priority: Major > Labels: multi-sbnn > Attachments: HDFS-15024.001.patch, client_error.log > > > When we enable the ONN , there will be three NN nodes for the client > configuration, > Such as configuration > > dfs.ha.namenodes.ns1 > nn2,nn3,nn1 > > Currently, > nn2 is in standby state > nn3 is in observer state > nn1 is in active state > When the user performs an access HDFS operation > ./bin/hadoop --loglevel debug fs > -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > -mkdir /user/haiyang1/test8 > You need to request nn1 when you execute the msync method, > Actually connect nn2 first and failover is required > In connection nn3 does not meet the requirements, failover needs to be > performed, but at this time, failover operation needs to be performed during > a period of hibernation > Finally, it took a period of hibernation to connect the successful request to > nn1 > In FailoverOnNetworkExceptionRetry getFailoverOrRetrySleepTime The current > default implementation is Sleep time is calculated when more than one > failover operation is performed > I think that the Number of NameNodes as a condition of calculation of sleep > time is more reasonable > That is, in the current test, executing failover on connection nn3 does not > need to sleep time to directly connect to the next nn node > See client_error.log for details -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time
[ https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989388#comment-16989388 ] huhaiyang commented on HDFS-15024: -- {quote} For the msync case, I'm wondering whether we can do something special, such as retrying all the namenode proxies in a tight loop similar to how we handle observer read calls, rather than defer that to the retry policies. {quote} [~csun] I think the msync case is just a case, maybe the current problem is a common problem for Support more than 2 NameNodes? Pls let me know whether it is correct. Thanks > [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a > condition of calculation of sleep time > --- > > Key: HDFS-15024 > URL: https://issues.apache.org/jira/browse/HDFS-15024 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.10.0, 3.3.0, 3.2.1 >Reporter: huhaiyang >Assignee: huhaiyang >Priority: Major > Labels: multi-sbnn > Attachments: HDFS-15024.001.patch, client_error.log > > > When we enable the ONN , there will be three NN nodes for the client > configuration, > Such as configuration > > dfs.ha.namenodes.ns1 > nn2,nn3,nn1 > > Currently, > nn2 is in standby state > nn3 is in observer state > nn1 is in active state > When the user performs an access HDFS operation > ./bin/hadoop --loglevel debug fs > -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > -mkdir /user/haiyang1/test8 > You need to request nn1 when you execute the msync method, > Actually connect nn2 first and failover is required > In connection nn3 does not meet the requirements, failover needs to be > performed, but at this time, failover operation needs to be performed during > a period of hibernation > Finally, it took a period of hibernation to connect the successful request to > nn1 > In FailoverOnNetworkExceptionRetry getFailoverOrRetrySleepTime The current > default implementation is Sleep time is calculated when more than one > failover operation is performed > I think that the Number of NameNodes as a condition of calculation of sleep > time is more reasonable > That is, in the current test, executing failover on connection nn3 does not > need to sleep time to directly connect to the next nn node > See client_error.log for details -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time
[ https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989387#comment-16989387 ] huhaiyang commented on HDFS-15024: -- {quote} This seems fairly closely related to HDFS-14969... {quote} [~xkrogen] Yes, the two issues are similar. Or do we need to provide a general solution ? > [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a > condition of calculation of sleep time > --- > > Key: HDFS-15024 > URL: https://issues.apache.org/jira/browse/HDFS-15024 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.10.0, 3.3.0, 3.2.1 >Reporter: huhaiyang >Assignee: huhaiyang >Priority: Major > Labels: multi-sbnn > Attachments: HDFS-15024.001.patch, client_error.log > > > When we enable the ONN , there will be three NN nodes for the client > configuration, > Such as configuration > > dfs.ha.namenodes.ns1 > nn2,nn3,nn1 > > Currently, > nn2 is in standby state > nn3 is in observer state > nn1 is in active state > When the user performs an access HDFS operation > ./bin/hadoop --loglevel debug fs > -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > -mkdir /user/haiyang1/test8 > You need to request nn1 when you execute the msync method, > Actually connect nn2 first and failover is required > In connection nn3 does not meet the requirements, failover needs to be > performed, but at this time, failover operation needs to be performed during > a period of hibernation > Finally, it took a period of hibernation to connect the successful request to > nn1 > In FailoverOnNetworkExceptionRetry getFailoverOrRetrySleepTime The current > default implementation is Sleep time is calculated when more than one > failover operation is performed > I think that the Number of NameNodes as a condition of calculation of sleep > time is more reasonable > That is, in the current test, executing failover on connection nn3 does not > need to sleep time to directly connect to the next nn node > See client_error.log for details -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time
[ https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989384#comment-16989384 ] huhaiyang commented on HDFS-15024: -- [~weichiu] Thanks for your comments! {quote} HDFS has a feature called "DataNode hedged read" where a parallel connection is sent out to other DataNodes if the read request to a DataNode does not return within a configured time. Perhaps something similar along the line can work for Multiple NameNodes setup. {quote} good idea, I will look the feature "DataNode hedged read" > [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a > condition of calculation of sleep time > --- > > Key: HDFS-15024 > URL: https://issues.apache.org/jira/browse/HDFS-15024 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.10.0, 3.3.0, 3.2.1 >Reporter: huhaiyang >Assignee: huhaiyang >Priority: Major > Labels: multi-sbnn > Attachments: HDFS-15024.001.patch, client_error.log > > > When we enable the ONN , there will be three NN nodes for the client > configuration, > Such as configuration > > dfs.ha.namenodes.ns1 > nn2,nn3,nn1 > > Currently, > nn2 is in standby state > nn3 is in observer state > nn1 is in active state > When the user performs an access HDFS operation > ./bin/hadoop --loglevel debug fs > -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > -mkdir /user/haiyang1/test8 > You need to request nn1 when you execute the msync method, > Actually connect nn2 first and failover is required > In connection nn3 does not meet the requirements, failover needs to be > performed, but at this time, failover operation needs to be performed during > a period of hibernation > Finally, it took a period of hibernation to connect the successful request to > nn1 > In FailoverOnNetworkExceptionRetry getFailoverOrRetrySleepTime The current > default implementation is Sleep time is calculated when more than one > failover operation is performed > I think that the Number of NameNodes as a condition of calculation of sleep > time is more reasonable > That is, in the current test, executing failover on connection nn3 does not > need to sleep time to directly connect to the next nn node > See client_error.log for details -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time
[ https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988523#comment-16988523 ] Wei-Chiu Chuang commented on HDFS-15024: bq. I think a general thing here is that maybe we should re-consider what "retry" really means under context of SbN read. Just throwing out other ideas loud. HDFS has a feature called "DataNode hedged read" where a parallel connection is sent out to other DataNodes if the read request to a DataNode does not return within a configured time. Perhaps something similar along the line can work for Multiple NameNodes setup. > [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a > condition of calculation of sleep time > --- > > Key: HDFS-15024 > URL: https://issues.apache.org/jira/browse/HDFS-15024 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.10.0, 3.3.0, 3.2.1 >Reporter: huhaiyang >Assignee: huhaiyang >Priority: Major > Labels: multi-sbnn > Attachments: HDFS-15024.001.patch, client_error.log > > > When we enable the ONN , there will be three NN nodes for the client > configuration, > Such as configuration > > dfs.ha.namenodes.ns1 > nn2,nn3,nn1 > > Currently, > nn2 is in standby state > nn3 is in observer state > nn1 is in active state > When the user performs an access HDFS operation > ./bin/hadoop --loglevel debug fs > -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > -mkdir /user/haiyang1/test8 > You need to request nn1 when you execute the msync method, > Actually connect nn2 first and failover is required > In connection nn3 does not meet the requirements, failover needs to be > performed, but at this time, failover operation needs to be performed during > a period of hibernation > Finally, it took a period of hibernation to connect the successful request to > nn1 > In FailoverOnNetworkExceptionRetry getFailoverOrRetrySleepTime The current > default implementation is Sleep time is calculated when more than one > failover operation is performed > I think that the Number of NameNodes as a condition of calculation of sleep > time is more reasonable > That is, in the current test, executing failover on connection nn3 does not > need to sleep time to directly connect to the next nn node > See client_error.log for details -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time
[ https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987911#comment-16987911 ] Xieming Li commented on HDFS-15024: --- Is it a bad idea to retry immediately on StandbyException? > [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a > condition of calculation of sleep time > --- > > Key: HDFS-15024 > URL: https://issues.apache.org/jira/browse/HDFS-15024 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.10.0, 3.3.0, 3.2.1 >Reporter: huhaiyang >Priority: Major > Attachments: HDFS-15024.001.patch, client_error.log > > > When we enable the ONN , there will be three NN nodes for the client > configuration, > Such as configuration > > dfs.ha.namenodes.ns1 > nn2,nn3,nn1 > > Currently, > nn2 is in standby state > nn3 is in observer state > nn1 is in active state > When the user performs an access HDFS operation > ./bin/hadoop --loglevel debug fs > -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > -mkdir /user/haiyang1/test8 > You need to request nn1 when you execute the msync method, > Actually connect nn2 first and failover is required > In connection nn3 does not meet the requirements, failover needs to be > performed, but at this time, failover operation needs to be performed during > a period of hibernation > Finally, it took a period of hibernation to connect the successful request to > nn1 > In FailoverOnNetworkExceptionRetry getFailoverOrRetrySleepTime The current > default implementation is Sleep time is calculated when more than one > failover operation is performed > I think that the Number of NameNodes as a condition of calculation of sleep > time is more reasonable > That is, in the current test, executing failover on connection nn3 does not > need to sleep time to directly connect to the next nn node > See client_error.log for details -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time
[ https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986891#comment-16986891 ] huhaiyang commented on HDFS-15024: -- In RetryPolicies Implemented /** * @return 0 if this is our first failover/retry (i.e., retry immediately), * sleep exponentially otherwise */ private long getFailoverOrRetrySleepTime(int times) { return times == 0 ? 0 : calculateExponentialTime(delayMillis, times, maxDelayBase); } > [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a > condition of calculation of sleep time > --- > > Key: HDFS-15024 > URL: https://issues.apache.org/jira/browse/HDFS-15024 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.10.0, 3.3.0, 3.2.1 >Reporter: huhaiyang >Priority: Major > Attachments: HDFS-15024.001.patch, client_error.log > > > When we enable the ONN , there will be three NN nodes for the client > configuration, > Such as configuration > > dfs.ha.namenodes.ns1 > nn2,nn3,nn1 > > Currently, > nn2 is in standby state > nn3 is in observer state > nn1 is in active state > When the user performs an access HDFS operation > ./bin/hadoop --loglevel debug fs > -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > -mkdir /user/haiyang1/test8 > You need to request nn1 when you execute the msync method, > Actually connect nn2 first and failover is required > In connection nn3 does not meet the requirements, failover needs to be > performed, but at this time, failover operation needs to be performed during > a period of hibernation > Finally, it took a period of hibernation to connect the successful request to > nn1 > In FailoverOnNetworkExceptionRetry getFailoverOrRetrySleepTime The current > default implementation is Sleep time is calculated when more than one > failover operation is performed > I think that the Number of NameNodes as a condition of calculation of sleep > time is more reasonable > That is, in the current test, executing failover on connection nn3 does not > need to sleep time to directly connect to the next nn node > See client_error.log for details -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time
[ https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986882#comment-16986882 ] huhaiyang commented on HDFS-15024: -- [~xkrogen][~csun][~vagarychen] Thanks for your comments! I understand Normally, if we only set 2 NNS, dfs.ha.namenodes.ns1 nn1,nn2 Currently, nn1 is in active state nn2 is in standby state when the client connects to nn2, it needs to retry, and will quickly connect to nn1. However, when the nn1 fails to connect due to network problems, the next time( the third time), the sleep timeout will be performed for a period of time to retry Current HDFS-6440 Support more than 2 NameNodes. if we set 3 NNS, dfs.ha.namenodes.ns1 nn1,nn2,nn3 nn1 is in active state nn2 is in standby state nn3 is in standby state(or observer state) when the client connects to nn1, it needs to retry, and will quickly connect to nn2. and the client connects to nn2, it needs to retry, and will quickly connect to nn1. However, when the nn1 fails to connect due to network problems, the next time( the fourth time), the sleep timeout will be performed for a period of time to retry. That is to say, it is necessary to connect all the configured NN nodes once. If no NN nodes the requirements are found, required to perform sleep and retry... In FailoverOnNetworkExceptionRetry getFailoverOrRetrySleepTime , I think that the Number of NameNodes as a condition of calculation of sleep time is more reasonable((which is current v01 patch)). > [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a > condition of calculation of sleep time > --- > > Key: HDFS-15024 > URL: https://issues.apache.org/jira/browse/HDFS-15024 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.10.0, 3.3.0, 3.2.1 >Reporter: huhaiyang >Priority: Major > Attachments: HDFS-15024.001.patch, client_error.log > > > When we enable the ONN , there will be three NN nodes for the client > configuration, > Such as configuration > > dfs.ha.namenodes.ns1 > nn2,nn3,nn1 > > Currently, > nn2 is in standby state > nn3 is in observer state > nn1 is in active state > When the user performs an access HDFS operation > ./bin/hadoop --loglevel debug fs > -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > -mkdir /user/haiyang1/test8 > You need to request nn1 when you execute the msync method, > Actually connect nn2 first and failover is required > In connection nn3 does not meet the requirements, failover needs to be > performed, but at this time, failover operation needs to be performed during > a period of hibernation > Finally, it took a period of hibernation to connect the successful request to > nn1 > In FailoverOnNetworkExceptionRetry getFailoverOrRetrySleepTime The current > default implementation is Sleep time is calculated when more than one > failover operation is performed > I think that the Number of NameNodes as a condition of calculation of sleep > time is more reasonable > That is, in the current test, executing failover on connection nn3 does not > need to sleep time to directly connect to the next nn node > See client_error.log for details -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time
[ https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986478#comment-16986478 ] Chen Liang commented on HDFS-15024: --- Looks like v001 patch changes sleep time to 0 when there are multiple NNs. I guess the downside is that if it is actually a network exception, the client may exhaust all the retries too soon, due to 0 internal between retries. I don't think this really makes a big difference though. Following [~xkrogen] and [~csun]'s comments, I think a general thing here is that maybe we should re-consider what "retry" really means under context of SbN read. Specifically under this Jira: 1. if reading from standby/observer failed, does it worth sleep then retry (e.g. network exception)? or we just hope next NN solves the problem, so failover to next NN immediately (which is current v01 patch) 2. but what if accessing Active fails due to network exception, then maybe we keep the current behavior of sleep and retry, hoping the sleeps get around temporary network issue. > [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a > condition of calculation of sleep time > --- > > Key: HDFS-15024 > URL: https://issues.apache.org/jira/browse/HDFS-15024 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.10.0, 3.3.0, 3.2.1 >Reporter: huhaiyang >Priority: Major > Attachments: HDFS-15024.001.patch, client_error.log > > > When we enable the ONN , there will be three NN nodes for the client > configuration, > Such as configuration > > dfs.ha.namenodes.ns1 > nn2,nn3,nn1 > > Currently, > nn2 is in standby state > nn3 is in observer state > nn1 is in active state > When the user performs an access HDFS operation > ./bin/hadoop --loglevel debug fs > -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > -mkdir /user/haiyang1/test8 > You need to request nn1 when you execute the msync method, > Actually connect nn2 first and failover is required > In connection nn3 does not meet the requirements, failover needs to be > performed, but at this time, failover operation needs to be performed during > a period of hibernation > Finally, it took a period of hibernation to connect the successful request to > nn1 > In FailoverOnNetworkExceptionRetry getFailoverOrRetrySleepTime The current > default implementation is Sleep time is calculated when more than one > failover operation is performed > I think that the Number of NameNodes as a condition of calculation of sleep > time is more reasonable > That is, in the current test, executing failover on connection nn3 does not > need to sleep time to directly connect to the next nn node > See client_error.log for details -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time
[ https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986250#comment-16986250 ] Chao Sun commented on HDFS-15024: - For the {{msync}} case, I'm wondering whether we can do something special, such as retrying all the namenode proxies in a tight loop similar to how we handle observer read calls, rather than defer that to the retry policies. > [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a > condition of calculation of sleep time > --- > > Key: HDFS-15024 > URL: https://issues.apache.org/jira/browse/HDFS-15024 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.10.0, 3.3.0, 3.2.1 >Reporter: huhaiyang >Priority: Major > Attachments: HDFS-15024.001.patch, client_error.log > > > When we enable the ONN , there will be three NN nodes for the client > configuration, > Such as configuration > > dfs.ha.namenodes.ns1 > nn2,nn3,nn1 > > Currently, > nn2 is in standby state > nn3 is in observer state > nn1 is in active state > When the user performs an access HDFS operation > ./bin/hadoop --loglevel debug fs > -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > -mkdir /user/haiyang1/test8 > You need to request nn1 when you execute the msync method, > Actually connect nn2 first and failover is required > In connection nn3 does not meet the requirements, failover needs to be > performed, but at this time, failover operation needs to be performed during > a period of hibernation > Finally, it took a period of hibernation to connect the successful request to > nn1 > In FailoverOnNetworkExceptionRetry getFailoverOrRetrySleepTime The current > default implementation is Sleep time is calculated when more than one > failover operation is performed > I think that the Number of NameNodes as a condition of calculation of sleep > time is more reasonable > That is, in the current test, executing failover on connection nn3 does not > need to sleep time to directly connect to the next nn node > See client_error.log for details -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time
[ https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986171#comment-16986171 ] Erik Krogen commented on HDFS-15024: This seems fairly closely related to HDFS-14969. Both issues are caused because of assumptions in the code that only two NNs will be present in the conf, even though that hasn't been true since HDFS-6440 was committed. HDFS-14969 isn't as impactful since it's just logging, but It seems that the solution (expose a way to find how many proxies there are, and use that value instead of hard-coded 2, as described in [this comment|https://issues.apache.org/jira/browse/HDFS-14969?focusedCommentId=16972962&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16972962]) is similar. > [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a > condition of calculation of sleep time > --- > > Key: HDFS-15024 > URL: https://issues.apache.org/jira/browse/HDFS-15024 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.10.0, 3.3.0, 3.2.1 >Reporter: huhaiyang >Priority: Major > Attachments: HDFS-15024.001.patch, client_error.log > > > When we enable the ONN , there will be three NN nodes for the client > configuration, > Such as configuration > > dfs.ha.namenodes.ns1 > nn2,nn3,nn1 > > Currently, > nn2 is in standby state > nn3 is in observer state > nn1 is in active state > When the user performs an access HDFS operation > ./bin/hadoop --loglevel debug fs > -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > -mkdir /user/haiyang1/test8 > You need to request nn1 when you execute the msync method, > Actually connect nn2 first and failover is required > In connection nn3 does not meet the requirements, failover needs to be > performed, but at this time, failover operation needs to be performed during > a period of hibernation > Finally, it took a period of hibernation to connect the successful request to > nn1 > In FailoverOnNetworkExceptionRetry getFailoverOrRetrySleepTime The current > default implementation is Sleep time is calculated when more than one > failover operation is performed > I think that the Number of NameNodes as a condition of calculation of sleep > time is more reasonable > That is, in the current test, executing failover on connection nn3 does not > need to sleep time to directly connect to the next nn node > See client_error.log for details -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time
[ https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985788#comment-16985788 ] huhaiyang commented on HDFS-15024: -- dfs.ha.namenodes.ns1 nn1,nn2,nn3 Currently, nn1 is in active state nn2 is in standby state nn3 is in observer state ./bin/hadoop --loglevel debug fs -mkdir /user/haiyang1/test8 19/12/02 11:06:04 DEBUG ipc.Client: The ping interval is 6 ms. 19/12/02 11:06:04 DEBUG ipc.Client: Connecting to nn2/xx:8020 19/12/02 11:06:04 DEBUG ipc.Client: IPC Client (1337335626) connection to nn2/xx:8020 from hadoop: starting, having connections 1 19/12/02 11:06:04 DEBUG ipc.Client: IPC Client (1337335626) connection to nn2/xx:8020 from hadoop sending #0 org.apache.hadoop.hdfs.protocol.ClientProtocol.msync 19/12/02 11:06:04 DEBUG ipc.Client: IPC Client (1337335626) connection to nn2/xx:8020 from hadoop got value #0 19/12/02 11:06:04 DEBUG retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby. Visit https://s.apache.org/sbnn-error at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2018) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1461) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.msync(NameNodeRpcServer.java:1384) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.msync(ClientNamenodeProtocolServerSideTranslatorPB.java:1907) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:531) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1903) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2815) , while invoking $Proxy4.getFileInfo over [nn2/xx:8020,nn1/xx:8020,nn3/xx:8020]. Trying to failover immediately. org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby. Visit https://s.apache.org/sbnn-error at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2018) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1461) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.msync(NameNodeRpcServer.java:1384) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.msync(ClientNamenodeProtocolServerSideTranslatorPB.java:1907) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:531) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1903) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2815) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1543) at org.apache.hadoop.ipc.Client.call(Client.java:1489) at org.apache.hadoop.ipc.Client.call(Client.java:1388) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) at com.sun.proxy.$Proxy15.msync(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.msync(ClientNamenodeProtocolTranslatorPB.java:1958) at org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.initializeMsync(ObserverReadProxyProvider.java:318) at org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.access$500(ObserverRea
[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time
[ https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985785#comment-16985785 ] huhaiyang commented on HDFS-15024: -- [~vagarychen] Thank you for reply! I understand because getting active nn is a random state. When we tested it,when the order of NNs in ha conf is active -> standby -> observer ,that will happen random again. > [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a > condition of calculation of sleep time > --- > > Key: HDFS-15024 > URL: https://issues.apache.org/jira/browse/HDFS-15024 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.10.0, 3.3.0, 3.2.1 >Reporter: huhaiyang >Priority: Major > Attachments: HDFS-15024.001.patch, client_error.log > > > When we enable the ONN , there will be three NN nodes for the client > configuration, > Such as configuration > > dfs.ha.namenodes.ns1 > nn2,nn3,nn1 > > Currently, > nn2 is in standby state > nn3 is in observer state > nn1 is in active state > When the user performs an access HDFS operation > ./bin/hadoop --loglevel debug fs > -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > -mkdir /user/haiyang1/test8 > You need to request nn1 when you execute the msync method, > Actually connect nn2 first and failover is required > In connection nn3 does not meet the requirements, failover needs to be > performed, but at this time, failover operation needs to be performed during > a period of hibernation > Finally, it took a period of hibernation to connect the successful request to > nn1 > In FailoverOnNetworkExceptionRetry getFailoverOrRetrySleepTime The current > default implementation is Sleep time is calculated when more than one > failover operation is performed > I think that the Number of NameNodes as a condition of calculation of sleep > time is more reasonable > That is, in the current test, executing failover on connection nn3 does not > need to sleep time to directly connect to the next nn node > See client_error.log for details -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time
[ https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985190#comment-16985190 ] Chen Liang commented on HDFS-15024: --- Thanks for reporting [~haiyang Hu]! This is an interesting issue. If I understand correctly, this only happens when the order of NNs in ha conf is standby -> observer -> active, is this correct? Because in our practice, we have been setting observer the last one in the conf, so we haven't ran into this issue. But we should definitely resolve the issue. And thanks for uploading a patch, I will take a deeper look. > [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a > condition of calculation of sleep time > --- > > Key: HDFS-15024 > URL: https://issues.apache.org/jira/browse/HDFS-15024 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.10.0, 3.3.0, 3.2.1 >Reporter: huhaiyang >Priority: Major > Attachments: HDFS-15024.001.patch, client_error.log > > > When we enable the ONN , there will be three NN nodes for the client > configuration, > Such as configuration > > dfs.ha.namenodes.ns1 > nn2,nn3,nn1 > > Currently, > nn2 is in standby state > nn3 is in observer state > nn1 is in active state > When the user performs an access HDFS operation > ./bin/hadoop --loglevel debug fs > -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > -mkdir /user/haiyang1/test8 > You need to request nn1 when you execute the msync method, > Actually connect nn2 first and failover is required > In connection nn3 does not meet the requirements, failover needs to be > performed, but at this time, failover operation needs to be performed during > a period of hibernation > Finally, it took a period of hibernation to connect the successful request to > nn1 > In FailoverOnNetworkExceptionRetry getFailoverOrRetrySleepTime The current > default implementation is Sleep time is calculated when more than one > failover operation is performed > I think that the Number of NameNodes as a condition of calculation of sleep > time is more reasonable > That is, in the current test, executing failover on connection nn3 does not > need to sleep time to directly connect to the next nn node > See client_error.log for details -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time
[ https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984776#comment-16984776 ] huhaiyang commented on HDFS-15024: -- [~weichiu] [~vagarychen] Hello, how do you solve this problem in the process of use? Thanks! > [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a > condition of calculation of sleep time > --- > > Key: HDFS-15024 > URL: https://issues.apache.org/jira/browse/HDFS-15024 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.10.0, 3.3.0, 3.2.1 >Reporter: huhaiyang >Priority: Major > Attachments: HDFS-15024.001.patch, client_error.log > > > {code:java} > When we enable the ONN , there will be three NN nodes for the client > configuration, > Such as configuration > > dfs.ha.namenodes.ns1 > nn2,nn3,nn1 > > Currently, > nn2 is in standby state > nn3 is in observer state > nn1 is in active state > When the user performs an access HDFS operation > ./bin/hadoop --loglevel debug fs > -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider > -mkdir /user/haiyang1/test8 > You need to request nn1 when you execute the msync method, > Actually connect nn2 first and failover is required > In connection nn3 does not meet the requirements, failover needs to be > performed, but at this time, failover operation needs to be performed during > a period of hibernation > Finally, it took a period of hibernation to connect the successful request to > nn1 > In FailoverOnNetworkExceptionRetry getFailoverOrRetrySleepTime The current > default implementation is Sleep time is calculated when more than one > failover operation is performed > I think that the Number of NameNodes as a condition of calculation of sleep > time is more reasonable > That is, in the current test, executing failover on connection nn3 does not > need to sleep time to directly connect to the next nn node > See client_error.log for details > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15024) [SBN read] In FailoverOnNetworkExceptionRetry , Number of NameNodes as a condition of calculation of sleep time
[ https://issues.apache.org/jira/browse/HDFS-15024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984772#comment-16984772 ] huhaiyang commented on HDFS-15024: -- ./bin/hadoop --loglevel debug fs -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider -mkdir /user/haiyang1/test8 ... 19/11/29 14:26:55 DEBUG ipc.Client: The ping interval is 6 ms. 19/11/29 14:26:55 DEBUG ipc.Client: Connecting to nn2/xx:8020 19/11/29 14:26:55 DEBUG ipc.Client: IPC Client (1337335626) connection to nn2/xx:8020 from hadoop: starting, having connections 1 19/11/29 14:26:55 DEBUG ipc.Client: IPC Client (1337335626) connection to nn2/xx:8020 from hadoop sending #0 org.apache.hadoop.hdfs.protocol.ClientProtocol.msync 19/11/29 14:26:55 DEBUG ipc.Client: IPC Client (1337335626) connection to nn2/xx:8020 from hadoop got value #0 19/11/29 14:26:55 DEBUG retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby. Visit https://s.apache.org/sbnn-error at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2018) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1461) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.msync(NameNodeRpcServer.java:1384) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.msync(ClientNamenodeProtocolServerSideTranslatorPB.java:1907) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:531) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1903) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2815) , while invoking $Proxy4.getFileInfo over [nn3/xx:8020,nn2/xx:8020,nn1/xx:8020]. Trying to failover immediately. org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby. Visit https://s.apache.org/sbnn-error at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2018) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1461) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.msync(NameNodeRpcServer.java:1384) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.msync(ClientNamenodeProtocolServerSideTranslatorPB.java:1907) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:531) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1903) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2815) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1543) at org.apache.hadoop.ipc.Client.call(Client.java:1489) at org.apache.hadoop.ipc.Client.call(Client.java:1388) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) at com.sun.proxy.$Proxy15.msync(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.msync(ClientNamenodeProtocolTranslatorPB.java:1958) at org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.initializeMsync(ObserverReadProxyProvider.java:318) at org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.access$500(ObserverReadProx