[jira] [Commented] (HDFS-13687) ConfiguredFailoverProxyProvider could direct requests to SBN

2018-06-19 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516754#comment-16516754
 ] 

Chao Sun commented on HDFS-13687:
-

Thanks [~xkrogen] and [~shv] for the very valuable comments! Regarding 
Konstantin's comments:

1. Very good point. Sorry I didn't know that {{getServiceStatus}} requires 
super privilege. Another option might be to add another interface/protocol to 
get the active/standby state from NN, [as proposed in the original 
JIRA|https://issues.apache.org/jira/browse/HDFS-2917?focusedCommentId=13204178=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13204178].
2. Yes good point. Perhaps we can reuse {{NameNodeHAProxyFactory}} to create 
the factory needed in our case. 
3. Can you elaborate more on this?. Currently 
{{ConfiguredFailoverProxyProvider}} does assume all remote addresses are NN, 
right?

Will go back to HDFS-12976 in the mean time.

> ConfiguredFailoverProxyProvider could direct requests to SBN
> 
>
> Key: HDFS-13687
> URL: https://issues.apache.org/jira/browse/HDFS-13687
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-13687.000.patch
>
>
> In case there are multiple SBNs, and {{dfs.ha.allow.stale.reads}} is set to 
> true, failover could go to a SBN which then may serve read requests from 
> client. This may not be the expected behavior. This issue arises when we are 
> working on HDFS-12943 and HDFS-12976.
> A better approach for this could be to check {{HAServiceState}} and find out 
> the active NN when performing failover. This also can reduce the # of 
> failovers the client has to do in case of multiple SBNs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13687) ConfiguredFailoverProxyProvider could direct requests to SBN

2018-06-18 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516494#comment-16516494
 ] 

Konstantin Shvachko commented on HDFS-13687:


Was looking at your patch, [~csun]. The problem looks straightforward, but the 
solution may be tricky here. Some thoughts.
# We cannot just call {{getServiceStatus()}} from 
{{ConfiguredFailoverProxyProvider}}
#* Because it requires superuser privileges on NN, which the client doesn't 
generally have in case {{T = ClientProtocol}}
#* For some protocols like {{T = DatanodeProtocol}} we actually don't care if 
the proxy connects to Active, because in the end DNs should report to all NNs 
regardless of the state.
# If the proxy needs the HAState it should be doing something like 
{{FailoverController}}. Particularly go through a factory to create a proxy, 
rather than directly instantiating 
{{HAServiceProtocolClientSideTranslatorPB()}}.
# Current implementation of {{ConfiguredFailoverProxyProvider}} doesn't have a 
notion of dealing with NNs. It probably should, or the factory for it should. 
So that we were able to add NameNode specific logic.

Anyways, this will need some thinking, I don't see a solution yet. In the mean 
time, can we return back to HDFS-12976 and see what we can do there?

> ConfiguredFailoverProxyProvider could direct requests to SBN
> 
>
> Key: HDFS-13687
> URL: https://issues.apache.org/jira/browse/HDFS-13687
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-13687.000.patch
>
>
> In case there are multiple SBNs, and {{dfs.ha.allow.stale.reads}} is set to 
> true, failover could go to a SBN which then may serve read requests from 
> client. This may not be the expected behavior. This issue arises when we are 
> working on HDFS-12943 and HDFS-12976.
> A better approach for this could be to check {{HAServiceState}} and find out 
> the active NN when performing failover. This also can reduce the # of 
> failovers the client has to do in case of multiple SBNs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13687) ConfiguredFailoverProxyProvider could direct requests to SBN

2018-06-18 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16515986#comment-16515986
 ] 

Erik Krogen commented on HDFS-13687:


Hey [~csun], thanks for working on this valuable change. I have a few comments:
* Minor nit, I feel "get-service-state.timeout.millis" would be better for the 
config key. But if you prefer the current version I am fine with it. I am also 
wondering if "ms" may be better than "millis" - is there any consistency in 
other configs?
* The comment for {{incrementProxyIndex}} has some grammatical issues, it 
should read "Keep trying until all other NameNodes have been exhausted. When 
that happens, let the retry policy decide how to sleep and retry."
* This still leaves us vulnerable to the situation when we are already locked 
on an active, and then it becomes standby. I don't think there's any good way 
for us to solve that here.
* Your current strategy of cycling through the NameNodes once means that, if 
all NameNodes are currently in STANDBY, you will leave {{currentProxyIndex}} as 
the same node that initially triggered the failover. So despite this new logic, 
we may still fail over to a STANDBY (the same one we started with) and start 
reading from it. Given that the bullet above means we can't guarantee we're 
reading from an active anyway, I think this is probably okay, but want to 
mention it. We just need to be aware that even with this patch, avoiding 
reading from a standby is best effort, not guaranteed.
* I don't think the {{LOG}} in {{isState}} should be an error. For example, if 
one NameNode is currently unavailable (e.g. restarting), that is not really an 
error. It should probably be a warn.

> ConfiguredFailoverProxyProvider could direct requests to SBN
> 
>
> Key: HDFS-13687
> URL: https://issues.apache.org/jira/browse/HDFS-13687
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-13687.000.patch
>
>
> In case there are multiple SBNs, and {{dfs.ha.allow.stale.reads}} is set to 
> true, failover could go to a SBN which then may serve read requests from 
> client. This may not be the expected behavior. This issue arises when we are 
> working on HDFS-12943 and HDFS-12976.
> A better approach for this could be to check {{HAServiceState}} and find out 
> the active NN when performing failover. This also can reduce the # of 
> failovers the client has to do in case of multiple SBNs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13687) ConfiguredFailoverProxyProvider could direct requests to SBN

2018-06-16 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514963#comment-16514963
 ] 

Chao Sun commented on HDFS-13687:
-

cc [~shv], [~xkrogen].

> ConfiguredFailoverProxyProvider could direct requests to SBN
> 
>
> Key: HDFS-13687
> URL: https://issues.apache.org/jira/browse/HDFS-13687
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-13687.000.patch
>
>
> In case there are multiple SBNs, and {{dfs.ha.allow.stale.reads}} is set to 
> true, failover could go to a SBN which then may serve read requests from 
> client. This may not be the expected behavior. This issue arises when we are 
> working on HDFS-12943 and HDFS-12976.
> A better approach for this could be to check {{HAServiceState}} and find out 
> the active NN when performing failover. This also can reduce the # of 
> failovers the client has to do in case of multiple SBNs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13687) ConfiguredFailoverProxyProvider could direct requests to SBN

2018-06-15 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514607#comment-16514607
 ] 

genericqa commented on HDFS-13687:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 32m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 21m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
8s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  4m 
43s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 21m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 21m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  2m 
39s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
31s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m  2s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}197m 18s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
|   | hadoop.hdfs.client.impl.TestBlockReaderLocal |
|   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13687 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12928049/HDFS-13687.000.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux f7a4df583c34