[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2022-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17630038#comment-17630038
 ] 

ASF GitHub Bot commented on HDFS-15383:
---

virajith merged PR #5112:
URL: https://github.com/apache/hadoop/pull/5112




> RBF: Disable watch in ZKDelegationSecretManager for performance
> ---
>
> Key: HDFS-15383
> URL: https://issues.apache.org/jira/browse/HDFS-15383
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Fengnan Li
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> In the current design for delegation tokens in a secure Router, the total 
> number of token watches is the product of the number of Routers and the number 
> of tokens. This is because ZKDelegationTokenManager uses PathChildrenCache 
> from Curator, which automatically sets a watch on each token so that ZooKeeper 
> pushes updates to every Router. Some evaluations indicate that a large number 
> of watches has a negative performance impact on the ZooKeeper server.
> In our experience, once the number of watches exceeds 1.2 million on a single 
> ZK server, ZK performance degrades significantly. This ticket therefore 
> rewrites ZKDelegationTokenManagerImpl.java to explicitly disable the 
> PathChildrenCache and have Routers sync periodically from ZooKeeper. This has 
> been working fine at the scale of 10 Routers with 2 million tokens. 
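
The trade-off described in the ticket can be illustrated with a minimal sketch. It is not the code from PR #5112: the class name, the Supplier standing in for a ZooKeeper read, and the token-id-to-timestamp map are all assumptions made for illustration. It shows the watch-count arithmetic and a polling loop that replaces per-token watches.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch; these names do not come from the Hadoop patch. */
final class PeriodicTokenSync {

  /** Watches held when every Router watches every token. */
  static long totalWatches(long routers, long tokens) {
    return Math.multiplyExact(routers, tokens);
  }

  // Stand-in for "read all token znodes from ZooKeeper" (assumption).
  private final Supplier<Map<String, Long>> remoteStore;
  // Local view: token id -> renewal time (assumed shape).
  private final Map<String, Long> localCache = new ConcurrentHashMap<>();

  PeriodicTokenSync(Supplier<Map<String, Long>> remoteStore) {
    this.remoteStore = remoteStore;
  }

  /** One polling pass: drop tokens that vanished, then copy current ones. */
  void syncOnce() {
    Map<String, Long> snapshot = remoteStore.get();
    localCache.keySet().retainAll(snapshot.keySet());
    localCache.putAll(snapshot);
  }

  /** Poll on a fixed delay instead of registering per-token watches. */
  ScheduledExecutorService start(long periodSeconds) {
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    ses.scheduleWithFixedDelay(this::syncOnce, 0, periodSeconds, TimeUnit.SECONDS);
    return ses; // the caller is responsible for calling shutdown()
  }

  /** Defensive copy of the local view. */
  Map<String, Long> view() {
    return new HashMap<>(localCache);
  }

  public static void main(String[] args) {
    // Figures from the ticket: 10 Routers, 2 million tokens.
    System.out.println(totalWatches(10, 2_000_000)); // prints 20000000
  }
}
```

With polling, the load on ZooKeeper scales with the number of Routers and the sync period rather than with Routers times tokens, at the cost of each Router seeing token changes up to one period late.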



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16827) [RBF SBN] RouterStateIdContext shouldn't update the ResponseState if client doesn't use ObserverReadProxyProvider

2022-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629948#comment-17629948
 ] 

ASF GitHub Bot commented on HDFS-16827:
---

simbadzina commented on code in PR #5088:
URL: https://github.com/apache/hadoop/pull/5088#discussion_r1015741184


##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##
@@ -2879,9 +2880,12 @@ private void processRpcRequest(RpcRequestHeaderProto header,
 stateId = alignmentContext.receiveRequestState(
 header, getMaxIdleTime());
 call.setClientStateId(stateId);
-if (header.hasRouterFederatedState()) {
-  call.setFederatedNamespaceState(header.getRouterFederatedState());
-}
+  }
+  if (header.hasRouterFederatedState()) {
+call.setFederatedNamespaceState(header.getRouterFederatedState());
+  } else if (header.hasStateId()) {
+// Set one empty FederatedNamespaceState to identify the client want to get stateId.
+call.setFederatedNamespaceState(EMPTY_BYTE_STRING);

Review Comment:
   Typo "wants" instead of "want"
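
The branch structure of the hunk above can be sketched in isolation. This is a simplified model, not the Server.java code: a nullable byte array stands in for the optional routerFederatedState header field, a boolean stands in for hasStateId(), and the class and method names are hypothetical.

```java
import java.util.Optional;

/** Hypothetical model of the branch in the reviewed hunk. */
final class FederatedStateChoice {

  // Stand-in for EMPTY_BYTE_STRING in the patch (assumption).
  static final byte[] EMPTY = new byte[0];

  /**
   * Returns the federated state to attach to the call, or an empty
   * Optional when the call should be left without one.
   */
  static Optional<byte[]> select(byte[] routerFederatedState, boolean hasStateId) {
    if (routerFederatedState != null) {
      // The client sent a federated state: pass it through unchanged.
      return Optional.of(routerFederatedState);
    } else if (hasStateId) {
      // The client sent only a state id: mark it with an empty state.
      return Optional.of(EMPTY);
    }
    // Neither field is present: attach nothing.
    return Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(select(null, false).isPresent()); // prints false
  }
}
```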





> [RBF SBN] RouterStateIdContext shouldn't update the ResponseState if client 
> doesn't use ObserverReadProxyProvider
> -
>
> Key: HDFS-16827
> URL: https://issues.apache.org/jira/browse/HDFS-16827
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> RouterStateIdContext shouldn't update the ResponseState if client doesn't use 
> ObserverReadProxyProvider.






[jira] [Commented] (HDFS-16827) [RBF SBN] RouterStateIdContext shouldn't update the ResponseState if client doesn't use ObserverReadProxyProvider

2022-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629947#comment-17629947
 ] 

ASF GitHub Bot commented on HDFS-16827:
---

simbadzina commented on PR #5088:
URL: https://github.com/apache/hadoop/pull/5088#issuecomment-1305990097

   The change looks good to me. It still passes the refactored unit test in 
TestObserverWithRouter on trunk.







[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2022-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629930#comment-17629930
 ] 

ASF GitHub Bot commented on HDFS-15383:
---

virajith commented on PR #5112:
URL: https://github.com/apache/hadoop/pull/5112#issuecomment-1305950656

   I'll merge this in the next hour.







[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2022-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629929#comment-17629929
 ] 

ASF GitHub Bot commented on HDFS-15383:
---

virajith commented on PR #5112:
URL: https://github.com/apache/hadoop/pull/5112#issuecomment-1305950206

   Thanks for the backport, @melissayou. The changes look good to me. I expect 
the deprecated method will be addressed by 
[HADOOP-18520](https://issues.apache.org/jira/browse/HADOOP-18520). The other 
failures exist in trunk as well, and fixing the checkstyle issues here would 
mean this is no longer a clean cherry-pick.







[jira] [Commented] (HDFS-13522) HDFS-13522: Add federated nameservices states to client protocol and propagate it between routers and clients.

2022-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629928#comment-17629928
 ] 

ASF GitHub Bot commented on HDFS-13522:
---

simbadzina closed pull request #4883: HDFS-13522: Add federated nameservices 
states to client protocol and propagate it between routers and clients.
URL: https://github.com/apache/hadoop/pull/4883




> HDFS-13522: Add federated nameservices states to client protocol and 
> propagate it between routers and clients.
> --
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.
> This patch captures the state of all namespaces in the routers and propagates 
> it to clients. A follow-up patch will change router behavior to direct 
> requests to the observer.






[jira] [Commented] (HDFS-16821) Fix regression in HDFS-13522 that enables observer reads by default.

2022-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629909#comment-17629909
 ] 

ASF GitHub Bot commented on HDFS-16821:
---

omalley merged PR #5078:
URL: https://github.com/apache/hadoop/pull/5078




> Fix regression in HDFS-13522 that enables observer reads by default.
> 
>
> Key: HDFS-16821
> URL: https://issues.apache.org/jira/browse/HDFS-16821
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
>
> Serving reads consistently from Observer Namenodes is a feature that was 
> introduced in HDFS-12943.
> Clients opt into this feature by configuring the ObserverReadProxyProvider. 
> It is important that the opt-in is explicit because, for third-party reads to 
> remain consistent, these clients then need to perform an msync before reads.
> In HDFS-13522, the ClientGSIContext is implicitly added to the DFSClient, thus 
> enabling Observer reads for all clients by default. This breaks consistency 
> guarantees for clients that haven't opted into observer reads.
> [https://github.com/apache/hadoop/pull/4883/files#diff-a627e2c1f3e68235520d3c28092f4ae8a41aa4557cc530e4e6862c318be7e898R352-R354]
> We need to return to the old behavior of only using the ClientGSIContext when 
> users have explicitly opted into Observer reads.
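
The explicit opt-in can be sketched as a check on the client's failover proxy provider setting. This is only an illustration, not the change in PR #5078: a plain map stands in for the client configuration, the helper is hypothetical, and the exact-match comparison is a simplification that ignores subclasses of ObserverReadProxyProvider.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; a Map stands in for the client configuration. */
final class ObserverReadOptIn {

  // HDFS client key prefix; the nameservice id is appended to it.
  static final String PROVIDER_KEY_PREFIX = "dfs.client.failover.proxy.provider.";
  static final String OBSERVER_PROVIDER =
      "org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider";

  /** True only when the client explicitly configured observer reads. */
  static boolean optedIn(Map<String, String> conf, String nameservice) {
    return OBSERVER_PROVIDER.equals(conf.get(PROVIDER_KEY_PREFIX + nameservice));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    // No provider configured: no alignment context should be attached.
    System.out.println(optedIn(conf, "ns1")); // prints false
    conf.put(PROVIDER_KEY_PREFIX + "ns1", OBSERVER_PROVIDER);
    System.out.println(optedIn(conf, "ns1")); // prints true
  }
}
```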






[jira] [Commented] (HDFS-16821) Fix regression in HDFS-13522 that enables observer reads by default.

2022-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629910#comment-17629910
 ] 

ASF GitHub Bot commented on HDFS-16821:
---

omalley commented on PR #5078:
URL: https://github.com/apache/hadoop/pull/5078#issuecomment-1305886044

   +1 LGTM







[jira] [Commented] (HDFS-16764) ObserverNamenode handles addBlock rpc and throws a FileNotFoundException

2022-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629857#comment-17629857
 ] 

ASF GitHub Bot commented on HDFS-16764:
---

hadoop-yetus commented on PR #4872:
URL: https://github.com/apache/hadoop/pull/4872#issuecomment-1305722509

   :confetti_ball: **+1 overall**
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m  1s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  42m 17s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 37s |  |  trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 27s |  |  trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 16s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 39s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 15s |  |  trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 43s |  |  trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 43s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  26m 37s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 28s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 35s |  |  the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 35s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 23s |  |  the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 23s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  2s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 31s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  0s |  |  the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 34s |  |  the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 49s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m 40s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 351m 17s |  |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 56s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 471m 51s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/4/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4872 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 033df28c4be5 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / a44c5a2c2d01f68e91febdf82c9f2d4e25d53896 |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/4/testReport/ |
   | Max. process+thread count | 1847 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/4/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> ObserverNamenode handles addBlock rpc and 

[jira] [Commented] (HDFS-16831) [RBF SBN] GetNamenodesForNameserviceId should shuffle Observer NameNodes every time

2022-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629651#comment-17629651
 ] 

ASF GitHub Bot commented on HDFS-16831:
---

ZanderXu commented on PR #5098:
URL: https://github.com/apache/hadoop/pull/5098#issuecomment-1305225730

   > +1 LGTM after last commit. @ZanderXu - Do you think we need to add a UT?
   
   @ashutoshcipher Sir, thanks for your review. I will add a UT to test it. 
Do you have any good ideas for testing the shuffled result?




> [RBF SBN] GetNamenodesForNameserviceId should shuffle Observer NameNodes 
> every time
> ---
>
> Key: HDFS-16831
> URL: https://issues.apache.org/jira/browse/HDFS-16831
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> The method getNamenodesForNameserviceId in MembershipNamenodeResolver.class 
> should shuffle Observer NameNodes every time. The current logic returns 
> the cached list, which causes all read requests to be forwarded to the 
> first observer namenode. 
>  
> The related code is below:
> {code:java}
> @Override
> public List getNamenodesForNameserviceId(
>     final String nsId, boolean listObserversFirst) throws IOException {
>   List ret = cacheNS.get(Pair.of(nsId, listObserversFirst));
>   if (ret != null) {
>     return ret;
>   }
>   ...
> }{code}
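
One way to get a fresh order on every call while keeping the cache is to shuffle a copy of the cached list. The sketch below is an assumption-laden illustration, not the fix in PR #5098: strings stand in for namenode contexts, observers are assumed to occupy the front of the list (as with listObserversFirst), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch; strings stand in for namenode contexts. */
final class ObserverShuffle {

  /**
   * Returns a copy of the cached list whose first observerCount entries
   * (assumed to be the observers) are shuffled; the rest keep their order.
   */
  static List<String> shuffleObservers(List<String> cached, int observerCount, Random rnd) {
    List<String> copy = new ArrayList<>(cached); // leave the cached list untouched
    // subList is a view, so shuffling it reorders the front of the copy.
    Collections.shuffle(copy.subList(0, observerCount), rnd);
    return copy;
  }

  public static void main(String[] args) {
    List<String> cached = Arrays.asList("obs1", "obs2", "obs3", "active", "standby");
    // Each call may return the observers in a different order.
    System.out.println(shuffleObservers(cached, 3, new Random()));
    System.out.println(shuffleObservers(cached, 3, new Random()));
  }
}
```

Shuffling a copy rather than the cached list avoids concurrent callers seeing a list that is being reordered underneath them. As for testing, one option is to pass a seeded Random and assert that the observer prefix is a permutation of the original observers while the remaining entries are unchanged.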


