[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2023-01-23 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679883#comment-17679883
 ] 

Wei-Chiu Chuang commented on HDFS-15383:


HADOOP-18519 backported HADOOP-17835 to branch-3.3. Updating the fix version 
accordingly.

> RBF: Disable watch in ZKDelegationSecretManager for performance
> ---
>
> Key: HDFS-15383
> URL: https://issues.apache.org/jira/browse/HDFS-15383
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Fengnan Li
> Assignee: Fengnan Li
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.3.6
>
>
> Based on the current design for delegation tokens in the secure Router, the
> total number of watches for tokens is the product of the number of Routers and
> the number of tokens. This is because ZKDelegationTokenSecretManager uses
> Curator's PathChildrenCache, which automatically sets the watches so that ZK
> pushes sync information to each Router. Evaluations have shown that a large
> number of watches has a negative performance impact on the ZooKeeper server.
> In our practice, when the number of watches on a single ZK server exceeds
> 1.2 million, ZK performance degrades significantly. This ticket therefore
> rewrites ZKDelegationTokenSecretManagerImpl.java to explicitly disable the
> PathChildrenCache and have Routers sync periodically from ZooKeeper. This has
> been working fine at the scale of 10 Routers with 2 million tokens.
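
To make the trade-off in the description concrete, here is a minimal Java sketch under stated assumptions. It is not the code from ZKDelegationTokenSecretManagerImpl: `TokenSyncSketch`, `PollingTokenCache`, and the `Supplier` that stands in for a full read of the ZooKeeper token subtree are all hypothetical. `watchCount` reproduces the description's arithmetic (Routers times tokens, so 10 Routers and 2 million tokens would mean 20 million watches under the watch-based model), and `PollingTokenCache` illustrates the poll model, in which each Router periodically replaces its in-memory view instead of registering per-token watches.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch of the two sync models discussed in HDFS-15383.
// Not the Hadoop implementation; all names here are illustrative.
public class TokenSyncSketch {

  // Watch-based model: every Router holds a watch per token znode,
  // so the total number of watches grows as routers * tokens.
  public static long watchCount(long routers, long tokens) {
    return routers * tokens;
  }

  // Poll-based model: the Router keeps a local copy of all tokens and
  // refreshes it from the shared store on a fixed delay. No per-token
  // watches are registered.
  public static final class PollingTokenCache implements AutoCloseable {
    // tokenId -> renew date (epoch millis), as last seen by this Router
    private final Map<String, Long> tokens = new ConcurrentHashMap<>();
    // Assumption: stand-in for a full read of the ZooKeeper token subtree.
    private final Supplier<Map<String, Long>> store;
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
          Thread t = new Thread(r, "token-sync");
          t.setDaemon(true); // do not keep the JVM alive just for syncing
          return t;
        });

    public PollingTokenCache(Supplier<Map<String, Long>> store, long syncIntervalMs) {
      this.store = store;
      syncOnce(); // initial full load
      scheduler.scheduleWithFixedDelay(
          this::syncOnce, syncIntervalMs, syncIntervalMs, TimeUnit.MILLISECONDS);
    }

    // Pull the full token set and reconcile the local copy with it.
    public synchronized void syncOnce() {
      Map<String, Long> snapshot = store.get();
      tokens.putAll(snapshot);                      // new tokens and updated renew dates
      tokens.keySet().retainAll(snapshot.keySet()); // drop tokens removed from the store
    }

    // Renew date as currently known to this Router, or null if unknown.
    public Long renewDate(String tokenId) {
      return tokens.get(tokenId);
    }

    @Override
    public void close() {
      scheduler.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(watchCount(10, 2_000_000)); // 20000000
  }
}
```

Between two syncs a Router may serve an outdated renew date; that staleness window is the cost of removing the watches, and it is exactly what the later discussion in this thread is about.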



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2022-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17630038#comment-17630038
 ] 

ASF GitHub Bot commented on HDFS-15383:
---

virajith merged PR #5112:
URL: https://github.com/apache/hadoop/pull/5112







[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2022-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629930#comment-17629930
 ] 

ASF GitHub Bot commented on HDFS-15383:
---

virajith commented on PR #5112:
URL: https://github.com/apache/hadoop/pull/5112#issuecomment-1305950656

   I'll merge this in the next hour.







[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2022-11-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629929#comment-17629929
 ] 

ASF GitHub Bot commented on HDFS-15383:
---

virajith commented on PR #5112:
URL: https://github.com/apache/hadoop/pull/5112#issuecomment-1305950206

   Thanks for the backport @melissayou. The changes look good to me. I expect 
the deprecated method will be addressed by 
[HADOOP-18520](https://issues.apache.org/jira/browse/HADOOP-18520). The other 
failures exist in trunk as well; fixing the checkstyle issues here would mean 
this is no longer a clean cherry-pick.







[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2022-11-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629421#comment-17629421
 ] 

ASF GitHub Bot commented on HDFS-15383:
---

hadoop-yetus commented on PR #5112:
URL: https://github.com/apache/hadoop/pull/5112#issuecomment-1304731410

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m  1s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ branch-3.3 Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m  8s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  24m 43s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  compile  |  18m  9s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  checkstyle  |   3m  8s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  mvnsite  |   4m  8s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  javadoc  |   3m 47s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  spotbugs  |   5m 57s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  shadedclient  |  26m  0s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 32s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 11s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  17m 33s |  |  the patch passed  |
   | -1 :x: |  javac  |  17m 33s | [/results-compile-javac-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5112/2/artifact/out/results-compile-javac-root.txt) |  root generated 2 new + 1871 unchanged - 11 fixed = 1873 total (was 1882)  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   2m 53s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5112/2/artifact/out/results-checkstyle-root.txt) |  root: The patch generated 3 new + 18 unchanged - 26 fixed = 21 total (was 44)  |
   | +1 :green_heart: |  mvnsite  |   4m 22s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   3m 48s |  |  the patch passed  |
   | +1 :green_heart: |  spotbugs  |   6m 20s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m 27s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  18m 25s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m 56s |  |  hadoop-registry in the patch passed.  |
   | +1 :green_heart: |  unit  |  20m 19s |  |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 29s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 211m 35s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5112/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5112 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 18e8f3c6eac5 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | branch-3.3 / a989b86be03e31f48f0e8b8b08e177eb2029a0f9 |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~18.04-b07 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5112/2/testReport/ |
   | Max. process+thread count | 2060 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-common-project/hadoop-registry hadoop-hdfs-project/hadoop-hdfs-rbf U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5112/2/console |
   | versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2022-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629406#comment-17629406
 ] 

ASF GitHub Bot commented on HDFS-15383:
---

hadoop-yetus commented on PR #5112:
URL: https://github.com/apache/hadoop/pull/5112#issuecomment-1304702676

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |  10m 48s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ branch-3.3 Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m  3s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  24m  9s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  compile  |  18m  1s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  checkstyle  |   3m 21s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  mvnsite  |   4m 15s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  javadoc  |   3m 48s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  spotbugs  |   5m 47s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  shadedclient  |  26m  0s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 31s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 13s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  17m 23s |  |  the patch passed  |
   | -1 :x: |  javac  |  17m 23s | [/results-compile-javac-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5112/1/artifact/out/results-compile-javac-root.txt) |  root generated 2 new + 1871 unchanged - 11 fixed = 1873 total (was 1882)  |
   | -1 :x: |  blanks  |   0m  0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5112/1/artifact/out/blanks-eol.txt) |  The patch has 5 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   3m  0s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5112/1/artifact/out/results-checkstyle-root.txt) |  root: The patch generated 3 new + 18 unchanged - 26 fixed = 21 total (was 44)  |
   | +1 :green_heart: |  mvnsite  |   4m  6s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   3m 35s |  |  the patch passed  |
   | +1 :green_heart: |  spotbugs  |   6m  5s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m  8s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  18m  8s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m 43s |  |  hadoop-registry in the patch passed.  |
   | +1 :green_heart: |  unit  |  18m 12s |  |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 19s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 216m 44s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5112/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5112 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux ed203fb1fc91 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | branch-3.3 / d6a246deda6fdbe1bddce94558ed701734b806b5 |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~18.04-b07 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5112/1/testReport/ |
   | Max. process+thread count | 2053 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-common-project/hadoop-registry hadoop-hdfs-project/hadoop-hdfs-rbf U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5112/1/console |
   | versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2022-11-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629396#comment-17629396
 ] 

ASF GitHub Bot commented on HDFS-15383:
---

melissayou opened a new pull request, #5112:
URL: https://github.com/apache/hadoop/pull/5112

   
   
   ### Description of PR
   As part of 
[HADOOP-18518](https://issues.apache.org/jira/browse/HADOOP-18518), this is the 
second PR to fix the deprecation of PathChildrenCache/TreeCache in newer 
ZooKeeper/Curator versions. It is a clean cherry-pick of HDFS-15383 and 
HADOOP-17835.
   
   ### How was this patch tested?
`mvn clean install -Pdist -Dtar -DskipTests -Dmaven.javadoc.skip` builds 
successfully.
   
   
   ### For code changes:
   
   - [X] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   







[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2020-12-10 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247113#comment-17247113
 ] 

Yuxuan Wang commented on HDFS-15383:


[~fengnanli] Thanks for your reply. Very clear explanation.




[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2020-12-09 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247035#comment-17247035
 ] 

Fengnan Li commented on HDFS-15383:
---

[~John Smith] It is a good question.
First of all, when a token is stale it will be deleted by the cleanup thread, 
so when a client accesses this Router with the renewed token, the Router will 
not recognize it and will load it from ZK. The default scan interval for this 
cleanup is 1h, which is long.
On the other hand, clients normally renew a token before it expires. For 
example, YARN renews a token when it reaches 92% (configurable, I forgot the 
exact value) of the renew date, meaning that when the client renews the token, 
there is still over 1 hour left before it expires. Internally we set our sync 
interval to 10 min, so all Routers pick up the new renew date within about 
10 min. In the meantime the token is still valid, though different Routers may 
hold different renew dates for it.
10 minutes is roughly the time it takes to load 1M tokens from ZK into Router 
memory in our environment.
So in theory your client will fail if you set the sync interval to a very large 
value such as 2 hours, but we don't use such a large value in this poll model. 
We can also shorten the deletion period, e.g. to every 15 min, to further 
prevent authentication failures.
Hope it makes sense.
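
The timing argument above can be checked with a little arithmetic. This Java sketch is illustrative only: the 24-hour renew interval is an assumption not stated in the comment, 0.92 is the approximate YARN renewal fraction the comment itself flags as uncertain, and `RenewWindowSketch` with its methods is hypothetical. It uses a simplified safety condition: a renewal is treated as safe when one sync interval is shorter than the time left on the token at renewal, because a Router still holding the old renew date keeps accepting the token until that date passes.

```java
import java.time.Duration;

// Hypothetical sketch of the sync-interval vs. renewal-window argument.
// Assumptions: 24 h renew interval, renewal at ~92% of it.
public class RenewWindowSketch {

  // Time still left on a token when the client renews it at the given
  // fraction of the renew interval (e.g. 0.92 -> 8% of the interval left).
  public static Duration remainingAtRenewal(Duration renewInterval, double renewFraction) {
    long remainingMs = Math.round(renewInterval.toMillis() * (1.0 - renewFraction));
    return Duration.ofMillis(remainingMs);
  }

  // Simplified check: every Router picks up the new renew date within one
  // sync interval, so the renewal is safe if that happens before the old
  // renew date (still held by stale Routers) passes.
  public static boolean syncIsFastEnough(Duration renewInterval, double renewFraction,
                                         Duration syncInterval) {
    return syncInterval.compareTo(remainingAtRenewal(renewInterval, renewFraction)) < 0;
  }

  public static void main(String[] args) {
    Duration renew = Duration.ofHours(24);
    System.out.println(remainingAtRenewal(renew, 0.92));                         // PT1H55M12S
    System.out.println(syncIsFastEnough(renew, 0.92, Duration.ofMinutes(10)));   // true
    System.out.println(syncIsFastEnough(renew, 0.92, Duration.ofHours(2)));      // false
  }
}
```

Under these assumptions about 1 h 55 min remain at renewal, so a 10-minute sync falls well inside the window while a 2-hour sync does not, which matches the conclusion in the comment.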




[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2020-12-09 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246982#comment-17246982
 ] 

Yuxuan Wang commented on HDFS-15383:


Hi [~fengnanli] [~elgoiri] [~hexiaoqiao],

After disabling the watcher, tokens in Router memory can become stale, and a 
client may fail authentication if its token has been renewed but the Router has 
not yet rebuilt its cache.

Or am I misunderstanding something? Please clarify, thanks!




[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2020-06-23 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143430#comment-17143430
 ] 

Fengnan Li commented on HDFS-15383:
---

Thanks! [~elgoiri] [~hexiaoqiao]




[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2020-06-23 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143278#comment-17143278
 ] 

Hudson commented on HDFS-15383:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18377 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18377/])
HDFS-15383. RBF: Add support for router delegation token without watch (github: 
rev 84110d850e2bc2a9ff4afcc7508fecd81cb5b7e5)
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/security/token/ZKDelegationTokenSecretManagerImpl.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/ZKDelegationTokenSecretManager.java
* (add) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/security/token/TestZKDelegationTokenSecretManagerImpl.java
* (edit) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/token/delegation/TestZKDelegationTokenSecretManager.java





[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance

2020-06-23 Thread Íñigo Goiri (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143268#comment-17143268
 ] 

Íñigo Goiri commented on HDFS-15383:


Thanks [~fengnanli] for the patch and [~hexiaoqiao] for the review.
Merged the PR.
