[jira] [Commented] (HDFS-17532) RBF: Allow router state store cache update to overwrite and delete in parallel

2024-05-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849280#comment-17849280
 ] 

ASF GitHub Bot commented on HDFS-17532:
---

hadoop-yetus commented on PR #6839:
URL: https://github.com/apache/hadoop/pull/6839#issuecomment-2129423948

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 31s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | -1 :x: |  mvninstall  |   3m 27s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/10/artifact/out/branch-mvninstall-root.txt) |  root in trunk failed.  |
   | +1 :green_heart: |  compile  |   3m 36s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 31s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 32s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 39s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 38s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 16s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  37m 46s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 30s |  |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 19s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/10/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) |  hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1)  |
   | +1 :green_heart: |  mvnsite  |   0m 33s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   0m 29s | [/results-javadoc-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/10/artifact/out/results-javadoc-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt) |  hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  javadoc  |   0m 25s |  |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  33m 29s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  29m 59s |  |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 41s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 123m 43s |  |  |


   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/10/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6839 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint markdownlint |
   | uname | Linux 7f83cd0f118e 5.15.0-106-generic #116-Ubuntu SMP Wed Apr 17 09:17:56

[jira] [Commented] (HDFS-17536) RBF: Format safe-mode related logic and fix a race

2024-05-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849279#comment-17849279
 ] 

ASF GitHub Bot commented on HDFS-17536:
---

hadoop-yetus commented on PR #6844:
URL: https://github.com/apache/hadoop/pull/6844#issuecomment-2129396098

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 29s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   |||| _ trunk Compile Tests _ |
   | -1 :x: |  mvninstall  |  22m  1s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6844/1/artifact/out/branch-mvninstall-root.txt) |  root in trunk failed.  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 42s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 44s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 20s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  33m 31s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 31s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 31s |  |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 31s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 29s |  |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 29s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 17s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6844/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) |  hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2)  |
   | +1 :green_heart: |  mvnsite  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 23s |  |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | -1 :x: |  spotbugs  |   1m 24s | [/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6844/1/artifact/out/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf.html) |  hadoop-hdfs-project/hadoop-hdfs-rbf generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  shadedclient  |  33m 47s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  30m 46s |  |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 37s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 136m  9s |  |  |


   | Reason | Tests |
   |-------:|:------|
   | SpotBugs | module:hadoop-hdfs-project/hadoop-hdfs-rbf |
   |  |  Inconsistent synchronization of org.apache.hadoop.hdfs.server.federation.router.RouterSafemodeService.startupTime; locked 50% of time  Unsynchronized access at RouterSafemodeService.java:50% of time  Unsynchronized access at RouterSafemodeService.java:[line 146] |
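   For context on the SpotBugs finding above: it fires when a field is written while holding the object monitor in some methods but read without it in others. A minimal, hypothetical sketch of the pattern and one common remedy — this is not the actual RouterSafemodeService code, and the class and method names are illustrative:

   ```java
   // Hypothetical illustration of SpotBugs' "inconsistent synchronization"
   // finding; not the actual RouterSafemodeService source.
   class SafemodeTimer {
       // Declaring the field volatile makes the unsynchronized read safe:
       // every reader observes the most recent write.
       private volatile long startupTime;

       synchronized void enterSafeMode(long now) {
           startupTime = now; // write happens under the monitor
       }

       long millisSince(long now) {
           return now - startupTime; // read outside the monitor, safe via volatile
       }
   }
   ```

   Alternatively, every access could be made `synchronized`; what SpotBugs objects to is mixing the two styles on the same field.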
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6844/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6844 |
   | Optional Tests | dupname asflicense compile javac

[jira] [Commented] (HDFS-17532) RBF: Allow router state store cache update to overwrite and delete in parallel

2024-05-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849265#comment-17849265
 ] 

ASF GitHub Bot commented on HDFS-17532:
---

hadoop-yetus commented on PR #6839:
URL: https://github.com/apache/hadoop/pull/6839#issuecomment-2129282223

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 19s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  33m 12s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 24s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 21s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 20s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 25s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 19s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   0m 54s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  19m 50s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 19s |  |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 19s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 16s |  |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 16s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 11s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/9/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) |  hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 2 new + 1 unchanged - 0 fixed = 3 total (was 1)  |
   | +1 :green_heart: |  mvnsite  |   0m 20s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   0m 17s | [/results-javadoc-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/9/artifact/out/results-javadoc-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt) |  hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  javadoc  |   0m 16s |  |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | -1 :x: |  spotbugs  |   0m 52s | [/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/9/artifact/out/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf.html) |  hadoop-hdfs-project/hadoop-hdfs-rbf generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  shadedclient  |  19m  3s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  30m 43s |  |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 29s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 113m 25s |  |  |


   | Reason | Tests |
   |-------:|:------|
   | SpotBugs | module:hadoop-hdfs-project/hadoop-hdfs-rbf |
   |  |  Exceptional return value of java.util.concurrent.ThreadPoolExecutor.submit(Callable) ignored in org.apache.hadoop.hdfs.server.federation.store.driver.StateStoreDriver.handleOverwriteAndDelete(List, List)  At StateStoreDriver.java:ignored

[jira] [Commented] (HDFS-17532) RBF: Allow router state store cache update to overwrite and delete in parallel

2024-05-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849263#comment-17849263
 ] 

ASF GitHub Bot commented on HDFS-17532:
---

hadoop-yetus commented on PR #6839:
URL: https://github.com/apache/hadoop/pull/6839#issuecomment-2129273278

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 20s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | -1 :x: |  mvninstall  |  33m 23s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/8/artifact/out/branch-mvninstall-root.txt) |  root in trunk failed.  |
   | +1 :green_heart: |  compile  |   0m 26s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 21s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 21s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 28s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 33s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 18s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   0m 54s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 12s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 18s |  |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 17s |  |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 17s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 12s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/8/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) |  hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 2 new + 1 unchanged - 0 fixed = 3 total (was 1)  |
   | +1 :green_heart: |  mvnsite  |   0m 21s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   0m 17s | [/results-javadoc-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/8/artifact/out/results-javadoc-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt) |  hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  javadoc  |   0m 16s |  |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | -1 :x: |  spotbugs  |   0m 52s | [/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/8/artifact/out/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf.html) |  hadoop-hdfs-project/hadoop-hdfs-rbf generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  shadedclient  |  20m 12s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  26m 54s |  |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 27s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 111m 18s |  |  |


   | Reason | Tests |
   |-------:|:------|
   | SpotBugs | module:hadoop-hdfs-project/hadoop-hdfs-rbf |
   |  |  Exceptional return value of java.util.concurrent.ThreadPoolExecutor.submit(Callable) ignored

[jira] [Commented] (HDFS-17532) RBF: Allow router state store cache update to overwrite and delete in parallel

2024-05-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849262#comment-17849262
 ] 

ASF GitHub Bot commented on HDFS-17532:
---

hadoop-yetus commented on PR #6839:
URL: https://github.com/apache/hadoop/pull/6839#issuecomment-2129257791

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 46s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  50m 32s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 31s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 43s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 49s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 22s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  39m 14s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 31s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 28s |  |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 28s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 19s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/7/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) |  hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 3 new + 1 unchanged - 0 fixed = 4 total (was 1)  |
   | +1 :green_heart: |  mvnsite  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 23s |  |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | -1 :x: |  spotbugs  |   1m 24s | [/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/7/artifact/out/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf.html) |  hadoop-hdfs-project/hadoop-hdfs-rbf generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  shadedclient  |  38m 39s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  |  36m 43s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/7/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 36s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 181m 46s |  |  |


   | Reason | Tests |
   |-------:|:------|
   | SpotBugs | module:hadoop-hdfs-project/hadoop-hdfs-rbf |
   |  |  Exceptional return value of java.util.concurrent.ThreadPoolExecutor.submit(Callable) ignored in org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore.overrideExpiredRecords(QueryResult)  At CachedRecordStore.java:ignored in org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore.overrideExpiredRecords(QueryResult)  At CachedRecordStore.java:[line 243] |
   | Failed junit tests | hadoop.hdfs.server.federation.router.TestRouterRpc |
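   For context on the SpotBugs finding above: discarding the `Future` returned by `ThreadPoolExecutor.submit(Callable)` means any exception thrown by the task is silently dropped. A hedged sketch of the usual remedy — keep each `Future` and check it with `get()`. The class and method names here are hypothetical, not the actual CachedRecordStore/StateStoreDriver patch:

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.Callable;
   import java.util.concurrent.ExecutionException;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.Future;

   // Hypothetical sketch: run tasks in parallel, but never ignore the
   // Future returned by submit(), so task failures surface to the caller.
   class ParallelBatch {
       static boolean runAll(List<Callable<Boolean>> tasks) throws InterruptedException {
           ExecutorService pool = Executors.newFixedThreadPool(4);
           try {
               List<Future<Boolean>> futures = new ArrayList<>();
               for (Callable<Boolean> task : tasks) {
                   futures.add(pool.submit(task)); // capture the Future instead of dropping it
               }
               boolean allOk = true;
               for (Future<Boolean> f : futures) {
                   try {
                       allOk &= f.get(); // rethrows task exceptions as ExecutionException
                   } catch (ExecutionException e) {
                       allOk = false;    // one failed task fails the whole batch
                   }
               }
               return allOk;
           } finally {
               pool.shutdown();
           }
       }
   }
   ```

   Checking each `Future` is what clears the SpotBugs `RV_RETURN_VALUE_IGNORED` family of warnings: the executor still runs tasks concurrently, but failures are now observed.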
   
   
   | Subsystem | Report/Notes

[jira] [Commented] (HDFS-17532) RBF: Allow router state store cache update to overwrite and delete in parallel

2024-05-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849249#comment-17849249
 ] 

ASF GitHub Bot commented on HDFS-17532:
---

hadoop-yetus commented on PR #6839:
URL: https://github.com/apache/hadoop/pull/6839#issuecomment-2129175735

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 32s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  45m 16s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 29s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 39s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 39s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 19s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  35m 10s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 28s |  |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 28s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 19s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/5/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) |  hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 3 new + 1 unchanged - 0 fixed = 4 total (was 1)  |
   | +1 :green_heart: |  mvnsite  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | -1 :x: |  spotbugs  |   1m 21s | [/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/5/artifact/out/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf.html) |  hadoop-hdfs-project/hadoop-hdfs-rbf generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  shadedclient  |  34m 54s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  |  25m 16s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) |  hadoop-hdfs-rbf in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 43s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 156m 42s |  |  |


   | Reason | Tests |
   |-------:|:------|
   | SpotBugs | module:hadoop-hdfs-project/hadoop-hdfs-rbf |
   |  |  Exceptional return value of java.util.concurrent.ThreadPoolExecutor.submit(Callable) ignored in org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore.overrideExpiredRecords(QueryResult, boolean)  At CachedRecordStore.java:ignored in org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore.overrideExpiredRecords(QueryResult, boolean)  At CachedRecordStore.java:[line 235] |


   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci

[jira] [Commented] (HDFS-17532) RBF: Allow router state store cache update to overwrite and delete in parallel

2024-05-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849247#comment-17849247
 ] 

ASF GitHub Bot commented on HDFS-17532:
---

hadoop-yetus commented on PR #6839:
URL: https://github.com/apache/hadoop/pull/6839#issuecomment-2129168597

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 30s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  45m  2s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 42s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 32s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 20s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  34m 52s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 28s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 31s |  |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 31s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 27s |  |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 27s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 18s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/4/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) |  hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1)  |
   | +1 :green_heart: |  mvnsite  |   0m 31s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 24s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  35m 19s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  30m 23s |  |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 39s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 161m 33s |  |  |


   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6839 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint 
markdownlint |
   | uname | Linux 443e13f1bf14 5.15.0-106-generic #116-Ubuntu SMP Wed Apr 17 
09:17:56 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 6146f3a1547f47a0b06594b2e74032c1532e61d8 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/4/testReport

[jira] [Updated] (HDFS-17536) RBF: Format safe-mode related logic and fix a race

2024-05-24 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17536:

Issue Type: Improvement  (was: Task)

> RBF: Format safe-mode related logic and fix a race 
> ---
>
> Key: HDFS-17536
> URL: https://issues.apache.org/jira/browse/HDFS-17536
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> RBF: Format safe-mode related logic and fix a race.
>  
> Both {{RouterAdminServer#enterSafeMode()}} and 
> {{RouterSafemodeService#periodicInvoke()#leave}} can change the router state 
> at the same time.
> The safe-mode transition logic should be condensed into one method, and some 
> races can happen in the current implementation, such as:
>  # {{RouterAdminServer#enterSafeMode()}} sets the router state to 
> {{RouterServiceState.SAFEMODE}}
>  # {{RouterSafemodeService#periodicInvoke()#leave}} gets {{true}} when checking 
> {{safeMode && !isSafeModeSetManually}}
>  # {{RouterAdminServer#enterSafeMode()}} sets {{safeMode}} and 
> {{isSafeModeSetManually}} to {{true}}
>  # {{RouterAdminServer#enterSafeMode()}} gets {{true}} when checking safe mode
>  # {{RouterSafemodeService#periodicInvoke()#leave}} calls {{leave()}} to leave 
> safe mode.
> After this sequence the router is not in safe mode and {{safeMode}} is 
> {{false}}, but {{isSafeModeSetManually}} is still {{true}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17536) RBF: Format safe-mode related logic and fix a race

2024-05-24 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17536:

Issue Type: Task  (was: Bug)

> RBF: Format safe-mode related logic and fix a race 
> ---
>
> Key: HDFS-17536
> URL: https://issues.apache.org/jira/browse/HDFS-17536
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> RBF: Format safe-mode related logic and fix a race.
>  
> Both {{RouterAdminServer#enterSafeMode()}} and 
> {{RouterSafemodeService#periodicInvoke()#leave}} can change the router state 
> at the same time.
> The safe-mode transition logic should be condensed into one method, and some 
> races can happen in the current implementation, such as:
>  # {{RouterAdminServer#enterSafeMode()}} sets the router state to 
> {{RouterServiceState.SAFEMODE}}
>  # {{RouterSafemodeService#periodicInvoke()#leave}} gets {{true}} when checking 
> {{safeMode && !isSafeModeSetManually}}
>  # {{RouterAdminServer#enterSafeMode()}} sets {{safeMode}} and 
> {{isSafeModeSetManually}} to {{true}}
>  # {{RouterAdminServer#enterSafeMode()}} gets {{true}} when checking safe mode
>  # {{RouterSafemodeService#periodicInvoke()#leave}} calls {{leave()}} to leave 
> safe mode.
> After this sequence the router is not in safe mode and {{safeMode}} is 
> {{false}}, but {{isSafeModeSetManually}} is still {{true}}.






[jira] [Updated] (HDFS-17536) RBF: Format safe-mode related logic and fix a race

2024-05-24 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17536:

Issue Type: Bug  (was: Task)

> RBF: Format safe-mode related logic and fix a race 
> ---
>
> Key: HDFS-17536
> URL: https://issues.apache.org/jira/browse/HDFS-17536
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> RBF: Format safe-mode related logic and fix a race.
>  
> Both {{RouterAdminServer#enterSafeMode()}} and 
> {{RouterSafemodeService#periodicInvoke()#leave}} can change the router state 
> at the same time.
> The safe-mode transition logic should be condensed into one method, and some 
> races can happen in the current implementation, such as:
>  # {{RouterAdminServer#enterSafeMode()}} sets the router state to 
> {{RouterServiceState.SAFEMODE}}
>  # {{RouterSafemodeService#periodicInvoke()#leave}} gets {{true}} when checking 
> {{safeMode && !isSafeModeSetManually}}
>  # {{RouterAdminServer#enterSafeMode()}} sets {{safeMode}} and 
> {{isSafeModeSetManually}} to {{true}}
>  # {{RouterAdminServer#enterSafeMode()}} gets {{true}} when checking safe mode
>  # {{RouterSafemodeService#periodicInvoke()#leave}} calls {{leave()}} to leave 
> safe mode.
> After this sequence the router is not in safe mode and {{safeMode}} is 
> {{false}}, but {{isSafeModeSetManually}} is still {{true}}.






[jira] [Updated] (HDFS-17536) RBF: Format safe-mode related logic and fix a race

2024-05-24 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17536:

Description: 
RBF: Format safe-mode related logic and fix a race.

 

Both {{RouterAdminServer#enterSafeMode()}} and 
{{RouterSafemodeService#periodicInvoke()#leave}} can change the router state at 
the same time.

The safe-mode transition logic should be condensed into one method, and some 
races can happen in the current implementation, such as:
 # {{RouterAdminServer#enterSafeMode()}} sets the router state to 
{{RouterServiceState.SAFEMODE}}
 # {{RouterSafemodeService#periodicInvoke()#leave}} gets {{true}} when checking 
{{safeMode && !isSafeModeSetManually}}
 # {{RouterAdminServer#enterSafeMode()}} sets {{safeMode}} and 
{{isSafeModeSetManually}} to {{true}}
 # {{RouterAdminServer#enterSafeMode()}} gets {{true}} when checking safe mode
 # {{RouterSafemodeService#periodicInvoke()#leave}} calls {{leave()}} to leave 
safe mode.

After this sequence the router is not in safe mode and {{safeMode}} is 
{{false}}, but {{isSafeModeSetManually}} is still {{true}}.

  was:RBF: Format safe-mode related logic and fix a race.


> RBF: Format safe-mode related logic and fix a race 
> ---
>
> Key: HDFS-17536
> URL: https://issues.apache.org/jira/browse/HDFS-17536
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> RBF: Format safe-mode related logic and fix a race.
>  
> Both {{RouterAdminServer#enterSafeMode()}} and 
> {{RouterSafemodeService#periodicInvoke()#leave}} can change the router state 
> at the same time.
> The safe-mode transition logic should be condensed into one method, and some 
> races can happen in the current implementation, such as:
>  # {{RouterAdminServer#enterSafeMode()}} sets the router state to 
> {{RouterServiceState.SAFEMODE}}
>  # {{RouterSafemodeService#periodicInvoke()#leave}} gets {{true}} when checking 
> {{safeMode && !isSafeModeSetManually}}
>  # {{RouterAdminServer#enterSafeMode()}} sets {{safeMode}} and 
> {{isSafeModeSetManually}} to {{true}}
>  # {{RouterAdminServer#enterSafeMode()}} gets {{true}} when checking safe mode
>  # {{RouterSafemodeService#periodicInvoke()#leave}} calls {{leave()}} to leave 
> safe mode.
> After this sequence the router is not in safe mode and {{safeMode}} is 
> {{false}}, but {{isSafeModeSetManually}} is still {{true}}.






[jira] [Updated] (HDFS-17536) RBF: Format safe-mode related logic and fix a race

2024-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17536:
--
Labels: pull-request-available  (was: )

> RBF: Format safe-mode related logic and fix a race 
> ---
>
> Key: HDFS-17536
> URL: https://issues.apache.org/jira/browse/HDFS-17536
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> RBF: Format safe-mode related logic and fix a race.






[jira] [Commented] (HDFS-17536) RBF: Format safe-mode related logic and fix a race

2024-05-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849244#comment-17849244
 ] 

ASF GitHub Bot commented on HDFS-17536:
---

ZanderXu opened a new pull request, #6844:
URL: https://github.com/apache/hadoop/pull/6844

   Both `RouterAdminServer#enterSafeMode()` and 
`RouterSafemodeService#periodicInvoke()#leave` can change the router state at 
the same time. 
   
   The safe-mode transition logic should be condensed into one method, and some 
races can happen in the current implementation, such as:
   
   1. `RouterAdminServer#enterSafeMode()` sets the router state to 
`RouterServiceState.SAFEMODE`
   2. `RouterSafemodeService#periodicInvoke()#leave` gets `true` when checking 
`safeMode && !isSafeModeSetManually`
   3. `RouterAdminServer#enterSafeMode()` sets `safeMode` and 
`isSafeModeSetManually` to `true`
   4. `RouterAdminServer#enterSafeMode()` gets `true` when checking safe mode
   5. `RouterSafemodeService#periodicInvoke()#leave` calls `leave()` to leave 
safe mode.
   
   After this sequence the router is not in safe mode and `safeMode` is 
`false`, but `isSafeModeSetManually` is still `true`.
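
   The interleaving above is possible because the two flags are updated on 
separate code paths without a shared lock. A minimal, hedged sketch of the 
"condense the transition into one method" direction, using illustrative class 
and field names rather than the actual Router code:

```java
// Illustrative sketch only: names do not match the real Router classes.
public class SafeModeSketch {
  private boolean safeMode;
  private boolean setManually;

  // Single entry point used by both the admin RPC and the periodic service,
  // so the pair of flags can never be observed half-updated.
  public synchronized void setSafeMode(boolean enter, boolean manual) {
    safeMode = enter;
    setManually = enter && manual;
  }

  // The periodic service may only auto-leave when safe mode was not manual.
  public synchronized boolean tryAutoLeave() {
    if (safeMode && !setManually) {
      safeMode = false;
      return true;
    }
    return false;
  }

  public synchronized boolean isInSafeMode() {
    return safeMode;
  }

  public static void main(String[] args) {
    SafeModeSketch s = new SafeModeSketch();
    s.setSafeMode(true, true);             // manual enter, as in steps 1-3
    System.out.println(s.tryAutoLeave());  // false: the service must not leave
    System.out.println(s.isInSafeMode());  // true
  }
}
```

   With both paths funneled through `setSafeMode`/`tryAutoLeave`, the 
interleaving in steps 1-5 cannot leave `isSafeModeSetManually` dangling at 
`true` while `safeMode` is `false`.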




> RBF: Format safe-mode related logic and fix a race 
> ---
>
> Key: HDFS-17536
> URL: https://issues.apache.org/jira/browse/HDFS-17536
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>
> RBF: Format safe-mode related logic and fix a race.






[jira] [Commented] (HDFS-17532) RBF: Allow router state store cache update to overwrite and delete in parallel

2024-05-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849233#comment-17849233
 ] 

ASF GitHub Bot commented on HDFS-17532:
---

ZanderXu commented on code in PR #6839:
URL: https://github.com/apache/hadoop/pull/6839#discussion_r1613173537


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/driver/StateStoreDriver.java:
##
@@ -88,6 +101,13 @@ public boolean init(final Configuration config, final 
String id,
 return false;
   }
 }
+
+    if (conf.getBoolean(
+        RBFConfigKeys.FEDERATION_STORE_MEMBERSHIP_ASYNC_OVERRIDE,
+        RBFConfigKeys.FEDERATION_STORE_MEMBERSHIP_ASYNC_OVERRIDE_DEFAULT)) {
+      executor = new ThreadPoolExecutor(2, 2, 1L, TimeUnit.MINUTES,
+          new LinkedBlockingQueue<>());

Review Comment:
   You can refer to HDFS-16848 and change this configuration to the number of 
threads.
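
   A hedged sketch of the shape this suggestion implies (HDFS-16848 replaced an 
on/off boolean with a thread count; the method below is illustrative, not the 
actual driver code):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: size the pool from a configured thread count instead of a boolean,
// with 0 (or less) meaning "stay synchronous". Not the real RBF config code.
public class AsyncOverrideThreadsSketch {
  static ThreadPoolExecutor buildExecutor(int numAsyncThreads) {
    if (numAsyncThreads <= 0) {
      return null; // synchronous mode: callers run the callables inline
    }
    return new ThreadPoolExecutor(numAsyncThreads, numAsyncThreads,
        1L, TimeUnit.MINUTES, new LinkedBlockingQueue<>());
  }

  public static void main(String[] args) {
    System.out.println(buildExecutor(0) == null); // true: synchronous
    ThreadPoolExecutor pool = buildExecutor(2);
    System.out.println(pool.getCorePoolSize());   // 2
    pool.shutdown();
  }
}
```

   The advantage of a count over a boolean is that operators can tune the 
parallelism per deployment instead of being pinned to the hard-coded 2 threads.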



##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/driver/StateStoreDriver.java:
##
@@ -17,13 +17,23 @@
  */
 package org.apache.hadoop.hdfs.server.federation.store.driver;
 
+import java.io.IOException;
 import java.net.InetAddress;
+import java.util.ArrayList;
 import java.util.Collection;
+import java.util.HashMap;

Review Comment:
   unused.



##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/driver/StateStoreDriver.java:
##
@@ -206,4 +231,48 @@ private String getHostname() {
 }
 return hostname;
   }
+
+  /**
+   * Try to overwrite records in commitRecords and remove records in deleteRecords.
+   * Should return null if async mode is used. Else return removed records.
+   * @param commitRecords records to overwrite in state store
+   * @param deleteRecords records to remove from state store
+   * @param <T> record class
+   * @return null if async mode is used, else removed records
+   */
+  public <T extends BaseRecord> List<T> handleOverwriteAndDelete(List<T> commitRecords,
+      List<T> deleteRecords) throws IOException {
+    Callable<Boolean> overwriteCallable =
+        () -> putAll(commitRecords, true, false);
+    Callable<Map<T, Boolean>> deletionCallable = () -> removeMultiple(deleteRecords);
+
+    if (executor != null) {
+      // In async mode, just submit and let the tasks do their work and return asap.
+      if (!commitRecords.isEmpty()) {
+        executor.submit(overwriteCallable);
+      }
+      if (!deleteRecords.isEmpty()) {
+        executor.submit(deletionCallable);
+      }
+      return null;
+    } else {
+      try {
+        List<T> result = new ArrayList<>();
+        if (!commitRecords.isEmpty()) {
+          overwriteCallable.call();
+        }
+        if (!deleteRecords.isEmpty()) {
+          Map<T, Boolean> removedRecords = deletionCallable.call();
+          for (Map.Entry<T, Boolean> entry : removedRecords.entrySet()) {
+            if (entry.getValue()) {
+              result.add(entry.getKey());
+            }
+          }
+        }
+        return result;
+      } catch (Exception e) {
+        throw new IOException(e);
+      }
+    }
+  }

Review Comment:
   ```
   public <T extends BaseRecord> List<T> handleOverwriteAndDelete(List<T> commitRecords,
       List<T> deleteRecords) throws IOException {
     List<T> result = null;
     try {
       // overwrite all expired records.
       if (commitRecords != null && !commitRecords.isEmpty()) {
         Callable<Boolean> overwriteCallable =
             () -> putAll(commitRecords, true, false);
         if (executor != null) {
           executor.submit(overwriteCallable);
         } else {
           overwriteCallable.call();
         }
       }

       // delete all deletable records.
       if (deleteRecords != null && !deleteRecords.isEmpty()) {
         Callable<Map<T, Boolean>> deletionCallable = () -> removeMultiple(deleteRecords);
         if (executor != null) {
           executor.submit(deletionCallable);
         } else {
           result = new ArrayList<>();
           Map<T, Boolean> removedRecords = deletionCallable.call();
           for (Map.Entry<T, Boolean> entry : removedRecords.entrySet()) {
             if (entry.getValue()) {
               result.add(entry.getKey());
             }
           }
         }
       }
     } catch (Exception e) {
       throw new IOException(e);
     }
     return result;
   }
   ```





> RBF: Allow router state store cache update to overwrite and delete in parallel
> --
>
> Key: HDFS-17532
> URL: https://issues.apache.org/jira/browse/HDFS-17532
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Pri

[jira] [Commented] (HDFS-17532) RBF: Allow router state store cache update to overwrite and delete in parallel

2024-05-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849231#comment-17849231
 ] 

ASF GitHub Bot commented on HDFS-17532:
---

hadoop-yetus commented on PR #6839:
URL: https://github.com/apache/hadoop/pull/6839#issuecomment-2129075360

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 19s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 53s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 24s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 22s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 20s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 28s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 31s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 20s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   0m 51s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  19m 41s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 19s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 19s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 16s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 16s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 12s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/6/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 3 new + 1 
unchanged - 0 fixed = 4 total (was 1)  |
   | +1 :green_heart: |  mvnsite  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 17s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 16s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | -1 :x: |  spotbugs  |   0m 49s | 
[/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/6/artifact/out/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf.html)
 |  hadoop-hdfs-project/hadoop-hdfs-rbf generated 1 new + 0 unchanged - 0 fixed 
= 1 total (was 0)  |
   | +1 :green_heart: |  shadedclient  |  19m 37s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  30m 46s |  |  hadoop-hdfs-rbf in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 28s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 112m 11s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | SpotBugs | module:hadoop-hdfs-project/hadoop-hdfs-rbf |
   |  |  Exceptional return value of 
java.util.concurrent.ThreadPoolExecutor.submit(Callable) ignored in 
org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore.overrideExpiredRecords(QueryResult,
 boolean)  At CachedRecordStore.java:ignored in 
org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore.overrideExpiredRecords(QueryResult,
 boolean)  At CachedRecordStore.java:[line 235] |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/6/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6839 |
   | Optional Tests | dupname
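
The SpotBugs item above flags the dropped return value of 
`ThreadPoolExecutor.submit(Callable)`. The usual remedy is to keep the returned 
`Future`; a minimal sketch with illustrative names, not the `CachedRecordStore` 
code itself:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Keep the Future that submit() returns instead of discarding it; this
// silences the warning and lets the caller surface task failures.
public class KeepSubmitFutureSketch {
  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(1);
    Future<Boolean> overwriteDone = executor.submit(() -> true);
    // get() rethrows any exception the task threw, so errors are not lost.
    System.out.println(overwriteDone.get()); // true
    executor.shutdown();
  }
}
```

When fire-and-forget really is intended, logging failures from inside the task 
is another common way to satisfy the checker.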

[jira] [Created] (HDFS-17536) RBF: Format safe-mode related logic and fix a race

2024-05-24 Thread ZanderXu (Jira)
ZanderXu created HDFS-17536:
---

 Summary: RBF: Format safe-mode related logic and fix a race 
 Key: HDFS-17536
 URL: https://issues.apache.org/jira/browse/HDFS-17536
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: ZanderXu
Assignee: ZanderXu


RBF: Format safe-mode related logic and fix a race.






[jira] [Comment Edited] (HDFS-15186) Erasure Coding: Decommission may generate the parity block's content with all 0 in some case

2024-05-24 Thread Chenyu Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846211#comment-17846211
 ] 

Chenyu Zheng edited comment on HDFS-15186 at 5/24/24 8:41 AM:
--

Hi, all. I reproduced the problem in the EC algorithm that is described in 
HADOOP-19180. Would you mind taking a look at HADOOP-19180?


was (Author: zhengchenyu):
Hi, all. I reproduce the problem of ec algorithm which is described in 
HDFS-17521. Would you mind taking a look at HDFS-17521?

> Erasure Coding: Decommission may generate the parity block's content with all 
> 0 in some case
> 
>
> Key: HDFS-15186
> URL: https://issues.apache.org/jira/browse/HDFS-15186
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding
>Affects Versions: 3.0.3, 3.2.1, 3.1.3
>Reporter: Yao Guangdong
>Assignee: Yao Guangdong
>Priority: Critical
> Fix For: 3.3.0
>
> Attachments: HDFS-15186.001.patch, HDFS-15186.002.patch, 
> HDFS-15186.003.patch, HDFS-15186.004.patch, HDFS-15186.005.patch
>
>
> # After decommissioning more than one DataNode from a cluster, I can find some 
> parity blocks whose content is all zeros, and the probability is quite high 
> (several cases per thousand). This is a serious problem: if we read data from 
> a zero parity block, or use a zero parity block to recover another block, we 
> end up using corrupt data without even knowing it.
> Some example cases are below:
> B: busy DataNode, 
> D: decommissioning DataNode,
> all others are normal.
> 1. Group indices are [0, 1, 2, 3, 4, 5, 6(B,D), 7, 8(D)].
> 2. Group indices are [0(B,D), 1, 2, 3, 4, 5, 6(B,D), 7, 8(D)].
> 
> In the first case, when the block group indices are [0, 1, 2, 3, 4, 5, 6(B,D), 
> 7, 8(D)], the DataNode may receive a reconstruct-block command with 
> liveIndices=[0, 1, 2, 3, 4, 5, 7, 8] and a targets field (in class 
> StripedReconstructionInfo) of length 2. 
> A targets length of 2 means the DataNode must recover 2 internal blocks under 
> the current code, but from liveIndices only 1 missing block can be found, so 
> StripedWriter#initTargetIndices uses 0 as the default recovery index without 
> checking whether index 0 is already among the source indices.
> The EC algorithm is then invoked with source indices [0, 1, 2, 3, 4, 5] to 
> recover indices [6, 0]. Index 0 appears in both the source indices and the 
> target indices in this case, and the returned target buffer for index [6] is 
> always all zeros. So I think this is a robustness problem in the EC algorithm, 
> which should be more fault tolerant. I tried to fix it there, but it is too 
> hard because there are too many cases (the second example above leads to 
> recovering indices [0, 6, 0] from source indices [1, 2, 3, 4, 5, 7]). So I 
> changed my mind: invoke the EC algorithm with correct parameters, which means 
> removing the duplicate target index 0 in this case. That is how I finally 
> fixed it.
>  
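
The fix the description settles on, invoking the EC algorithm with 
deduplicated target indices, can be sketched as a toy over plain int arrays 
(this is not the real StripedWriter code):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Toy illustration: drop target indices that duplicate earlier targets or
// that already appear among the source indices, so the EC coder is invoked
// with consistent arguments.
public class DedupTargetsSketch {
  static int[] dedupTargets(int[] sources, int[] targets) {
    Set<Integer> sourceSet = new HashSet<>();
    for (int s : sources) {
      sourceSet.add(s);
    }
    Set<Integer> cleaned = new LinkedHashSet<>(); // keeps first-seen order
    for (int t : targets) {
      if (!sourceSet.contains(t)) {
        cleaned.add(t); // set semantics drop the duplicate target index
      }
    }
    int[] out = new int[cleaned.size()];
    int i = 0;
    for (int t : cleaned) {
      out[i++] = t;
    }
    return out;
  }

  public static void main(String[] args) {
    // Second case from the description: targets [0, 6, 0] collapse to [0, 6].
    int[] fixed = dedupTargets(new int[]{1, 2, 3, 4, 5, 7}, new int[]{0, 6, 0});
    System.out.println(Arrays.toString(fixed)); // [0, 6]
  }
}
```

In the first case from the description, target 0 is already a source, so the 
same filtering reduces the targets to just index 6.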






[jira] [Commented] (HDFS-15186) Erasure Coding: Decommission may generate the parity block's content with all 0 in some case

2024-05-24 Thread Chenyu Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849219#comment-17849219
 ] 

Chenyu Zheng commented on HDFS-15186:
-

[~ruiliang] 
If the zero block is involved in the reconstruction, it cannot be restored.
If the zero block is not involved in the reconstruction, you can just delete 
that replica!
You can apply this patch first, and then apply HADOOP-19180; HADOOP-19180 
solves the reconstruction problem fundamentally.
 

> Erasure Coding: Decommission may generate the parity block's content with all 
> 0 in some case
> 
>
> Key: HDFS-15186
> URL: https://issues.apache.org/jira/browse/HDFS-15186
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding
>Affects Versions: 3.0.3, 3.2.1, 3.1.3
>Reporter: Yao Guangdong
>Assignee: Yao Guangdong
>Priority: Critical
> Fix For: 3.3.0
>
> Attachments: HDFS-15186.001.patch, HDFS-15186.002.patch, 
> HDFS-15186.003.patch, HDFS-15186.004.patch, HDFS-15186.005.patch
>
>
> # After decommissioning more than one DataNode from a cluster, I can find some 
> parity blocks whose content is all zeros, and the probability is quite high 
> (several cases per thousand). This is a serious problem: if we read data from 
> a zero parity block, or use a zero parity block to recover another block, we 
> end up using corrupt data without even knowing it.
> Some example cases are below:
> B: busy DataNode, 
> D: decommissioning DataNode,
> all others are normal.
> 1. Group indices are [0, 1, 2, 3, 4, 5, 6(B,D), 7, 8(D)].
> 2. Group indices are [0(B,D), 1, 2, 3, 4, 5, 6(B,D), 7, 8(D)].
> 
> In the first case, when the block group indices are [0, 1, 2, 3, 4, 5, 6(B,D), 
> 7, 8(D)], the DataNode may receive a reconstruct-block command with 
> liveIndices=[0, 1, 2, 3, 4, 5, 7, 8] and a targets field (in class 
> StripedReconstructionInfo) of length 2. 
> A targets length of 2 means the DataNode must recover 2 internal blocks under 
> the current code, but from liveIndices only 1 missing block can be found, so 
> StripedWriter#initTargetIndices uses 0 as the default recovery index without 
> checking whether index 0 is already among the source indices.
> The EC algorithm is then invoked with source indices [0, 1, 2, 3, 4, 5] to 
> recover indices [6, 0]. Index 0 appears in both the source indices and the 
> target indices in this case, and the returned target buffer for index [6] is 
> always all zeros. So I think this is a robustness problem in the EC algorithm, 
> which should be more fault tolerant. I tried to fix it there, but it is too 
> hard because there are too many cases (the second example above leads to 
> recovering indices [0, 6, 0] from source indices [1, 2, 3, 4, 5, 7]). So I 
> changed my mind: invoke the EC algorithm with correct parameters, which means 
> removing the duplicate target index 0 in this case. That is how I finally 
> fixed it.
>  






[jira] [Created] (HDFS-17535) I have confirmed the EC corrupt file, can this corrupt file be restored?

2024-05-24 Thread ruiliang (Jira)
ruiliang created HDFS-17535:
---

 Summary: I have confirmed the EC corrupt file, can this corrupt 
file be restored?
 Key: HDFS-17535
 URL: https://issues.apache.org/jira/browse/HDFS-17535
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ec, hdfs
Affects Versions: 3.1.0
Reporter: ruiliang


I learned that EC does have a major bug that corrupts files:
https://issues.apache.org/jira/browse/HDFS-15759


1: I have confirmed the EC corrupt file; can this corrupt file be restored? It 
holds important data, and this is causing production data loss for us. Is 
there a way to recover it?
corrupt;/file;corrupt block groups \{blk_-xx} zeroParityBlockGroups 
\{blk_-xx[blk_-xx]}

2: https://github.com/apache/orc/issues/1939 I was wondering, if I cherry-pick 
your current code (GitHub pull request #2869), can I skip the patches related 
to HDFS-14768, HDFS-15186, and HDFS-15240?


HDFS version 3.1.0

Thank you






[jira] [Commented] (HDFS-15186) Erasure Coding: Decommission may generate the parity block's content with all 0 in some case

2024-05-24 Thread ruiliang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849217#comment-17849217
 ] 

ruiliang commented on HDFS-15186:
-

I have confirmed the EC corrupt file; can this corrupt file be restored? It 
holds important data, and this is causing production data loss for us. Is 
there a way to recover it?
corrupt;/file;corrupt block groups \{blk_-xx} zeroParityBlockGroups 
\{blk_-xx[blk_-xx]}
HDFS version 3.1.0

> Erasure Coding: Decommission may generate the parity block's content with all 
> 0 in some case
> 
>
> Key: HDFS-15186
> URL: https://issues.apache.org/jira/browse/HDFS-15186
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding
>Affects Versions: 3.0.3, 3.2.1, 3.1.3
>Reporter: Yao Guangdong
>Assignee: Yao Guangdong
>Priority: Critical
> Fix For: 3.3.0
>
> Attachments: HDFS-15186.001.patch, HDFS-15186.002.patch, 
> HDFS-15186.003.patch, HDFS-15186.004.patch, HDFS-15186.005.patch
>
>
> # I found that some parity blocks' content is all 0 after decommissioning more 
> than one DataNode from a cluster, and the probability is quite high (parts per 
> thousand). This is a serious problem: if we read data from a zero parity block, 
> or use a zero parity block to recover another block, we end up consuming 
> corrupted data without knowing it.
> Consider the cases below, where B marks a busy DataNode, D marks a 
> decommissioning DataNode, and the rest are normal:
> 1. Group indices are [0, 1, 2, 3, 4, 5, 6(B,D), 7, 8(D)].
> 2. Group indices are [0(B,D), 1, 2, 3, 4, 5, 6(B,D), 7, 8(D)].
> 
> In the first case, the DataNode may receive a block reconstruction command with 
> liveIndices=[0, 1, 2, 3, 4, 5, 7, 8] and a targets field (in class 
> StripedReconstructionInfo) of length 2, which means the DataNode must recover 2 
> internal blocks under the current code. But only 1 missing block can be derived 
> from liveIndices, so the method StripedWriter#initTargetIndices falls back to 0 
> as the default recovery index without checking whether index 0 is already among 
> the source indices.
> The EC algorithm is then invoked with source indices [0, 1, 2, 3, 4, 5] to 
> recover target indices [6, 0]. Index 0 appears in both the sources and the 
> targets, and in this case the buffer returned for target index 6 is always all 
> 0. At first I thought this was a fault-tolerance problem in the EC algorithm 
> itself and tried to fix it there, but there are too many cases (the second 
> example above asks it to recover indices [0, 6, 0] from sources 
> [1, 2, 3, 4, 5, 7]). So I changed my approach: invoke the EC algorithm with 
> correct parameters, i.e. remove the duplicate target index 0 in this case. That 
> is how I finally fixed it.
>  
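The fix described above (drop any target index that is also a source index before invoking the decoder) can be sketched as follows. `DedupTargets` and `dedupTargetIndices` are hypothetical names for illustration, not the actual HDFS-15186 patch, which lives in `StripedWriter#initTargetIndices`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: drop target indices that already appear among the source indices. */
public class DedupTargets {

  /**
   * Returns the target indices with any index that is already a live source
   * index removed, so the decoder is never asked to "recover" a block it was
   * also handed as input (the situation that produced zero-filled buffers).
   */
  public static int[] dedupTargetIndices(int[] sourceIndices, int[] targetIndices) {
    List<Integer> result = new ArrayList<>();
    for (int target : targetIndices) {
      boolean duplicate = false;
      for (int source : sourceIndices) {
        if (source == target) {
          duplicate = true;
          break;
        }
      }
      if (!duplicate) {
        result.add(target);
      }
    }
    int[] out = new int[result.size()];
    for (int i = 0; i < out.length; i++) {
      out[i] = result.get(i);
    }
    return out;
  }

  public static void main(String[] args) {
    // The first case above: sources [0..5], bogus targets [6, 0].
    int[] fixed = dedupTargetIndices(new int[]{0, 1, 2, 3, 4, 5}, new int[]{6, 0});
    System.out.println(Arrays.toString(fixed)); // [6]
  }
}
```

With the duplicate index 0 removed, the decoder is asked only for index 6 and can no longer return an all-zero buffer for it due to overlapping inputs and outputs.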






[jira] [Resolved] (HDFS-17529) RBF: Improve router state store cache entry deletion

2024-05-23 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu resolved HDFS-17529.
-
Fix Version/s: 3.5.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> RBF: Improve router state store cache entry deletion
> 
>
> Key: HDFS-17529
> URL: https://issues.apache.org/jira/browse/HDFS-17529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Current implementation for router state store update is quite inefficient, so 
> much that when routers are removed and a lot of NameNodeMembership records 
> are deleted in a short burst, the deletions triggered a router safemode in 
> our cluster and caused a lot of troubles.
> This ticket aims to improve the deletion process for ZK state store 
> implementation.
> See HDFS-17532 for the other half of this improvement






[jira] [Commented] (HDFS-17529) RBF: Improve router state store cache entry deletion

2024-05-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849138#comment-17849138
 ] 

ASF GitHub Bot commented on HDFS-17529:
---

ZanderXu commented on PR #6833:
URL: https://github.com/apache/hadoop/pull/6833#issuecomment-2128331182

   Merged. Thanks @kokonguyen191 for your contribution. 




> RBF: Improve router state store cache entry deletion
> 
>
> Key: HDFS-17529
> URL: https://issues.apache.org/jira/browse/HDFS-17529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> Current implementation for router state store update is quite inefficient, so 
> much that when routers are removed and a lot of NameNodeMembership records 
> are deleted in a short burst, the deletions triggered a router safemode in 
> our cluster and caused a lot of troubles.
> This ticket aims to improve the deletion process for ZK state store 
> implementation.
> See HDFS-17532 for the other half of this improvement






[jira] [Commented] (HDFS-17529) RBF: Improve router state store cache entry deletion

2024-05-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849137#comment-17849137
 ] 

ASF GitHub Bot commented on HDFS-17529:
---

ZanderXu merged PR #6833:
URL: https://github.com/apache/hadoop/pull/6833




> RBF: Improve router state store cache entry deletion
> 
>
> Key: HDFS-17529
> URL: https://issues.apache.org/jira/browse/HDFS-17529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> Current implementation for router state store update is quite inefficient, so 
> much that when routers are removed and a lot of NameNodeMembership records 
> are deleted in a short burst, the deletions triggered a router safemode in 
> our cluster and caused a lot of troubles.
> This ticket aims to improve the deletion process for ZK state store 
> implementation.
> See HDFS-17532 for the other half of this improvement






[jira] [Updated] (HDFS-17531) RBF: Asynchronous router RPC

2024-05-23 Thread Jian Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Zhang updated HDFS-17531:
--
Attachment: Async router single ns performance test.pdf

> RBF: Asynchronous router RPC
> 
>
> Key: HDFS-17531
> URL: https://issues.apache.org/jira/browse/HDFS-17531
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: Async router single ns performance test.pdf, Aynchronous 
> router.pdf, HDFS-17531.001.patch, image-2024-05-19-18-07-51-282.png
>
>
> *Description*
> Currently, the main function of the Router service is to accept client 
> requests, forward the requests to the corresponding downstream ns, and then 
> return the results of the downstream ns to the client. The link is as follows:
> *!image-2024-05-19-18-07-51-282.png|width=900,height=300!*
> The main threads involved in the rpc link are:
> {*}Read{*}: Get the client request and put it into the call queue *(1)*
> {*}Handler{*}:
> Extract call *(2)* from the call queue, process the call, generate a new 
> call, place it in the call of the connection thread, and wait for the call 
> processing to complete *(3)*
> After being awakened by the connection thread, process the response and put 
> it into the response queue *(5)*
> *Connection:*
> Hold the link with downstream ns, send the call from the call to the 
> downstream ns (via {*}rpcRequestThread{*}), and obtain a response from ns. 
> Based on the call in the response, notify the call to complete processing 
> *(4)*
> *Responder:*
> Retrieve the response queue from the queue *(6)* and return it to the client
>  
> *Shortcoming*
> Even if the *connection* thread can send more requests to downstream 
> nameservices, since *(3)* and *(4)* are synchronous, when the *handler* 
> thread adds the call to connection.calls, it needs to wait until the 
> *connection* notifies the call to complete; only after the response 
> is put into the response queue can a new call be obtained from the call queue 
> and processed. Therefore, the concurrency performance of the router is 
> limited by the number of handlers; a simple example is as follows: If the 
> number of handlers is 1 and the maximum number of calls in the connection 
> thread is 10, then even if the connection thread can send 10 requests to the 
> downstream ns, since the number of handlers is 1, the router can only process 
> one request after another. 
>  
> Since the performance of router rpc is mainly limited by the number of 
> handlers, the most effective way to improve rpc performance currently is to 
> increase the number of handlers. Letting the router create a large number of 
> handler threads will also increase the number of thread switches and cannot 
> maximize the use of machine performance.
>  
> There are usually multiple ns downstream of the router. If the handler 
> forwards the request to an ns with poor performance, it will cause the 
> handler to wait for a long time. Due to the reduction of available handlers, 
> the router's ability to handle ns requests with normal performance will be 
> reduced. From the perspective of the client, the performance of the 
> downstream ns of the router has deteriorated at this time. We often find that 
> the call queue of the downstream ns is not high, but the call queue of the 
> router is very high.
>  
> Therefore, although the main function of the router is to federate and handle 
> requests from multiple NSs, the current synchronous RPC performance cannot 
> satisfy the scenario where there are many NSs downstream of the router. Even 
> if the concurrent performance of the router can be improved by increasing the 
> number of handlers, it is still relatively slow. More threads will increase 
> the CPU context switching time, and in fact many of the handler threads are 
> in a blocked state, which is undoubtedly a waste of thread resources. When a 
> request enters the router, there is no guarantee that there will be a running 
> handler at this time.
>  
> Therefore, I consider asynchronous router rpc. Please view the *pdf* for the 
> complete solution.
>  
> Welcome everyone to exchange and discuss!
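The blocking-versus-asynchronous handler behavior described above can be sketched with plain `CompletableFuture`s. `AsyncForwardSketch`, its method names, and the lambda "nameservice" are all hypothetical stand-ins for the router's Read/Handler/Connection/Responder pipeline, not the proposed implementation (see the attached PDF for that):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Sketch: an async handler hands the call off and is immediately free again. */
public class AsyncForwardSketch {

  // Stand-in for the connection threads holding links to downstream nameservices.
  private final ExecutorService connectionPool = Executors.newFixedThreadPool(4);

  /**
   * Synchronous style: the handler thread blocks until the downstream
   * nameservice answers, so router throughput is capped by handler count.
   */
  public String forwardBlocking(String call, Function<String, String> downstream) {
    return downstream.apply(call); // handler is stuck here for the whole round trip
  }

  /**
   * Asynchronous style: the handler only enqueues the call; the future is
   * completed later by a connection thread, and a responder-style callback
   * can return the result to the client without occupying a handler.
   */
  public CompletableFuture<String> forwardAsync(String call, Function<String, String> downstream) {
    return CompletableFuture.supplyAsync(() -> downstream.apply(call), connectionPool);
  }

  public void shutdown() {
    connectionPool.shutdown();
  }

  public static void main(String[] args) {
    AsyncForwardSketch router = new AsyncForwardSketch();
    Function<String, String> ns = req -> "ok:" + req; // toy downstream nameservice
    // The handler returns immediately; the result arrives via the future.
    String response = router.forwardAsync("getBlockLocations", ns).join();
    System.out.println(response); // ok:getBlockLocations
    router.shutdown();
  }
}
```

In the sketch, one handler thread can have many calls in flight on the connection pool at once, which is the key difference from the synchronous path where steps (3) and (4) pin the handler.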






[jira] [Commented] (HDFS-17518) In the lease monitor, if a file is closed, we should sync the editslog

2024-05-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848991#comment-17848991
 ] 

ASF GitHub Bot commented on HDFS-17518:
---

ThinkerLei commented on code in PR #6809:
URL: https://github.com/apache/hadoop/pull/6809#discussion_r1611850764


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java:
##
@@ -626,7 +626,8 @@ private synchronized boolean checkLeases(Collection<Lease> leasesToCheck) {
       }
     }
     // If a lease recovery happened, we need to sync later.

Review Comment:
   @vinayakumarb Thank you for your reply. How about changing the method 
`checkLeases` to return true?
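The suggestion above (have `checkLeases` report that a sync is needed even when the lease check simply closed a file) can be sketched as follows. `LeaseCheckSketch` and its `Lease` stand-in are hypothetical and greatly simplified relative to the real `LeaseManager`:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Sketch: checkLeases signals whether the caller must sync the edit log. */
public class LeaseCheckSketch {

  /** Minimal stand-in for a lease on an open file. */
  public static class Lease {
    final String path;
    final boolean recoverable; // true if lease recovery can close the file now

    public Lease(String path, boolean recoverable) {
      this.path = path;
      this.recoverable = recoverable;
    }
  }

  /**
   * Returns true if any lease action changed namespace state. Closing a file
   * writes an edit, so it must also report true; otherwise the close-file edit
   * could sit unsynced and not reach the standby NameNode for a long time,
   * which is the problem this ticket describes.
   */
  public static boolean checkLeases(Collection<Lease> leasesToCheck, List<String> closedFiles) {
    boolean needSync = false;
    for (Lease lease : leasesToCheck) {
      if (lease.recoverable) {
        closedFiles.add(lease.path); // file closed: an edit was logged
        needSync = true;             // ...so request a sync for it too
      }
    }
    return needSync;
  }

  public static void main(String[] args) {
    List<String> closed = new ArrayList<>();
    List<Lease> leases = List.of(new Lease("/a", true), new Lease("/b", false));
    System.out.println(checkLeases(leases, closed)); // true
  }
}
```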





> In the lease monitor, if a file is closed, we should sync the editslog
> --
>
> Key: HDFS-17518
> URL: https://issues.apache.org/jira/browse/HDFS-17518
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
>  Labels: pull-request-available
>
> In the lease monitor, if a file is closed,  method checklease will return 
> true, and then the edits log will not be sync. In my opinion, we should sync 
> the edits log to avoid not synchronizing the state to the standby NameNode 
> for a long time.






[jira] [Commented] (HDFS-17529) RBF: Improve router state store cache entry deletion

2024-05-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848919#comment-17848919
 ] 

ASF GitHub Bot commented on HDFS-17529:
---

hadoop-yetus commented on PR #6833:
URL: https://github.com/apache/hadoop/pull/6833#issuecomment-2126931785

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 46s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  49m 44s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 28s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 41s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 21s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  40m 23s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 31s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 31s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 32s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 25s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 34s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  39m 47s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  32m 56s |  |  hadoop-hdfs-rbf in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 36s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 179m 28s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6833/6/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6833 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint 
markdownlint |
   | uname | Linux 1be2770da36d 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 2b7aeaa91aa57e5b26bced05e23c81d21adfe1da |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6833/6/testReport/ |
   | Max. process+thread count | 3751 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6833/6/console |
   | versions | git=2.25.1 maven=3.6.3

[jira] [Commented] (HDFS-17529) RBF: Improve router state store cache entry deletion

2024-05-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848881#comment-17848881
 ] 

ASF GitHub Bot commented on HDFS-17529:
---

ZanderXu commented on code in PR #6833:
URL: https://github.com/apache/hadoop/pull/6833#discussion_r1611258872


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/CachedRecordStore.java:
##
@@ -198,8 +195,15 @@ public void overrideExpiredRecords(QueryResult<R> query) throws IOException {
     if (commitRecords.size() > 0) {
       getDriver().putAll(commitRecords, true, false);
     }
-    if (deleteRecords.size() > 0) {
-      newRecords.removeAll(deleteRecords);
+    if (!toDeleteRecords.isEmpty()) {
+      for (Map.Entry<R, Boolean> entry : getDriver().removeMultiple(toDeleteRecords).entrySet()) {
+        if (entry.getValue()) {
+          deletedRecords.add(entry.getKey());

Review Comment:
   Here changing to `newRecords.remove(entry.getKey())`, we can remove  
`deletedRecords`.
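The reviewer's suggestion (drop each confirmed deletion from the cached list directly, rather than collecting a separate `deletedRecords` set) can be sketched like this. `CachePruneSketch` and `pruneDeleted` are hypothetical names, with strings standing in for state store records:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: prune the cached record list from a batched-delete result map. */
public class CachePruneSketch {

  /**
   * removeMultiple-style result: record -> whether the backing store actually
   * deleted it. Only confirmed deletions leave the cache, so a failed delete
   * stays cached and is retried on the next refresh.
   */
  public static List<String> pruneDeleted(List<String> newRecords, Map<String, Boolean> removeResult) {
    List<String> pruned = new ArrayList<>(newRecords);
    for (Map.Entry<String, Boolean> entry : removeResult.entrySet()) {
      if (entry.getValue()) {
        pruned.remove(entry.getKey()); // drop each confirmed deletion directly
      }
    }
    return pruned;
  }

  public static void main(String[] args) {
    Map<String, Boolean> result = new HashMap<>();
    result.put("ns0-router1", true);   // deleted in the store
    result.put("ns0-router2", false);  // delete failed, keep cached
    List<String> records = new ArrayList<>(List.of("ns0-router1", "ns0-router2", "ns1-router1"));
    System.out.println(pruneDeleted(records, result)); // [ns0-router2, ns1-router1]
  }
}
```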





> RBF: Improve router state store cache entry deletion
> 
>
> Key: HDFS-17529
> URL: https://issues.apache.org/jira/browse/HDFS-17529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> Current implementation for router state store update is quite inefficient, so 
> much that when routers are removed and a lot of NameNodeMembership records 
> are deleted in a short burst, the deletions triggered a router safemode in 
> our cluster and caused a lot of troubles.
> This ticket aims to improve the deletion process for ZK state store 
> implementation.
> See HDFS-17532 for the other half of this improvement






[jira] [Comment Edited] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2024-05-22 Thread ruiliang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848806#comment-17848806
 ] 

ruiliang edited comment on HDFS-15759 at 5/23/24 4:52 AM:
--

[~weichiu]

Hello, our current production data also has this kind of EC storage data corruption 
problem; the problem is described at
[https://github.com/apache/orc/issues/1939]
I was wondering: if I cherry-pick your current code (GitHub pull request #2869),
can I skip the patches related to HDFS-14768, HDFS-15186, and HDFS-15240?

The current version of HDFS is 3.1.0.
Thank you!


was (Author: ruilaing):
Hello, our current production data also has this kind of EC storage data damage 
problem, about the problem description
https://github.com/apache/orc/issues/1939
I was wondering if cherry picked your current code (GitHub pull request #2869),
Can I skip patches related to HDFS-14768,HDFS-15186, and HDFS-15240?

The current version of hdfs is 3.1.0.
Thank you!

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they find 
> corrupted data, it is difficult or sometimes impossible to recover them.
> To prevent further data corruption issues, this feature proposes a simple and 
> effective way to verify EC reconstruction correctness on DataNode at each 
> reconstruction process.
> It verifies correctness of outputs decoded from inputs as follows:
> 1. Decoding an input with the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs 
> [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from 
> [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with high 
> probability.
> Then the task will also fail and be retried by NameNode.
> The next reconstruction will succeed if the condition triggered the failure 
> is gone.
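The verify-by-re-decoding idea above can be illustrated with a single XOR parity standing in for RS-6-3. `XorVerifySketch` is a hypothetical toy; the real feature re-runs the RS decoder on the DataNode at each reconstruction:

```java
/** Toy verification: re-decode an original input from the outputs and compare. */
public class XorVerifySketch {

  /** Single XOR parity over data bytes, a stand-in for an RS parity block. */
  public static byte parity(byte... data) {
    byte p = 0;
    for (byte d : data) {
      p ^= d;
    }
    return p;
  }

  /**
   * Check a reconstructed d0 by decoding the parity (one of the original
   * inputs) from it plus the surviving data, and comparing with the real
   * parity. A zero-filled "reconstruction", as in the corruption bugs cited
   * above, fails this check whenever the true d0 is nonzero.
   */
  public static boolean verify(byte reconstructedD0, byte d1, byte d2, byte p) {
    return parity(reconstructedD0, d1, d2) == p;
  }

  public static void main(String[] args) {
    byte d0 = 7, d1 = 3, d2 = 5;
    byte p = parity(d0, d1, d2);
    System.out.println(verify(d0, d1, d2, p));        // correct reconstruction: true
    System.out.println(verify((byte) 0, d1, d2, p));  // zeroed block: false
  }
}
```

When the check fails, the reconstruction task fails and is retried by the NameNode, as the description says; the verification adds one extra decode per task.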






[jira] [Comment Edited] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2024-05-22 Thread ruiliang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848806#comment-17848806
 ] 

ruiliang edited comment on HDFS-15759 at 5/23/24 3:53 AM:
--

Hello, our current production data also has this kind of EC storage data damage 
problem, about the problem description
https://github.com/apache/orc/issues/1939
I was wondering if cherry picked your current code (GitHub pull request #2869),
Can I skip patches related to HDFS-14768,HDFS-15186, and HDFS-15240?

The current version of hdfs is 3.1.0.
Thank you!


was (Author: ruilaing):
Hello, our current production data also has this kind of EC storage data damage 
problem, about the problem description
https://github.com/apache/orc/issues/1939
I was wondering if cherry picked your current code (GitHub pull request #2869),
Can I not repair the patches related to HDFS-14768,HDFS-15186, and HDFS-15240?
The current version of hdfs is 3.1.0.
Thank you!

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they find 
> corrupted data, it is difficult or sometimes impossible to recover them.
> To prevent further data corruption issues, this feature proposes a simple and 
> effective way to verify EC reconstruction correctness on DataNode at each 
> reconstruction process.
> It verifies correctness of outputs decoded from inputs as follows:
> 1. Decoding an input with the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs 
> [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from 
> [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with high 
> probability.
> Then the task will also fail and be retried by NameNode.
> The next reconstruction will succeed if the condition triggered the failure 
> is gone.






[jira] [Comment Edited] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2024-05-22 Thread ruiliang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848806#comment-17848806
 ] 

ruiliang edited comment on HDFS-15759 at 5/23/24 3:52 AM:
--

Hello, our current production data also has this kind of EC storage data damage 
problem, about the problem description
https://github.com/apache/orc/issues/1939
I was wondering if cherry picked your current code (GitHub pull request #2869),
Can I not repair the patches related to HDFS-14768,HDFS-15186, and HDFS-15240?
The current version of hdfs is 3.1.0.
Thank you!


was (Author: ruilaing):
Hello, our current production data also has this kind of EC storage data damage 
problem, about the problem description
https://github.com/apache/orc/issues/1939
I was wondering if cherry picked your current code (GitHub pull request #2869),
Can I not repair the patches related to HDFS-14768,HDFS-15186, and HDFS-15240?
The current version of hdfs is 3.1.0.
Thank you!

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they find 
> corrupted data, it is difficult or sometimes impossible to recover them.
> To prevent further data corruption issues, this feature proposes a simple and 
> effective way to verify EC reconstruction correctness on DataNode at each 
> reconstruction process.
> It verifies correctness of outputs decoded from inputs as follows:
> 1. Decoding an input with the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs 
> [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from 
> [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with high 
> probability.
> Then the task will also fail and be retried by NameNode.
> The next reconstruction will succeed if the condition triggered the failure 
> is gone.






[jira] [Comment Edited] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2024-05-22 Thread ruiliang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848806#comment-17848806
 ] 

ruiliang edited comment on HDFS-15759 at 5/23/24 3:51 AM:
--

Hello, our current production data also has this kind of EC storage data damage 
problem, about the problem description
https://github.com/apache/orc/issues/1939
I was wondering if cherry picked your current code (GitHub pull request #2869),
Can I not repair the patches related to HDFS-14768,HDFS-15186, and HDFS-15240?
The current version of hdfs is 3.1.0.
Thank you!


was (Author: ruilaing):
Hello, our current production data also has this kind of EC storage data damage 
problem, about the problem description
https://github.com/apache/orc/issues/1939
I would like to ask if cherry picked your current code (GitHub pull request 
#2869), can you skip the code to fix HDFS-14768,HDFS-15186 and HDFS-15240 
related patches?
The current version of hdfs is 3.1.0.
Thank you!

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they find 
> corrupted data, it is difficult or sometimes impossible to recover them.
> To prevent further data corruption issues, this feature proposes a simple and 
> effective way to verify EC reconstruction correctness on DataNode at each 
> reconstruction process.
> It verifies correctness of outputs decoded from inputs as follows:
> 1. Decoding an input with the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs 
> [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from 
> [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with high 
> probability.
> Then the task will also fail and be retried by NameNode.
> The next reconstruction will succeed if the condition triggered the failure 
> is gone.






[jira] [Comment Edited] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2024-05-22 Thread ruiliang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848806#comment-17848806
 ] 

ruiliang edited comment on HDFS-15759 at 5/23/24 3:50 AM:
--

Hello, our production data is also affected by this kind of EC storage data 
corruption problem; the problem is described at
https://github.com/apache/orc/issues/1939
I would like to ask: if we cherry-pick your current code (GitHub pull request 
#2869), can we skip the patches that fix HDFS-14768, HDFS-15186 and HDFS-15240?
Our current HDFS version is 3.1.0.
Thank you!


was (Author: ruilaing):
Hello, our current online data also appears this kind of EC storage data damage 
problem, about the problem description 
https://github.com/apache/orc/issues/1939
I would like to ask if cherry picked your current code (GitHub pull request 
#2869), can you skip the code to fix HDFS-14768,HDFS-15186 and HDFS-15240 
related patches?
The current version of hdfs is 3.1.0.
Thank you!

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they find 
> corrupted data, it is difficult or sometimes impossible to recover them.
> To prevent further data corruption issues, this feature proposes a simple and 
> effective way to verify EC reconstruction correctness on DataNode at each 
> reconstruction process.
> It verifies the correctness of the decoded outputs as follows:
> 1. Decode one of the inputs from the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs 
> [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from 
> [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with high 
> probability.
> Then the task will also fail and be retried by NameNode.
> The next reconstruction will succeed if the condition that triggered the 
> failure is gone.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2024-05-22 Thread ruiliang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848806#comment-17848806
 ] 

ruiliang commented on HDFS-15759:
-

Hello, our online data is also affected by this kind of EC storage data 
corruption problem; the problem is described at
https://github.com/apache/orc/issues/1939
I would like to ask: if we cherry-pick your current code (GitHub pull request 
#2869), can we skip the patches that fix HDFS-14768, HDFS-15186 and HDFS-15240?
Our current HDFS version is 3.1.0.
Thank you!

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they find 
> corrupted data, it is difficult or sometimes impossible to recover them.
> To prevent further data corruption issues, this feature proposes a simple and 
> effective way to verify EC reconstruction correctness on DataNode at each 
> reconstruction process.
> It verifies the correctness of the decoded outputs as follows:
> 1. Decode one of the inputs from the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs 
> [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from 
> [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with high 
> probability.
> Then the task will also fail and be retried by NameNode.
> The next reconstruction will succeed if the condition that triggered the 
> failure is gone.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17529) RBF: Improve router state store cache entry deletion

2024-05-22 Thread Felix N (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix N updated HDFS-17529:
---
Summary: RBF: Improve router state store cache entry deletion  (was: 
Improve router state store cache entry deletion)

> RBF: Improve router state store cache entry deletion
> 
>
> Key: HDFS-17529
> URL: https://issues.apache.org/jira/browse/HDFS-17529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> The current implementation of the router state store update is quite 
> inefficient, so much so that when routers were removed and a lot of 
> NameNodeMembership records were deleted in a short burst, the deletions 
> triggered a router safemode in our cluster and caused a lot of trouble.
> This ticket aims to improve the deletion process for ZK state store 
> implementation.
> See HDFS-17532 for the other half of this improvement



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17528) FsImageValidation: set txid when saving a new image

2024-05-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848796#comment-17848796
 ] 

ASF GitHub Bot commented on HDFS-17528:
---

szetszwo commented on PR #6828:
URL: https://github.com/apache/hadoop/pull/6828#issuecomment-2126096755

   @vinayakumarb , thanks a lot for reviewing this!




> FsImageValidation: set txid when saving a new image
> ---
>
> Key: HDFS-17528
> URL: https://issues.apache.org/jira/browse/HDFS-17528
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>  Labels: pull-request-available
>
> - When the fsimage is specified as a file and the FsImageValidation tool 
> saves a new image (for removing inaccessible inodes), the txid is not set.  
> Then, the resulting image will have 0 as its txid.
> - When the fsimage is specified as a directory, the txid is set.  However, it 
> will hit an NPE since the NameNode metrics are uninitialized (although the 
> metrics are not used by FsImageValidation).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Moved] (HDFS-17534) RBF: Support leader follower mode for multiple subclusters

2024-05-22 Thread Yuanbo Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanbo Liu moved HADOOP-19183 to HDFS-17534:


Component/s: rbf
 (was: RBF)
Key: HDFS-17534  (was: HADOOP-19183)
Project: Hadoop HDFS  (was: Hadoop Common)

> RBF: Support leader follower mode for multiple subclusters
> --
>
> Key: HDFS-17534
> URL: https://issues.apache.org/jira/browse/HDFS-17534
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Yuanbo Liu
>Priority: Major
>
> Currently there are five modes for multiple subclusters:
> HASH, LOCAL, RANDOM, HASH_ALL and SPACE.
> This proposes a new mode, leader/follower: routers try to write to the 
> leader subcluster as much as possible, and when routers read data, the 
> leader subcluster is ranked first.
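The proposed read ordering could look roughly like the sketch below. Everything here is hypothetical, since this mode does not exist in the RBF code yet; it only illustrates ranking the leader subcluster first:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the proposed leader/follower ordering for RBF: reads rank the
// leader subcluster ahead of the followers. Hypothetical, for illustration.
class LeaderFollowerSketch {

  // Return the subclusters with the leader first and followers in their
  // original order.
  static List<String> rank(String leader, List<String> subclusters) {
    List<String> ordered = new ArrayList<>();
    if (subclusters.contains(leader)) {
      ordered.add(leader);                 // leader gets first rank
    }
    for (String ns : subclusters) {
      if (!ns.equals(leader)) {
        ordered.add(ns);                   // followers keep their order
      }
    }
    return ordered;
  }

  public static void main(String[] args) {
    System.out.println(rank("ns1", Arrays.asList("ns0", "ns1", "ns2")));
    // prints [ns1, ns0, ns2]
  }
}
```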



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17459) [FGL] Summarize this feature

2024-05-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848414#comment-17848414
 ] 

ASF GitHub Bot commented on HDFS-17459:
---

hfutatzhanghb commented on code in PR #6737:
URL: https://github.com/apache/hadoop/pull/6737#discussion_r1609154301


##
hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/NamenodeFGL.md:
##
@@ -0,0 +1,210 @@
+
+
+HDFS Namenode Fine-grained Locking
+==
+
+ [FGL] Summarize this feature 
> -
>
> Key: HDFS-17459
> URL: https://issues.apache.org/jira/browse/HDFS-17459
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> Write a doc to summarize this feature so we can merge it into the trunk.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17532) RBF: Allow router state store cache update to overwrite and delete in parallel

2024-05-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-17532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-17532:
---
Summary: RBF: Allow router state store cache update to overwrite and delete 
in parallel  (was: Allow router state store cache update to overwrite and 
delete in parallel)

> RBF: Allow router state store cache update to overwrite and delete in parallel
> --
>
> Key: HDFS-17532
> URL: https://issues.apache.org/jira/browse/HDFS-17532
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Minor
>  Labels: pull-request-available
>
> The current implementation of the router state store update is quite 
> inefficient, so much so that when routers were removed and a lot of 
> NameNodeMembership records were deleted in a short burst, the deletions 
> triggered a router safemode in our cluster and caused a lot of trouble.
> This ticket aims to allow the overwrite part and delete part of 
> org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords
>  to run in parallel.
> See HDFS-17529 for the other half of this improvement.
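Running the overwrite half and the delete half concurrently can be sketched, very roughly, with a plain ExecutorService. The method name `refresh` and the two phase runnables below are hypothetical stand-ins for the real `overrideExpiredRecords` logic:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: run the overwrite phase and the delete phase of a cache refresh
// concurrently instead of sequentially, waiting for both before returning.
class ParallelRefreshSketch {

  static void refresh(Runnable overwritePhase, Runnable deletePhase) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      Future<?> overwrite = pool.submit(overwritePhase);
      Future<?> delete = pool.submit(deletePhase);
      overwrite.get();   // wait for both phases before declaring the
      delete.get();      // cache refresh complete
    } catch (InterruptedException | ExecutionException e) {
      throw new RuntimeException("cache refresh failed", e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    refresh(
        () -> System.out.println("overwriting expired records"),
        () -> System.out.println("deleting removed records"));
    System.out.println("refresh done");
  }
}
```

The gain is bounded by the slower of the two phases, but a burst of deletions no longer serializes behind the overwrite pass.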



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17518) In the lease monitor, if a file is closed, we should sync the editslog

2024-05-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848367#comment-17848367
 ] 

ASF GitHub Bot commented on HDFS-17518:
---

vinayakumarb commented on code in PR #6809:
URL: https://github.com/apache/hadoop/pull/6809#discussion_r1608866841


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java:
##
@@ -626,7 +626,8 @@ private synchronized boolean checkLeases(Collection 
leasesToCheck) {
 }
   }
   // If a lease recovery happened, we need to sync later.

Review Comment:
   As mentioned above, please change the logic below to always call logSync().





> In the lease monitor, if a file is closed, we should sync the editslog
> --
>
> Key: HDFS-17518
> URL: https://issues.apache.org/jira/browse/HDFS-17518
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
>  Labels: pull-request-available
>
> In the lease monitor, if a file is closed, the method checkLease will return 
> true, and then the edit log will not be synced. In my opinion, we should sync 
> the edit log to avoid not propagating the state to the standby NameNode 
> for a long time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17518) In the lease monitor, if a file is closed, we should sync the editslog

2024-05-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848365#comment-17848365
 ] 

ASF GitHub Bot commented on HDFS-17518:
---

vinayakumarb commented on code in PR #6809:
URL: https://github.com/apache/hadoop/pull/6809#discussion_r1608864957


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java:
##
@@ -626,7 +626,8 @@ private synchronized boolean checkLeases(Collection 
leasesToCheck) {
 }
   }
   // If a lease recovery happened, we need to sync later.

Review Comment:
   I don't think that special case needs to be handled. Even if there is no 
txn, calling logSync() won't be a problem.
   
   If there is no edit txn, logSync() will just return without doing anything.
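The point that an unconditional logSync() is harmless can be illustrated with a toy model. This is not the real FSEditLog; the fields and names are illustrative only:

```java
// Toy model: calling logSync() unconditionally is harmless, because a sync
// with no pending transactions returns immediately without a physical flush.
class EditLogSketch {
  private long lastWrittenTxId = 0;
  private long lastSyncedTxId = 0;
  int physicalSyncs = 0;

  void logEdit() { lastWrittenTxId++; }

  void logSync() {
    if (lastSyncedTxId >= lastWrittenTxId) {
      return;                 // nothing pending: cheap no-op
    }
    physicalSyncs++;          // stand-in for the actual durable flush
    lastSyncedTxId = lastWrittenTxId;
  }

  public static void main(String[] args) {
    EditLogSketch log = new EditLogSketch();
    log.logSync();            // no txn yet: does nothing
    log.logEdit();
    log.logSync();            // flushes the one pending txn
    log.logSync();            // already synced: no-op again
    System.out.println(log.physicalSyncs); // prints 1
  }
}
```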





> In the lease monitor, if a file is closed, we should sync the editslog
> --
>
> Key: HDFS-17518
> URL: https://issues.apache.org/jira/browse/HDFS-17518
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
>  Labels: pull-request-available
>
> In the lease monitor, if a file is closed, the method checkLease will return 
> true, and then the edit log will not be synced. In my opinion, we should sync 
> the edit log to avoid not propagating the state to the standby NameNode 
> for a long time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17530) Aynchronous router

2024-05-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-17530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri resolved HDFS-17530.

Resolution: Duplicate

> Aynchronous router
> --
>
> Key: HDFS-17530
> URL: https://issues.apache.org/jira/browse/HDFS-17530
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jian Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17531) RBF: Asynchronous router RPC

2024-05-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-17531:
---
Summary: RBF: Asynchronous router RPC  (was: RBF: Asynchronous router RPC.)

> RBF: Asynchronous router RPC
> 
>
> Key: HDFS-17531
> URL: https://issues.apache.org/jira/browse/HDFS-17531
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: Aynchronous router.pdf, HDFS-17531.001.patch, 
> image-2024-05-19-18-07-51-282.png
>
>
> *Description*
> Currently, the main function of the Router service is to accept client 
> requests, forward the requests to the corresponding downstream ns, and then 
> return the results of the downstream ns to the client. The link is as follows:
> *!image-2024-05-19-18-07-51-282.png|width=900,height=300!*
> The main threads involved in the rpc link are:
> {*}Read{*}: Get the client request and put it into the call queue *(1)*
> {*}Handler{*}:
> Extract a call *(2)* from the call queue, process it, generate a new 
> call, place it in the connection thread's call list, and wait for the call 
> processing to complete *(3)*
> After being woken by the connection thread, process the response and put 
> it into the response queue *(5)*
> *Connection:*
> Hold the link with the downstream ns, send the queued calls to the 
> downstream ns (via {*}rpcRequestThread{*}), and obtain responses from the ns. 
> Based on the call in each response, notify the waiting handler that the call 
> has completed *(4)*
> *Responder:*
> Retrieve responses from the response queue *(6)* and return them to the client
>  
> *Shortcoming*
> Even if the *connection* thread can send more requests to downstream 
> nameservices, since *(3)* and *(4)* are synchronous, once the *handler* 
> thread adds a call to connection.calls it must wait until the *connection* 
> thread notifies it that the call is complete; only after the response has 
> been put into the response queue can it take a new call from the call queue. 
> Therefore, the concurrency of the router is limited by the number of 
> handlers. A simple example: if the number of handlers is 1 and the maximum 
> number of calls in the connection thread is 10, then even though the 
> connection thread could send 10 requests to the downstream ns, the router 
> can only process one request after another. 
>  
> Since router rpc performance is mainly limited by the number of handlers, 
> the most effective way to improve it today is to increase the number of 
> handlers. But letting the router create a large number of handler threads 
> also increases thread switching and cannot make full use of the machine.
>  
> There are usually multiple ns downstream of the router. If a handler 
> forwards a request to an ns with poor performance, the handler waits for a 
> long time. With fewer handlers available, the router's ability to handle 
> requests for well-performing ns is reduced. From the client's perspective, 
> the performance of the router's downstream ns has deteriorated. We often 
> find that the call queues of the downstream ns are not high, but the call 
> queue of the router is very high.
>  
> Therefore, although the main function of the router is to federate and handle 
> requests from multiple NSs, the current synchronous RPC performance cannot 
> satisfy scenarios where there are many NSs downstream of the router. Even 
> if the router's concurrency can be improved by increasing the number of 
> handlers, it is still relatively slow: more threads increase CPU 
> context-switching time, and in fact many handler threads sit in a blocked 
> state, which is a waste of thread resources. When a request enters the 
> router, there is no guarantee that a running handler will be available.
>  
> Therefore, I propose asynchronous router rpc. Please see the *pdf* for the 
> complete solution.
>  
> Welcome everyone to exchange and discuss!
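Stripped to its essence, the asynchronous idea resembles the CompletableFuture pattern below: the handler does not block between steps (3) and (4); the downstream call returns a future and a callback enqueues the response. This is only an illustration of the pattern, not the proposed implementation:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: one "handler" can have many calls in flight at once because the
// downstream call is asynchronous and the response is delivered by callback.
class AsyncHandlerSketch {

  // Stand-in for forwarding a call to a downstream ns over the connection pool.
  static CompletableFuture<String> callDownstream(String request, Executor connectionPool) {
    return CompletableFuture.supplyAsync(() -> "response-for-" + request, connectionPool);
  }

  public static void main(String[] args) throws Exception {
    ExecutorService connectionPool = Executors.newFixedThreadPool(4);
    BlockingQueue<String> responseQueue = new LinkedBlockingQueue<>();

    // The handler issues three calls without blocking on any of them.
    for (int i = 0; i < 3; i++) {
      callDownstream("call" + i, connectionPool)
          .thenAccept(responseQueue::add);   // steps (4)+(5) without blocking
    }

    // The responder drains the queue as responses arrive (order may vary).
    for (int i = 0; i < 3; i++) {
      System.out.println(responseQueue.take());
    }
    connectionPool.shutdown();
  }
}
```

With this shape, throughput is bounded by the connection pool and queue capacity rather than by the handler count.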



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17529) Improve router state store cache entry deletion

2024-05-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848113#comment-17848113
 ] 

ASF GitHub Bot commented on HDFS-17529:
---

hadoop-yetus commented on PR #6833:
URL: https://github.com/apache/hadoop/pull/6833#issuecomment-2122116959

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 46s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  49m 18s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 40s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 41s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 20s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  38m 43s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 29s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 18s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 31s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   0m 29s | 
[/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6833/4/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt)
 |  hadoop-hdfs-rbf in the patch failed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.  |
   | +1 :green_heart: |  javadoc  |   0m 23s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  38m 52s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  33m  8s |  |  hadoop-hdfs-rbf in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 36s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 176m 18s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6833/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6833 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint 
markdownlint |
   | uname | Linux db3ba1bfe5a3 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 35b9915a7d90f0d824fb584c28f0b4885000130e |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6833/4/testReport

[jira] [Commented] (HDFS-17529) Improve router state store cache entry deletion

2024-05-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848070#comment-17848070
 ] 

ASF GitHub Bot commented on HDFS-17529:
---

kokonguyen191 commented on code in PR #6833:
URL: https://github.com/apache/hadoop/pull/6833#discussion_r1607729934


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/driver/StateStoreRecordOperations.java:
##
@@ -127,6 +128,17 @@  StateStoreOperationResult putAll(
   @AtMostOnce
boolean remove(T record) throws IOException;
 
+  /**
+   * Remove multiple records.
+   *
+   * @param  Record class of the records.
+   * @param records Records to be removed.
+   * @return Map of record -> boolean indicating if the record has being 
removed successfully.

Review Comment:
   Fixed





> Improve router state store cache entry deletion
> ---
>
> Key: HDFS-17529
> URL: https://issues.apache.org/jira/browse/HDFS-17529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> The current implementation of the router state store update is quite 
> inefficient, so much so that when routers were removed and a lot of 
> NameNodeMembership records were deleted in a short burst, the deletions 
> triggered a router safemode in our cluster and caused a lot of trouble.
> This ticket aims to improve the deletion process for ZK state store 
> implementation.
> See HDFS-17532 for the other half of this improvement



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17529) Improve router state store cache entry deletion

2024-05-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848061#comment-17848061
 ] 

ASF GitHub Bot commented on HDFS-17529:
---

ZanderXu commented on code in PR #6833:
URL: https://github.com/apache/hadoop/pull/6833#discussion_r1607709901


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/driver/StateStoreRecordOperations.java:
##
@@ -127,6 +128,17 @@  StateStoreOperationResult putAll(
   @AtMostOnce
boolean remove(T record) throws IOException;
 
+  /**
+   * Remove multiple records.
+   *
+   * @param  Record class of the records.
+   * @param records Records to be removed.
+   * @return Map of record -> boolean indicating if the record has being 
removed successfully.

Review Comment:
   ```
   [ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-javadoc-plugin:3.0.1:javadoc-no-fork 
(default-cli) on project hadoop-hdfs-rbf: An error has occurred in Javadoc 
report generation: 
   [ERROR] Exit code: 1 - javadoc: warning - You have specified the HTML 
version as HTML 4.01 by using the -html4 option.
   [ERROR] The default is currently HTML5 and the support for HTML 4.01 will be 
removed
   [ERROR] in a future release. To suppress this warning, please ensure that 
any HTML constructs
   [ERROR] in your comments are valid in HTML5, and remove the -html4 option.
   [ERROR] 
/home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-6833/ubuntu-focal/src/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/driver/StateStoreRecordOperations.java:136:
 error: bad use of '>'
   [ERROR]* @return Map of record -> boolean indicating any entries being 
deleted by this record.
   [ERROR] ^
   [ERROR] javadoc: warning - invalid usage of tag >
   ```
   
   @kokonguyen191 It seems that `->` is not allowed in the javadoc.
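One common way to fix this kind of javadoc error is to wrap the arrow in a `{@code ...}` tag, inside which `>` needs no HTML escaping. A sketch of the corrected comment (the surrounding interface is elided, and "has being" is also corrected to "has been"):

```java
  /**
   * Remove multiple records.
   *
   * @param <T> Record class of the records.
   * @param records Records to be removed.
   * @return Map of {@code record -> boolean} indicating whether each record
   *         has been removed successfully.
   */
```

Alternatively, the raw `->` can be written as `-&gt;`, but `{@code}` is usually more readable in the source.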





> Improve router state store cache entry deletion
> ---
>
> Key: HDFS-17529
> URL: https://issues.apache.org/jira/browse/HDFS-17529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> The current implementation of the router state store update is quite 
> inefficient, so much so that when routers were removed and a lot of 
> NameNodeMembership records were deleted in a short burst, the deletions 
> triggered a router safemode in our cluster and caused a lot of trouble.
> This ticket aims to improve the deletion process for ZK state store 
> implementation.
> See HDFS-17532 for the other half of this improvement



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17529) Improve router state store cache entry deletion

2024-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848047#comment-17848047
 ] 

ASF GitHub Bot commented on HDFS-17529:
---

hadoop-yetus commented on PR #6833:
URL: https://github.com/apache/hadoop/pull/6833#issuecomment-2121778835

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 46s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  49m 34s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 41s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 43s |  |  trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 31s |  |  trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 22s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  39m  4s |  |  branch has no errors when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 32s |  |  the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 28s |  |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 28s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 18s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 32s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   0m 28s | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6833/3/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt) |  hadoop-hdfs-rbf in the patch failed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.  |
   | +1 :green_heart: |  javadoc  |   0m 23s |  |  the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  39m  1s |  |  patch has no errors when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  33m 38s |  |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 36s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 177m 29s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6833/3/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6833 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint markdownlint |
   | uname | Linux 1ecfff136614 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 00bca37b88cdf179a429030c4b53fc2c69e2ef54 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6833/3/testReport

[jira] [Commented] (HDFS-17529) Improve router state store cache entry deletion

2024-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848038#comment-17848038
 ] 

ASF GitHub Bot commented on HDFS-17529:
---

ZanderXu commented on code in PR #6833:
URL: https://github.com/apache/hadoop/pull/6833#discussion_r1607542930


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/driver/StateStoreRecordOperations.java:
##
@@ -127,6 +128,17 @@ <T extends BaseRecord> StateStoreOperationResult putAll(
   @AtMostOnce
  <T extends BaseRecord> boolean remove(T record) throws IOException;
 
+  /**
+   * Remove multiple records.
+   *
+   * @param  Record class of the records.
+   * @param records Records to be removed.
+   * @return Map of record -> boolean indicating any entries being deleted by 
this record.

Review Comment:
   `Map of record -> boolean indicating if the record has been removed successfully`



##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/driver/StateStoreRecordOperations.java:
##
@@ -152,4 +164,17 @@ <T extends BaseRecord> StateStoreOperationResult putAll(
  <T extends BaseRecord> int remove(Class<T> clazz, Query<T> query)
      throws IOException;
 
+  /**
+   * Remove all records of a specific class that match any query in a list of 
queries.
+   * Requires the getAll implementation to fetch fresh records on each call.
+   *
+   * @param clazz The class to match the records with.
+   * @param queries Queries (logical OR) to filter what to remove.
+   * @param  Record class of the records.
+   * @return Map of query to number of records deleted by that query.

Review Comment:
   `Map of query to number of records removed by that query.`





> Improve router state store cache entry deletion
> ---
>
> Key: HDFS-17529
> URL: https://issues.apache.org/jira/browse/HDFS-17529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> The current implementation of router state store updates is quite inefficient, 
> so much that when routers were removed and a lot of NameNodeMembership records 
> were deleted in a short burst, the deletions triggered router safemode in 
> our cluster and caused a lot of trouble.
> This ticket aims to improve the deletion process for the ZK state store 
> implementation.
> See HDFS-17532 for the other half of this improvement






[jira] [Commented] (HDFS-17529) Improve router state store cache entry deletion

2024-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848029#comment-17848029
 ] 

ASF GitHub Bot commented on HDFS-17529:
---

ZanderXu commented on code in PR #6833:
URL: https://github.com/apache/hadoop/pull/6833#discussion_r1607499422


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/driver/impl/StateStoreBaseImpl.java:
##
@@ -86,4 +89,37 @@ public <T extends BaseRecord> boolean remove(T record)
throws IOException {
 Class<T> recordClass = (Class<T>) StateStoreUtils.getRecordClass(clazz);
 return remove(recordClass, query) == 1;
   }
+
+  @Override
+  public <T extends BaseRecord> Map<T, Boolean> removeMultiple(List<T> records) throws IOException {
+    assert !records.isEmpty();
+    // Fall back to iterative remove() calls if all records don't share 1 class
+    Class<? extends BaseRecord> expectedClazz = records.get(0).getClass();
+    if (!records.stream().allMatch(x -> x.getClass() == expectedClazz)) {
+      Map<T, Boolean> result = new HashMap<>();
+      for (T record : records) {
+        result.put(record, remove(record));
+      }
+      return result;
+    }
+
+    final List<Query<T>> queries = new ArrayList<>();
+    for (T record : records) {
+      queries.add(new Query<>(record));
+    }
+    @SuppressWarnings("unchecked")
+    Class<T> recordClass = (Class<T>) StateStoreUtils.getRecordClass(expectedClazz);
+    Map<Query<T>, Integer> result = remove(recordClass, queries);
+    return result.entrySet().stream()
+        .collect(Collectors.toMap(e -> e.getKey().getPartial(), e -> e.getValue() > 0));

Review Comment:
   `remove(T record)` returns true if `remove(recordClass, query)` is 1, but 
here it is `e.getValue() > 0`. How about making them consistent?
   
   Here, how about using `e.getValue() == 1`?
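The suggested change can be illustrated with a small self-contained sketch (simplified types and hypothetical names, not the actual Hadoop classes): each query's deletion count collapses to a boolean with `== 1`, matching the single-record `remove` contract, which reports success only when exactly one record was removed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class RemoveResultMapping {
    // Collapse per-query deletion counts to booleans. Using == 1 keeps the
    // result consistent with remove(T record), which returns true only when
    // exactly one record was removed; > 0 would also report duplicate
    // matches as success.
    static Map<String, Boolean> toBooleans(Map<String, Integer> deletedCounts) {
        return deletedCounts.entrySet().stream()
            .collect(Collectors.toMap(Map.Entry::getKey, e -> e.getValue() == 1));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("record-a", 1);  // removed exactly once
        counts.put("record-b", 0);  // nothing removed
        counts.put("record-c", 2);  // duplicate match, false under == 1
        System.out.println(toBooleans(counts));
    }
}
```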



##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/store/TestStateStoreMembershipState.java:
##
@@ -565,7 +568,7 @@ public void testRegistrationExpiredRaceCondition()
 // Load cache
 MembershipStore memStoreSpy = spy(membershipStore);
 DelayAnswer delayer = new DelayAnswer(LOG);
-doAnswer(delayer).when(memStoreSpy).overrideExpiredRecords(any());
+doAnswer(delayer).when(memStoreSpy).overrideExpiredRecords(any(), 
anyBoolean());

Review Comment:
   remove this `anyBoolean()`



##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/driver/impl/StateStoreZooKeeperImpl.java:
##
@@ -284,51 +288,88 @@ public <T extends BaseRecord> StateStoreOperationResult putAll(
   }
 
   @Override
-  public <T extends BaseRecord> int remove(
-  Class<T> clazz, Query<T> query) throws IOException {
+  public <T extends BaseRecord> Map<Query<T>, Integer> remove(Class<T> clazz,
+  List<Query<T>> queries) throws IOException {
 verifyDriverReady();
-if (query == null) {
-  return 0;
+// Track how many entries are deleted by each query
+Map<Query<T>, Integer> ret = new HashMap<>();
+final List<T> trueRemoved = Collections.synchronizedList(new ArrayList<>());
+if (queries.isEmpty()) {
+  return ret;
 }
 
 // Read the current data
 long start = monotonicNow();
-List<T> records = null;
+List<T> records;
 try {
   QueryResult<T> result = get(clazz);
   records = result.getRecords();
 } catch (IOException ex) {
   LOG.error("Cannot get existing records", ex);
   getMetrics().addFailure(monotonicNow() - start);
-  return 0;
+  return ret;
 }
 
 // Check the records to remove
 String znode = getZNodeForClass(clazz);
-List<T> recordsToRemove = filterMultiple(query, records);
+Set<T> recordsToRemove = new HashSet<>();
+Map<Query<T>, List<T>> queryToRecords = new HashMap<>();
+for (Query<T> query : queries) {
+  List<T> filtered = filterMultiple(query, records);
+  queryToRecords.put(query, filtered);
+  recordsToRemove.addAll(filtered);
+}
 
 // Remove the records
-int removed = 0;
-for (T existingRecord : recordsToRemove) {
+List<Callable<Void>> callables = new ArrayList<>();
+recordsToRemove.forEach(existingRecord -> callables.add(() -> {
   LOG.info("Removing \"{}\"", existingRecord);
   try {
 String primaryKey = getPrimaryKey(existingRecord);
 String path = getNodePath(znode, primaryKey);
 if (zkManager.delete(path)) {
-  removed++;
+  trueRemoved.add(existingRecord);
 } else {
   LOG.error("Did not remove \"{}\"", existingRecord);
 }
   } catch (Exception e) {
 LOG.error("Cannot remove \"{}\"", existingRecord, e);
 getMetrics().addFailure(monotonicNow() - start);
   }
+  return null;
+}));
+try {
+  if (enableConcurrent) {
+executorService.invokeAll(callables);
+  } else {
+for (Callable<Void> callable : cal
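The snippet above is cut off, but its overall shape, a serial loop versus `invokeAll` on an executor, can be sketched independently of the state store types (all names here are stand-ins, not the actual PR code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelDeleteSketch {
    // Deletes every key, either serially or in parallel, and returns the
    // keys that were "removed". The sink is synchronized because callables
    // may run on several threads when enableConcurrent is true.
    public static List<String> deleteAll(List<String> keys, boolean enableConcurrent)
            throws InterruptedException {
        List<String> removed = Collections.synchronizedList(new ArrayList<>());
        List<Callable<Void>> callables = new ArrayList<>();
        for (String key : keys) {
            callables.add(() -> {
                removed.add(key);  // stand-in for zkManager.delete(path)
                return null;
            });
        }
        if (enableConcurrent) {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            try {
                pool.invokeAll(callables);  // blocks until every task finishes
            } finally {
                pool.shutdown();
            }
        } else {
            // Serial fallback preserves the old one-by-one behavior.
            for (Callable<Void> c : callables) {
                try {
                    c.call();
                } catch (Exception ignored) {
                }
            }
        }
        return removed;
    }
}
```

`invokeAll` is convenient here because it waits for all tasks to complete before returning, so the caller can tally results immediately afterwards.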

[jira] [Updated] (HDFS-17464) Improve some logs output in class FsDatasetImpl

2024-05-20 Thread Haiyang Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haiyang Hu updated HDFS-17464:
--
Target Version/s: 3.5.0

> Improve some logs output in class FsDatasetImpl
> ---
>
> Key: HDFS-17464
> URL: https://issues.apache.org/jira/browse/HDFS-17464
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.5.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>







[jira] [Updated] (HDFS-17464) Improve some logs output in class FsDatasetImpl

2024-05-20 Thread Haiyang Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haiyang Hu updated HDFS-17464:
--
Affects Version/s: 3.5.0
   (was: 3.4.0)

> Improve some logs output in class FsDatasetImpl
> ---
>
> Key: HDFS-17464
> URL: https://issues.apache.org/jira/browse/HDFS-17464
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.5.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>







[jira] [Resolved] (HDFS-17464) Improve some logs output in class FsDatasetImpl

2024-05-20 Thread Haiyang Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haiyang Hu resolved HDFS-17464.
---
Fix Version/s: 3.5.0
   Resolution: Resolved

> Improve some logs output in class FsDatasetImpl
> ---
>
> Key: HDFS-17464
> URL: https://issues.apache.org/jira/browse/HDFS-17464
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>







[jira] [Commented] (HDFS-17464) Improve some logs output in class FsDatasetImpl

2024-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848024#comment-17848024
 ] 

ASF GitHub Bot commented on HDFS-17464:
---

haiyang1987 commented on PR #6724:
URL: https://github.com/apache/hadoop/pull/6724#issuecomment-2121550133

   Committed to trunk.
   Thanks @hfutatzhanghb for your contributions and @ZanderXu @ayushtkn for the review!




> Improve some logs output in class FsDatasetImpl
> ---
>
> Key: HDFS-17464
> URL: https://issues.apache.org/jira/browse/HDFS-17464
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Commented] (HDFS-17464) Improve some logs output in class FsDatasetImpl

2024-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848023#comment-17848023
 ] 

ASF GitHub Bot commented on HDFS-17464:
---

haiyang1987 merged PR #6724:
URL: https://github.com/apache/hadoop/pull/6724




> Improve some logs output in class FsDatasetImpl
> ---
>
> Key: HDFS-17464
> URL: https://issues.apache.org/jira/browse/HDFS-17464
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Commented] (HDFS-17531) RBF: Asynchronous router RPC.

2024-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848017#comment-17848017
 ] 

ASF GitHub Bot commented on HDFS-17531:
---

slfan1989 commented on PR #6838:
URL: https://github.com/apache/hadoop/pull/6838#issuecomment-2121500478

   > @ayushtkn @slfan1989 hi, thanks for your reply. I sent the discussion to 
[common-...@hadoop.apache.org](mailto:common-...@hadoop.apache.org).
   
   This PR has too many changes and affects multiple modules, causing the 
compilation to time out. I have seen the discussion emails; the usual 
discussion process may take 5-7 days. 




> RBF: Asynchronous router RPC.
> -
>
> Key: HDFS-17531
> URL: https://issues.apache.org/jira/browse/HDFS-17531
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: Aynchronous router.pdf, HDFS-17531.001.patch, 
> image-2024-05-19-18-07-51-282.png
>
>
> *Description*
> Currently, the main function of the Router service is to accept client 
> requests, forward the requests to the corresponding downstream ns, and then 
> return the results of the downstream ns to the client. The link is as follows:
> *!image-2024-05-19-18-07-51-282.png|width=900,height=300!*
> The main threads involved in the rpc link are:
> {*}Read{*}: Get the client request and put it into the call queue *(1)*
> {*}Handler{*}:
> Extract a call *(2)* from the call queue, process it, generate a new 
> call, place it in the connection thread's calls, and wait for the call 
> processing to complete *(3)*
> After being awakened by the connection thread, process the response and put 
> it into the response queue *(5)*
> *Connection:*
> Hold the link with the downstream ns, send the call to the 
> downstream ns (via {*}rpcRequestThread{*}), and obtain a response from the ns. 
> Based on the call in the response, notify the call that processing is complete 
> *(4)*
> *Responder:*
> Retrieve the response from the response queue *(6)* and return it to the client
>  
> *Shortcoming*
> Even if the *connection* thread can send more requests to downstream 
> nameservices, since *(3)* and *(4)* are synchronous, when the *handler* 
> thread adds the call to connection.calls, it needs to wait until the 
> *connection* notifies it that the call is complete; only after the response 
> is put into the response queue can a new call be obtained from the call queue 
> and processed. Therefore, the concurrency performance of the router is 
> limited by the number of handlers. A simple example: if the 
> number of handlers is 1 and the maximum number of calls in the connection 
> thread is 10, then even if the connection thread can send 10 requests to the 
> downstream ns, since the number of handlers is 1, the router can only process 
> one request after another. 
>  
> Since the performance of router rpc is mainly limited by the number of 
> handlers, the most effective way to improve rpc performance currently is to 
> increase the number of handlers. Letting the router create a large number of 
> handler threads will also increase the number of thread switches and cannot 
> maximize the use of machine performance.
>  
> There are usually multiple ns downstream of the router. If the handler 
> forwards the request to an ns with poor performance, it will cause the 
> handler to wait for a long time. Due to the reduction of available handlers, 
> the router's ability to handle ns requests with normal performance will be 
> reduced. From the perspective of the client, the performance of the 
> downstream ns of the router has deteriorated at this time. We often find that 
> the call queue of the downstream ns is not high, but the call queue of the 
> router is very high.
>  
> Therefore, although the main function of the router is to federate and handle 
> requests from multiple NSs, the current synchronous RPC performance cannot 
> satisfy the scenario where there are many NSs downstream of the router. Even 
> if the concurrent performance of the router can be improved by increasing the 
> number of handlers, it is still relatively slow. More threads will increase 
> the CPU context switching time, and in fact many of the handler threads are 
> in a blocked state, which is undoubtedly a waste of thread resources. When a 
> request enters the router, there is no guarantee that there will be a running 
> handler at this time.
>  
> Therefore, I consider asynchronous router rpc. Please view the *pdf* fo
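The core idea, freeing the handler thread instead of parking it between steps *(3)* and *(4)*, can be sketched with `CompletableFuture` (a simplified illustration of the general technique, not the design in the attached PDF):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncHandlerSketch {
    // Stand-in for the connection threads talking to downstream nameservices.
    private final ExecutorService downstream = Executors.newFixedThreadPool(2);

    // Synchronous style: the handler thread is parked until the ns replies,
    // so throughput is capped by the number of handler threads.
    String handleSync(String request) {
        return callNamenode(request);  // handler blocked here
    }

    // Asynchronous style: the handler returns immediately; the response is
    // produced by a callback when the downstream call completes, so the
    // handler can pick up the next call from the queue right away.
    CompletableFuture<String> handleAsync(String request) {
        return CompletableFuture
            .supplyAsync(() -> callNamenode(request), downstream)
            .thenApply(resp -> "response:" + resp);  // runs on completion
    }

    private String callNamenode(String request) {
        return request.toUpperCase();  // stand-in for the real downstream RPC
    }
}
```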

[jira] [Commented] (HDFS-17533) RBF: Unit tests that use embedded SQL failing in CI

2024-05-20 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848013#comment-17848013
 ] 

Shilun Fan commented on HDFS-17533:
---

[~simbadzina] Thank you for the feedback! The upgrade from derby 10.14.2.0 to 
10.17.1.0 was completed by us at [https://github.com/apache/hadoop/pull/6816], 
and no abnormal unit tests were found at that time. I will roll back #6816.

> RBF: Unit tests that use embedded SQL failing in CI
> ---
>
> Key: HDFS-17533
> URL: https://issues.apache.org/jira/browse/HDFS-17533
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>
> In the CI runs for RBF the following two tests are failing
> {noformat}
> [ERROR] Failures: 
> [ERROR] 
> org.apache.hadoop.hdfs.server.federation.router.security.token.TestSQLDelegationTokenSecretManagerImpl.null
> [ERROR]   Run 1: TestSQLDelegationTokenSecretManagerImpl Multiple Failures (2 
> failures)
>   java.sql.SQLException: No suitable driver found for 
> jdbc:derby:memory:TokenStore;create=true
>   java.lang.RuntimeException: java.sql.SQLException: No suitable driver 
> found for jdbc:derby:memory:TokenStore;drop=true
> [ERROR]   Run 2: TestSQLDelegationTokenSecretManagerImpl Multiple Failures (2 
> failures)
>   java.sql.SQLException: No suitable driver found for 
> jdbc:derby:memory:TokenStore;create=true
>   java.lang.RuntimeException: java.sql.SQLException: No suitable driver 
> found for jdbc:derby:memory:TokenStore;drop=true
> [ERROR]   Run 3: TestSQLDelegationTokenSecretManagerImpl Multiple Failures (2 
> failures)
>   java.sql.SQLException: No suitable driver found for 
> jdbc:derby:memory:TokenStore;create=true
>   java.lang.RuntimeException: java.sql.SQLException: No suitable driver 
> found for jdbc:derby:memory:TokenStore;drop=true
> [INFO] 
> [ERROR] 
> org.apache.hadoop.hdfs.server.federation.store.driver.TestStateStoreMySQL.null
> [ERROR]   Run 1: TestStateStoreMySQL Multiple Failures (2 failures)
>   java.sql.SQLException: No suitable driver found for 
> jdbc:derby:memory:StateStore;create=true
>   java.lang.RuntimeException: java.sql.SQLException: No suitable driver 
> found for jdbc:derby:memory:StateStore;drop=true
> [ERROR]   Run 2: TestStateStoreMySQL Multiple Failures (2 failures)
>   java.sql.SQLException: No suitable driver found for 
> jdbc:derby:memory:StateStore;create=true
>   java.lang.RuntimeException: java.sql.SQLException: No suitable driver 
> found for jdbc:derby:memory:StateStore;drop=true
> [ERROR]   Run 3: TestStateStoreMySQL Multiple Failures (2 failures)
>   java.sql.SQLException: No suitable driver found for 
> jdbc:derby:memory:StateStore;create=true
>   java.lang.RuntimeException: java.sql.SQLException: No suitable driver 
> found for jdbc:derby:memory:StateStore;drop=true {noformat}
> [https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6804/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt]
>  
> I believe the fix is first registering the driver: 
> [https://dev.mysql.com/doc/connector-j/en/connector-j-usagenotes-connect-drivermanager.html]
> [https://stackoverflow.com/questions/22384710/java-sql-sqlexception-no-suitable-driver-found-for-jdbcmysql-localhost3306]






[jira] [Comment Edited] (HDFS-17533) RBF: Unit tests that use embedded SQL failing in CI

2024-05-20 Thread Simbarashe Dzinamarira (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848003#comment-17848003
 ] 

Simbarashe Dzinamarira edited comment on HDFS-17533 at 5/20/24 11:44 PM:
-

W.r.t. the solution, EmbeddedDriver was moved to the derbytools jar, so we 
need to declare that dependency in the pom.xml.

When I include derbytools, I get the following error.
{noformat}
[ERROR] 
/Users/sdzinama/dev/hadooptree/simbatrunk/hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/store/driver/TestStateStoreMySQL.java:[42,30]
 cannot access org.apache.derby.jdbc.EmbeddedDriver
[ERROR]   bad class file: 
/Users/sdzinama/.m2/repository/org/apache/derby/derbytools/10.17.1.0/derbytools-10.17.1.0.jar(org/apache/derby/jdbc/EmbeddedDriver.class)
[ERROR]     class file has wrong version 63.0, should be 52.0
[ERROR]     Please remove or make sure it appears in the correct subdirectory 
of the classpath.{noformat}
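The "class file has wrong version 63.0, should be 52.0" message refers to class file major versions, where major = Java release + 44 (so 52 is Java 8 bytecode and 63 is Java 19). A quick way to check a class file's version using only the standard library (hypothetical helper, not part of the build):

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassVersionSketch {
    // Reads the 4-byte magic and the minor/major version fields that open
    // every .class file. Major 52 = Java 8, 55 = Java 11, 63 = Java 19.
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        data.readUnsignedShort();          // minor version
        return data.readUnsignedShort();   // major version
    }
}
```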


was (Author: simbadzina):
W.r.t. the solution, EmbeddedDriver was moved to the derbytools jar, so we 
need to declare that dependency in the pom.xml






[jira] [Comment Edited] (HDFS-17533) RBF: Unit tests that use embedded SQL failing in CI

2024-05-20 Thread Simbarashe Dzinamarira (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848006#comment-17848006
 ] 

Simbarashe Dzinamarira edited comment on HDFS-17533 at 5/20/24 11:43 PM:
-

[~slfan1989] the following PR updated derby from 10.14.2.0 to 10.17.1.0.

[https://github.com/apache/hadoop/pull/6816]

However 10.17.1.0 requires a higher java version.

 

Any recommendation on how to resolve this? I assume downgrading is not an 
option.


was (Author: simbadzina):
[~slfan1989] the following PR updated derby from 10.14.2.0 to 10.17.1.0.

[https://github.com/apache/hadoop/pull/6816]

However 10.17.1.0 requires a higher java version.






[jira] [Commented] (HDFS-17533) RBF: Unit tests that use embedded SQL failing in CI

2024-05-20 Thread Simbarashe Dzinamarira (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848006#comment-17848006
 ] 

Simbarashe Dzinamarira commented on HDFS-17533:
---

[~slfan1989] the following PR updated derby from 10.14.2.0 to 10.17.1.0.

[https://github.com/apache/hadoop/pull/6816]

However 10.17.1.0 requires a higher java version.

> found for jdbc:derby:memory:StateStore;drop=true
> [ERROR]   Run 3: TestStateStoreMySQL Multiple Failures (2 failures)
>   java.sql.SQLException: No suitable driver found for 
> jdbc:derby:memory:StateStore;create=true
>   java.lang.RuntimeException: java.sql.SQLException: No suitable driver 
> found for jdbc:derby:memory:StateStore;drop=true {noformat}
> [https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6804/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt]
>  
> I believe the fix is first registering the driver: 
> [https://dev.mysql.com/doc/connector-j/en/connector-j-usagenotes-connect-drivermanager.html]
> [https://stackoverflow.com/questions/22384710/java-sql-sqlexception-no-suitable-driver-found-for-jdbcmysql-localhost3306]






[jira] [Commented] (HDFS-17533) RBF: Unit tests that use embedded SQL failing in CI

2024-05-20 Thread Simbarashe Dzinamarira (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848003#comment-17848003
 ] 

Simbarashe Dzinamarira commented on HDFS-17533:
---

With respect to the solution, EmbeddedDriver was moved to the derbytools jar, 
so we need to declare that dependency in the pom.xml
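The underlying mechanism can be sketched with plain `java.sql` (a self-contained demo with a hypothetical `DemoDriver`, not the actual Derby or Hadoop test code): `DriverManager` throws "No suitable driver" until some registered `Driver` accepts the URL, which is why explicitly registering the driver, or pulling in the derbytools jar that now provides `EmbeddedDriver`, fixes the tests.

```java
import java.sql.*;
import java.util.Properties;
import java.util.logging.Logger;

public class DriverRegistrationDemo {
    // Minimal stand-in driver; it only needs acceptsURL to satisfy getDriver.
    static class DemoDriver implements Driver {
        public Connection connect(String url, Properties info) { return null; }
        public boolean acceptsURL(String url) { return url.startsWith("jdbc:demo:"); }
        public DriverPropertyInfo[] getPropertyInfo(String u, Properties p) {
            return new DriverPropertyInfo[0];
        }
        public int getMajorVersion() { return 1; }
        public int getMinorVersion() { return 0; }
        public boolean jdbcCompliant() { return false; }
        public Logger getParentLogger() { return Logger.getGlobal(); }
    }

    public static void main(String[] args) throws Exception {
        boolean foundBefore = true;
        try {
            // Before registration: no driver accepts the URL.
            DriverManager.getDriver("jdbc:demo:memory:StateStore");
        } catch (SQLException e) {
            foundBefore = false;  // "No suitable driver found for ..."
        }

        // The fix: register a driver that accepts the URL scheme.
        DriverManager.registerDriver(new DemoDriver());
        Driver d = DriverManager.getDriver("jdbc:demo:memory:StateStore");
        System.out.println(!foundBefore && d != null ? "registered" : "missing");
    }
}
```

In the real tests the equivalent one-time fix would presumably be loading the driver class before opening connections (e.g. `Class.forName("org.apache.derby.jdbc.EmbeddedDriver")`), with derbytools on the classpath so that class exists.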







[jira] [Updated] (HDFS-17533) RBF: Unit tests that use embedded SQL failing in CI

2024-05-20 Thread Simbarashe Dzinamarira (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simbarashe Dzinamarira updated HDFS-17533:
--
Description: 
In the CI runs for RBF the following two tests are failing
{noformat}
[ERROR] Failures: 
[ERROR] 
org.apache.hadoop.hdfs.server.federation.router.security.token.TestSQLDelegationTokenSecretManagerImpl.null
[ERROR]   Run 1: TestSQLDelegationTokenSecretManagerImpl Multiple Failures (2 
failures)
java.sql.SQLException: No suitable driver found for 
jdbc:derby:memory:TokenStore;create=true
java.lang.RuntimeException: java.sql.SQLException: No suitable driver 
found for jdbc:derby:memory:TokenStore;drop=true
[ERROR]   Run 2: TestSQLDelegationTokenSecretManagerImpl Multiple Failures (2 
failures)
java.sql.SQLException: No suitable driver found for 
jdbc:derby:memory:TokenStore;create=true
java.lang.RuntimeException: java.sql.SQLException: No suitable driver 
found for jdbc:derby:memory:TokenStore;drop=true
[ERROR]   Run 3: TestSQLDelegationTokenSecretManagerImpl Multiple Failures (2 
failures)
java.sql.SQLException: No suitable driver found for 
jdbc:derby:memory:TokenStore;create=true
java.lang.RuntimeException: java.sql.SQLException: No suitable driver 
found for jdbc:derby:memory:TokenStore;drop=true
[INFO] 
[ERROR] 
org.apache.hadoop.hdfs.server.federation.store.driver.TestStateStoreMySQL.null
[ERROR]   Run 1: TestStateStoreMySQL Multiple Failures (2 failures)
java.sql.SQLException: No suitable driver found for 
jdbc:derby:memory:StateStore;create=true
java.lang.RuntimeException: java.sql.SQLException: No suitable driver 
found for jdbc:derby:memory:StateStore;drop=true
[ERROR]   Run 2: TestStateStoreMySQL Multiple Failures (2 failures)
java.sql.SQLException: No suitable driver found for 
jdbc:derby:memory:StateStore;create=true
java.lang.RuntimeException: java.sql.SQLException: No suitable driver 
found for jdbc:derby:memory:StateStore;drop=true
[ERROR]   Run 3: TestStateStoreMySQL Multiple Failures (2 failures)
java.sql.SQLException: No suitable driver found for 
jdbc:derby:memory:StateStore;create=true
java.lang.RuntimeException: java.sql.SQLException: No suitable driver 
found for jdbc:derby:memory:StateStore;drop=true {noformat}
[https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6804/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt]

 

I believe the fix is first registering the driver: 
[https://dev.mysql.com/doc/connector-j/en/connector-j-usagenotes-connect-drivermanager.html]

[https://stackoverflow.com/questions/22384710/java-sql-sqlexception-no-suitable-driver-found-for-jdbcmysql-localhost3306]


[jira] [Assigned] (HDFS-17533) RBF: Unit tests that use embedded SQL failing in CI

2024-05-20 Thread Simbarashe Dzinamarira (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simbarashe Dzinamarira reassigned HDFS-17533:
-

Assignee: Simbarashe Dzinamarira







[jira] [Updated] (HDFS-17533) RBF: Unit tests that use embedded SQL failing in CI

2024-05-20 Thread Simbarashe Dzinamarira (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simbarashe Dzinamarira updated HDFS-17533:
--
Summary: RBF: Unit tests that use embedded SQL failing in CI  (was: RBF 
Tests that use embedded SQL failing unit tests)







[jira] [Created] (HDFS-17533) RBF Tests that use embedded SQL failing unit tests

2024-05-20 Thread Simbarashe Dzinamarira (Jira)
Simbarashe Dzinamarira created HDFS-17533:
-

 Summary: RBF Tests that use embedded SQL failing unit tests
 Key: HDFS-17533
 URL: https://issues.apache.org/jira/browse/HDFS-17533
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Simbarashe Dzinamarira








[jira] [Commented] (HDFS-17464) Improve some logs output in class FsDatasetImpl

2024-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847961#comment-17847961
 ] 

ASF GitHub Bot commented on HDFS-17464:
---

ayushtkn commented on PR #6724:
URL: https://github.com/apache/hadoop/pull/6724#issuecomment-2121001399

   @haiyang1987 / @ZanderXu anyone hitting the merge button?




> Improve some logs output in class FsDatasetImpl
> ---
>
> Key: HDFS-17464
> URL: https://issues.apache.org/jira/browse/HDFS-17464
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (HDFS-17531) RBF: Asynchronous router RPC.

2024-05-20 Thread Simbarashe Dzinamarira (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simbarashe Dzinamarira updated HDFS-17531:
--
Summary: RBF: Asynchronous router RPC.  (was: RBF: Aynchronous router RPC.)

> RBF: Asynchronous router RPC.
> -
>
> Key: HDFS-17531
> URL: https://issues.apache.org/jira/browse/HDFS-17531
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: Aynchronous router.pdf, HDFS-17531.001.patch, 
> image-2024-05-19-18-07-51-282.png
>
>
> *Description*
> Currently, the main function of the Router service is to accept client 
> requests, forward the requests to the corresponding downstream ns, and then 
> return the results of the downstream ns to the client. The link is as follows:
> *!image-2024-05-19-18-07-51-282.png|width=900,height=300!*
> The main threads involved in the rpc link are:
> {*}Read{*}: Takes the client request and puts it into the call queue *(1)*.
> {*}Handler{*}: Takes a call *(2)* from the call queue, processes it, 
> generates a new call, places it in the connection thread's call list, and 
> waits for that call to complete *(3)*. After being woken by the connection 
> thread, it processes the response and puts it into the response queue *(5)*.
> *Connection:* Holds the link to the downstream ns, sends calls to the 
> downstream ns (via {*}rpcRequestThread{*}), obtains the response from the 
> ns, and, based on the call in the response, notifies the handler that the 
> call is complete *(4)*.
> *Responder:* Takes responses from the response queue *(6)* and returns them 
> to the client.
>  
> *Shortcoming*
> Even though the *connection* thread can send more requests to downstream 
> nameservices, steps *(3)* and *(4)* are synchronous: after the *handler* 
> thread adds a call to connection.calls, it must wait until the *connection* 
> thread notifies it that the call is complete, and only after the response 
> has been put into the response queue can it take a new call from the call 
> queue. The router's concurrency is therefore limited by the number of 
> handlers. A simple example: if there is 1 handler and the connection thread 
> allows up to 10 in-flight calls, then even though the connection thread 
> could send 10 requests to the downstream ns, the router can still only 
> process one request after another.
>  
> Since router RPC performance is mainly limited by the number of handlers, 
> the most effective way to improve it today is to add handlers. But having 
> the router create a large number of handler threads also increases thread 
> context switching and cannot make full use of the machine.
>  
> There are usually multiple nameservices downstream of the router. If a 
> handler forwards a request to an ns with poor performance, the handler 
> waits a long time. With fewer handlers available, the router's ability to 
> serve requests for healthy nameservices drops. From the client's 
> perspective, all of the router's downstream nameservices appear degraded. 
> We often find that the call queues of the downstream nameservices are 
> short while the router's call queue is very long.
>  
> Therefore, although the router's main job is to federate multiple 
> nameservices and handle their requests, the current synchronous RPC model 
> cannot satisfy scenarios with many downstream nameservices. Even if 
> concurrency can be improved by adding handlers, it remains relatively 
> slow: more threads mean more CPU context-switch time, and in practice many 
> handler threads sit blocked, which wastes thread resources. When a request 
> enters the router, there is no guarantee a runnable handler is available.
>  
> Therefore, I propose asynchronous router RPC. Please see the *pdf* for the 
> complete solution.
>  
> Welcome everyone to exchange and discuss!
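The handler-count bottleneck described in the proposal can be sketched with a toy simulation (hypothetical code, not the actual Router implementation): one synchronous handler serializes ten 100 ms downstream calls, while an asynchronous style submits all ten without blocking so the downstream pool processes them concurrently.

```java
import java.util.concurrent.*;

public class AsyncHandlerSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService downstream = Executors.newFixedThreadPool(10); // "connection" capacity
        ExecutorService handler = Executors.newSingleThreadExecutor(); // 1 handler

        // Asynchronous style: the single handler submits all 10 calls and
        // does not wait, so the downstream pool runs them concurrently.
        long start = System.nanoTime();
        CompletableFuture<?>[] futures = new CompletableFuture<?>[10];
        for (int i = 0; i < 10; i++) {
            futures[i] = CompletableFuture.runAsync(() -> sleep(100), downstream);
        }
        CompletableFuture.allOf(futures).join();
        long asyncMs = (System.nanoTime() - start) / 1_000_000;

        // Synchronous style: the handler blocks on each call before taking
        // the next one, like steps (3)/(4) above.
        start = System.nanoTime();
        for (int i = 0; i < 10; i++) {
            handler.submit(() -> sleep(100)).get();
        }
        long syncMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println(asyncMs < syncMs ? "async faster" : "unexpected");
        downstream.shutdown();
        handler.shutdown();
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { }
    }
}
```

With 10 in-flight calls and 1 handler, the synchronous loop takes roughly 10x the single-call latency, which is the effect the proposal aims to remove.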






[jira] [Commented] (HDFS-17098) DatanodeManager does not handle null storage type properly

2024-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847888#comment-17847888
 ] 

ASF GitHub Bot commented on HDFS-17098:
---

Hexiaoqiao opened a new pull request, #6840:
URL: https://github.com/apache/hadoop/pull/6840

   
   
   ### Description of PR
   1. From https://github.com/apache/hadoop/pull/6035, which was contributed by 
@teamconfx.
   2. Fix checkstyle and try to trigger Yetus again.
   
   ### How was this patch tested?
   
   
   ### For code changes:
   
   - [ ] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   




> DatanodeManager does not handle null storage type properly
> --
>
> Key: HDFS-17098
> URL: https://issues.apache.org/jira/browse/HDFS-17098
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ConfX
>Priority: Critical
>  Labels: pull-request-available
> Attachments: reproduce.sh
>
>
> h2. What happened:
> Got a {{NullPointerException}} without message when sorting datanodes in 
> {{{}NetworkTopology{}}}.
> h2. Where's the bug:
> In line 654 of {{{}DatanodeManager{}}}, the manager creates a second sorter 
> using the standard {{Comparator}} class:
> {noformat}
> Comparator comp =
>         Comparator.comparing(DatanodeInfoWithStorage::getStorageType);
> secondarySort = list -> Collections.sort(list, comp);{noformat}
> This comparator is then used in {{NetworkTopology}} as a secondary sort to 
> break ties:
> {noformat}
> if (secondarySort != null) {
>         // a secondary sort breaks the tie between nodes.
>         secondarySort.accept(nodesList);
> }{noformat}
> However, if the storage type is {{{}null{}}}, a {{NullPointerException}} 
> would be thrown since the default {{Comparator.comparing}} cannot handle 
> comparison between null values.
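The failure and a null-safe alternative can be sketched with plain strings (a hypothetical stand-in for `DatanodeInfoWithStorage::getStorageType`; `Comparator.nullsLast` is one way to break ties without throwing):

```java
import java.util.*;

public class NullComparatorDemo {
    public static void main(String[] args) {
        List<String> types = new ArrayList<>(Arrays.asList("DISK", null, "SSD"));

        boolean npe = false;
        try {
            // Mirrors Comparator.comparing(...::getStorageType): the key
            // extractor returns null for one element, so compareTo throws.
            types.sort(Comparator.comparing((String t) -> t));
        } catch (NullPointerException e) {
            npe = true;
        }

        // Fix: wrap the key comparator so null keys sort last instead of
        // throwing during the secondary sort.
        types.sort(Comparator.comparing((String t) -> t,
                Comparator.nullsLast(Comparator.<String>naturalOrder())));
        System.out.println(npe + " " + types);
    }
}
```

The same wrapping applied to the storage-type key extractor would let the secondary sort tolerate a null storage type, though whether nulls should sort first or last is a design choice for the fix.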
> h2. How to reproduce:
> (1) Set {{dfs.heartbeat.interval}} to {{{}1753310367{}}}, and 
> {{dfs.namenode.read.considerStorageType}} to {{true}}
> (2) Run test: 
> {{org.apache.hadoop.hdfs.server.blockmanagement.TestSortLocatedBlock#testAviodStaleAndSlowDatanodes}}
> h2. Stacktrace:
> {noformat}
> java.lang.NullPointerException
>     at 
> java.base/java.util.Comparator.lambda$comparing$77a9974f$1(Comparator.java:469)
>     at java.base/java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
>     at java.base/java.util.TimSort.sort(TimSort.java:220)
>     at java.base/java.util.Arrays.sort(Arrays.java:1515)
>     at java.base/java.util.ArrayList.sort(ArrayList.java:1750)
>     at java.base/java.util.Collections.sort(Collections.java:179)
>     at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.lambda$createSecondaryNodeSorter$0(DatanodeManager.java:654)
>     at 
> org.apache.hadoop.net.NetworkTopology.sortByDistance(NetworkTopology.java:983)
>     at 
> org.apache.hadoop.net.NetworkTopology.sortByDistanceUsingNetworkLocation(NetworkTopology.java:946)
>     at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlock(DatanodeManager.java:637)
>     at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlocks(DatanodeManager.java:554)
>     at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestSortLocatedBlock.testAviodStaleAndSlowDatanodes(TestSortLocatedBlock.java:144){noformat}
> For an easy reproduction, run the reproduce.sh in the attachment. We are 
> happy to provide a patch if this issue is confirmed.






[jira] [Commented] (HDFS-17529) Improve router state store cache entry deletion

2024-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847861#comment-17847861
 ] 

ASF GitHub Bot commented on HDFS-17529:
---

hadoop-yetus commented on PR #6833:
URL: https://github.com/apache/hadoop/pull/6833#issuecomment-2120301224

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  17m 31s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  50m  0s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 29s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 41s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 19s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  41m 29s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | -1 :x: |  mvninstall  |   0m 28s | 
[/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6833/2/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-rbf in the patch failed.  |
   | -1 :x: |  compile  |   0m 32s | 
[/patch-compile-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6833/2/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt)
 |  hadoop-hdfs-rbf in the patch failed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.  |
   | -1 :x: |  javac  |   0m 32s | 
[/patch-compile-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6833/2/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt)
 |  hadoop-hdfs-rbf in the patch failed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.  |
   | -1 :x: |  compile  |   0m 28s | 
[/patch-compile-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6833/2/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt)
 |  hadoop-hdfs-rbf in the patch failed with JDK Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.  |
   | -1 :x: |  javac  |   0m 28s | 
[/patch-compile-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6833/2/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt)
 |  hadoop-hdfs-rbf in the patch failed with JDK Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 18s |  |  the patch passed  |
   | -1 :x: |  mvnsite  |   0m 29s | 
[/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6833/2/artifact/out/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-rbf in the patch failed.  |
   | -1 :x: |  javadoc  |   0m 28s | 
[/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6833/2/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt)
 |  hadoop-hdfs-rbf

[jira] [Commented] (HDFS-17532) Allow router state store cache update to overwrite and delete in parallel

2024-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847846#comment-17847846
 ] 

ASF GitHub Bot commented on HDFS-17532:
---

hadoop-yetus commented on PR #6839:
URL: https://github.com/apache/hadoop/pull/6839#issuecomment-2120145776

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  15m  9s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | -1 :x: |  mvninstall  |   2m  6s | 
[/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/1/artifact/out/branch-mvninstall-root.txt)
 |  root in trunk failed.  |
   | -1 :x: |  compile  |   0m 24s | 
[/branch-compile-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/1/artifact/out/branch-compile-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt)
 |  hadoop-hdfs-rbf in trunk failed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.  |
   | -1 :x: |  compile  |   0m 24s | 
[/branch-compile-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/1/artifact/out/branch-compile-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt)
 |  hadoop-hdfs-rbf in trunk failed with JDK Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.  |
   | -0 :warning: |  checkstyle  |   0m 21s | 
[/buildtool-branch-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/1/artifact/out/buildtool-branch-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  The patch fails to run checkstyle in hadoop-hdfs-rbf  |
   | -1 :x: |  mvnsite  |   0m 23s | 
[/branch-mvnsite-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/1/artifact/out/branch-mvnsite-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-rbf in trunk failed.  |
   | -1 :x: |  javadoc  |   0m 23s | 
[/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/1/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt)
 |  hadoop-hdfs-rbf in trunk failed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.  |
   | -1 :x: |  javadoc  |   0m 23s | 
[/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/1/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt)
 |  hadoop-hdfs-rbf in trunk failed with JDK Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.  |
   | -1 :x: |  spotbugs  |   0m 23s | 
[/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/1/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-rbf in trunk failed.  |
   | +1 :green_heart: |  shadedclient  |   2m 45s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | -1 :x: |  mvninstall  |   0m 23s | 
[/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/1/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-rbf in the patch failed.  |
   | -1 :x: |  compile  |   0m 23s | 
[/patch-compile-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6839/1/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt)
 |  hadoop-hdfs-rbf in the patch failed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.  |
   | -1 :x: |  javac  |   0m 23s | 
[/patch-compile-hadoop-hdfs

[jira] [Commented] (HDFS-17532) Allow router state store cache update to overwrite and delete in parallel

2024-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847835#comment-17847835
 ] 

ASF GitHub Bot commented on HDFS-17532:
---

kokonguyen191 commented on PR #6839:
URL: https://github.com/apache/hadoop/pull/6839#issuecomment-2120093100

   @ZanderXu This is the other half split from 
https://github.com/apache/hadoop/pull/6833. Could you help review it if you're free? 
Thanks!




> Allow router state store cache update to overwrite and delete in parallel
> -
>
> Key: HDFS-17532
> URL: https://issues.apache.org/jira/browse/HDFS-17532
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Minor
>  Labels: pull-request-available
>
> The current implementation of router state store updates is quite 
> inefficient: when routers are removed and a large number of NameNodeMembership 
> records are deleted in a short burst, the deletions triggered router safemode 
> in our cluster and caused a lot of trouble.
> This ticket aims to allow the overwrite part and delete part of 
> org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords
>  to run in parallel.
> See HDFS-17529 for the other half of this improvement.






[jira] [Commented] (HDFS-17532) Allow router state store cache update to overwrite and delete in parallel

2024-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847833#comment-17847833
 ] 

ASF GitHub Bot commented on HDFS-17532:
---

kokonguyen191 opened a new pull request, #6839:
URL: https://github.com/apache/hadoop/pull/6839

   ### Description of PR
   
   This ticket aims to allow the overwrite part and delete part of 
org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords
 to run in parallel.
   
   Sister ticket to HDFS-17529




> Allow router state store cache update to overwrite and delete in parallel
> -
>
> Key: HDFS-17532
> URL: https://issues.apache.org/jira/browse/HDFS-17532
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Minor
>
> The current implementation of router state store updates is quite 
> inefficient: when routers are removed and a large number of NameNodeMembership 
> records are deleted in a short burst, the deletions triggered router safemode 
> in our cluster and caused a lot of trouble.
> This ticket aims to allow the overwrite part and delete part of 
> org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords
>  to run in parallel.
> See HDFS-17529 for the other half of this improvement.






[jira] [Updated] (HDFS-17532) Allow router state store cache update to overwrite and delete in parallel

2024-05-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17532:
--
Labels: pull-request-available  (was: )

> Allow router state store cache update to overwrite and delete in parallel
> -
>
> Key: HDFS-17532
> URL: https://issues.apache.org/jira/browse/HDFS-17532
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Minor
>  Labels: pull-request-available
>
> The current implementation of router state store updates is quite 
> inefficient: when routers are removed and a large number of NameNodeMembership 
> records are deleted in a short burst, the deletions triggered router safemode 
> in our cluster and caused a lot of trouble.
> This ticket aims to allow the overwrite part and delete part of 
> org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords
>  to run in parallel.
> See HDFS-17529 for the other half of this improvement.






[jira] [Created] (HDFS-17532) Allow router state store cache update to overwrite and delete in parallel

2024-05-20 Thread Felix N (Jira)
Felix N created HDFS-17532:
--

 Summary: Allow router state store cache update to overwrite and 
delete in parallel
 Key: HDFS-17532
 URL: https://issues.apache.org/jira/browse/HDFS-17532
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs, rbf
Reporter: Felix N
Assignee: Felix N


The current implementation of router state store updates is quite inefficient: 
when routers are removed and a large number of NameNodeMembership records are 
deleted in a short burst, the deletions triggered router safemode in our 
cluster and caused a lot of trouble.

This ticket aims to allow the overwrite part and delete part of 
org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords
 to run in parallel.

See HDFS-17529 for the other half of this improvement.
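The parallelization described above can be sketched roughly as follows. This is an illustrative model only, not the actual CachedRecordStore#overrideExpiredRecords code: the refresh method, the record lists, and the counter are hypothetical stand-ins for the overwrite and delete phases.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: run the "overwrite expired records" phase and the
// "delete expired records" phase concurrently instead of sequentially, and
// wait for both to finish before the refreshed cache is visible to callers.
public class ParallelOverrideSketch {

    // Returns the total number of records processed across both phases.
    static int refresh(List<String> toOverwrite, List<String> toDelete) {
        AtomicInteger processed = new AtomicInteger();
        CompletableFuture<Void> overwrite = CompletableFuture.runAsync(
                () -> toOverwrite.forEach(r -> processed.incrementAndGet()));
        CompletableFuture<Void> delete = CompletableFuture.runAsync(
                () -> toDelete.forEach(r -> processed.incrementAndGet()));
        // Join both phases so the cache is consistent before callers see it.
        CompletableFuture.allOf(overwrite, delete).join();
        return processed.get();
    }

    public static void main(String[] args) {
        int n = refresh(List.of("m1", "m2"), List.of("m3"));
        System.out.println(n); // prints 3
    }
}
```

The key design point is that neither phase depends on the other's output, so the only synchronization needed is the final join.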






[jira] [Updated] (HDFS-17529) Improve router state store cache entry deletion

2024-05-20 Thread Felix N (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix N updated HDFS-17529:
---
Description: 
The current implementation of router state store updates is quite inefficient: 
when routers are removed and a large number of NameNodeMembership records are 
deleted in a short burst, the deletions triggered router safemode in our 
cluster and caused a lot of trouble.

This ticket aims to improve the deletion process for ZK state store 
implementation.

See HDFS-17532 for the other half of this improvement

  was:
The current implementation of router state store updates is quite inefficient: 
when routers are removed and a large number of NameNodeMembership records are 
deleted in a short burst, the deletions triggered router safemode in our 
cluster and caused a lot of trouble.

This ticket aims to improve the deletion process for ZK state store 
implementation.


> Improve router state store cache entry deletion
> ---
>
> Key: HDFS-17529
> URL: https://issues.apache.org/jira/browse/HDFS-17529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> The current implementation of router state store updates is quite 
> inefficient: when routers are removed and a large number of NameNodeMembership 
> records are deleted in a short burst, the deletions triggered router safemode 
> in our cluster and caused a lot of trouble.
> This ticket aims to improve the deletion process for ZK state store 
> implementation.
> See HDFS-17532 for the other half of this improvement
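One way to read "improve the deletion process" is to batch the znode deletions instead of issuing one synchronous delete per record. The sketch below shows only the generic batching step; the idea that each batch maps to a single ZooKeeper multi-op is an assumption, and none of the names come from the actual RBF state store code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: group znode paths into fixed-size batches so that each
// batch could be submitted as one ZooKeeper multi-op (e.g. Op.delete per path)
// instead of one synchronous round trip per record.
public class BatchedDeletion {

    // Generic batching helper: splits items into chunks of at most batchSize.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> paths = List.of("/ns/r1", "/ns/r2", "/ns/r3", "/ns/r4", "/ns/r5");
        List<List<String>> batches = partition(paths, 2);
        // Five single deletes collapse into three batched calls.
        System.out.println(batches.size()); // prints 3
    }
}
```

Batching trades a little latency per record for far fewer round trips, which is what matters when many records expire in a short burst.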






[jira] [Updated] (HDFS-17529) Improve router state store cache entry deletion

2024-05-20 Thread Felix N (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix N updated HDFS-17529:
---
Description: 
The current implementation of router state store updates is quite inefficient: 
when routers are removed and a large number of NameNodeMembership records are 
deleted in a short burst, the deletions triggered router safemode in our 
cluster and caused a lot of trouble.

This ticket aims to improve the deletion process for ZK state store 
implementation.

  was:
The current implementation of router state store updates is quite inefficient: 
when routers are removed and a large number of NameNodeMembership records are 
deleted in a short burst, the deletions triggered router safemode in our 
cluster and caused a lot of trouble.

This ticket contains 2 parts: improving the deletion process for ZK state store 
implementation, and allowing the overwrite part and delete part of 
org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords
 to run in parallel.


> Improve router state store cache entry deletion
> ---
>
> Key: HDFS-17529
> URL: https://issues.apache.org/jira/browse/HDFS-17529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> The current implementation of router state store updates is quite 
> inefficient: when routers are removed and a large number of NameNodeMembership 
> records are deleted in a short burst, the deletions triggered router safemode 
> in our cluster and caused a lot of trouble.
> This ticket aims to improve the deletion process for ZK state store 
> implementation.






[jira] [Commented] (HDFS-17529) Improve router state store cache entry deletion

2024-05-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847816#comment-17847816
 ] 

ASF GitHub Bot commented on HDFS-17529:
---

kokonguyen191 commented on PR #6833:
URL: https://github.com/apache/hadoop/pull/6833#issuecomment-2120015079

   @ZanderXu Thanks for the review. I have updated the code and changed the 
ticket/PR title to cover the deletion part only; I will open another PR for the 
async part later. I'm a bit confused about point 3, could you elaborate on that 
part?




> Improve router state store cache entry deletion
> ---
>
> Key: HDFS-17529
> URL: https://issues.apache.org/jira/browse/HDFS-17529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> The current implementation of router state store updates is quite 
> inefficient: when routers are removed and a large number of NameNodeMembership 
> records are deleted in a short burst, the deletions triggered router safemode 
> in our cluster and caused a lot of trouble.
> This ticket contains 2 parts: improving the deletion process for ZK state 
> store implementation, and allowing the overwrite part and delete part of 
> org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords
>  to run in parallel.






[jira] [Updated] (HDFS-17529) Improve router state store cache entry deletion

2024-05-20 Thread Felix N (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix N updated HDFS-17529:
---
Summary: Improve router state store cache entry deletion  (was: Improve 
router state store cache update)

> Improve router state store cache entry deletion
> ---
>
> Key: HDFS-17529
> URL: https://issues.apache.org/jira/browse/HDFS-17529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> The current implementation of router state store updates is quite 
> inefficient: when routers are removed and a large number of NameNodeMembership 
> records are deleted in a short burst, the deletions triggered router safemode 
> in our cluster and caused a lot of trouble.
> This ticket contains 2 parts: improving the deletion process for ZK state 
> store implementation, and allowing the overwrite part and delete part of 
> org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords
>  to run in parallel.






[jira] [Commented] (HDFS-17529) Improve router state store cache update

2024-05-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847756#comment-17847756
 ] 

ASF GitHub Bot commented on HDFS-17529:
---

kokonguyen191 commented on PR #6833:
URL: https://github.com/apache/hadoop/pull/6833#issuecomment-2119557407

   @ZanderXu Could you help take a look when you're free? Thanks!




> Improve router state store cache update
> ---
>
> Key: HDFS-17529
> URL: https://issues.apache.org/jira/browse/HDFS-17529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> The current implementation of router state store updates is quite 
> inefficient: when routers are removed and a large number of NameNodeMembership 
> records are deleted in a short burst, the deletions triggered router safemode 
> in our cluster and caused a lot of trouble.
> This ticket contains 2 parts: improving the deletion process for ZK state 
> store implementation, and allowing the overwrite part and delete part of 
> org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords
>  to run in parallel.






[jira] [Commented] (HDFS-17529) Improve router state store cache update

2024-05-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847755#comment-17847755
 ] 

ASF GitHub Bot commented on HDFS-17529:
---

kokonguyen191 commented on PR #6833:
URL: https://github.com/apache/hadoop/pull/6833#issuecomment-2119556931

   The failed unit tests look like they are related to the changes, but they 
aren't: both tests fail without the patch and appear to have been failing in 
some past PRs already.




> Improve router state store cache update
> ---
>
> Key: HDFS-17529
> URL: https://issues.apache.org/jira/browse/HDFS-17529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> The current implementation of router state store updates is quite 
> inefficient: when routers are removed and a large number of NameNodeMembership 
> records are deleted in a short burst, the deletions triggered router safemode 
> in our cluster and caused a lot of trouble.
> This ticket contains 2 parts: improving the deletion process for ZK state 
> store implementation, and allowing the overwrite part and delete part of 
> org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords
>  to run in parallel.






[jira] [Updated] (HDFS-17531) RBF: Aynchronous router RPC.

2024-05-19 Thread Jian Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Zhang updated HDFS-17531:
--
Attachment: (was: Aynchronous router.pdf)

> RBF: Aynchronous router RPC.
> 
>
> Key: HDFS-17531
> URL: https://issues.apache.org/jira/browse/HDFS-17531
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: Aynchronous router.pdf, HDFS-17531.001.patch, 
> image-2024-05-19-18-07-51-282.png
>
>
> *Description*
> Currently, the main function of the Router service is to accept client 
> requests, forward them to the corresponding downstream ns, and then 
> return the results from the downstream ns to the client. The request path is 
> as follows:
> *!image-2024-05-19-18-07-51-282.png|width=900,height=300!*
> The main threads involved in the RPC path are:
> {*}Read{*}: Gets the client request and puts it into the call queue *(1)*
> {*}Handler{*}:
> Takes a call *(2)* from the call queue, processes it, generates a new 
> call, places it in the connection thread's call list, and waits for the call 
> processing to complete *(3)*
> After being woken by the connection thread, processes the response and puts 
> it into the response queue *(5)*
> *Connection:*
> Holds the connection to the downstream ns, sends the call to the 
> downstream ns (via {*}rpcRequestThread{*}), and obtains the response from the 
> ns. Based on the call in the response, notifies the handler that the call is 
> complete *(4)*
> *Responder:*
> Takes the response from the response queue *(6)* and returns it to the client
>  
> *Shortcoming*
> Even if the *connection* thread can send more requests to downstream 
> nameservices, since *(3)* and *(4)* are synchronous, when the *handler* 
> thread adds a call to connection.calls it must wait until the 
> *connection* notifies it that the call is complete; only after the response 
> has been put into the response queue can a new call be taken from the call 
> queue and processed. Therefore, the router's concurrency is limited by the 
> number of handlers. A simple example: if the number of handlers is 1 and the 
> maximum number of calls in the connection thread is 10, then even though the 
> connection thread could send 10 requests to the downstream ns, the router can 
> still only process one request after another. 
>  
> Since router RPC performance is mainly limited by the number of handlers, 
> the most effective way to improve it today is to increase the handler count. 
> However, having the router create a large number of handler threads also 
> increases thread context switches and cannot make full use of the machine's 
> performance.
>  
> There are usually multiple ns downstream of a router. If a handler 
> forwards a request to an ns with poor performance, the handler waits for a 
> long time. With fewer handlers available, the router's ability to handle 
> requests for the ns with normal performance is reduced. From the client's 
> perspective, the performance of the router's downstream ns has deteriorated. 
> We often find that the call queues of the downstream ns are not high, yet 
> the call queue of the router is very high.
>  
> Therefore, although the main purpose of the router is to federate and handle 
> requests for multiple NSs, the current synchronous RPC performance cannot 
> satisfy scenarios with many NSs downstream of the router. Even if the 
> router's concurrency can be improved by increasing the number of handlers, 
> it remains relatively slow: more threads increase CPU context-switching 
> time, and in practice many of the handler threads sit in a blocked state, 
> which is a waste of thread resources. When a request enters the router, 
> there is no guarantee that a handler will be free at that moment.
>  
> Therefore, I am proposing asynchronous router RPC. Please see the attached 
> *pdf* for the complete design.
>  
> Everyone is welcome to discuss!
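The synchronous-vs-asynchronous handler behaviour described above can be modelled in a few lines. This is a toy sketch under stated assumptions, not Router code: forwardToNs stands in for the connection layer, and the queue stands in for the responder's response queue.

```java
import java.util.concurrent.*;

// Toy model of the bottleneck described above: a synchronous handler parks
// between steps (3) and (4), so throughput is capped by the handler count;
// an asynchronous handler enqueues the response from a callback and is
// immediately free to take the next call from the call queue.
public class HandlerModel {
    // Stand-in for the connection thread pool forwarding calls downstream.
    static final ExecutorService CONN = Executors.newFixedThreadPool(4);

    // Connection layer: forwards a call and completes when the ns replies.
    static CompletableFuture<String> forwardToNs(String call) {
        return CompletableFuture.supplyAsync(() -> "resp:" + call, CONN);
    }

    // Synchronous handler: the handler thread blocks until the ns replies.
    static String syncHandle(String call) {
        return forwardToNs(call).join(); // handler parked here: steps (3)/(4)
    }

    // Asynchronous handler: the response is queued from a callback (step (5)),
    // so no handler thread is held while the downstream ns is working.
    static CompletableFuture<Void> asyncHandle(String call,
                                               BlockingQueue<String> responseQueue) {
        return forwardToNs(call).thenAccept(responseQueue::add);
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> responses = new LinkedBlockingQueue<>();
        asyncHandle("c1", responses).join();
        System.out.println(syncHandle("c0")); // prints resp:c0
        System.out.println(responses.take()); // prints resp:c1
        CONN.shutdown();
    }
}
```

With one handler thread, syncHandle serializes every call, while asyncHandle lets the connection pool work on many calls at once; that is the gap the proposal aims to close.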






[jira] [Updated] (HDFS-17531) RBF: Aynchronous router RPC.

2024-05-19 Thread Jian Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Zhang updated HDFS-17531:
--
Attachment: Aynchronous router.pdf

> RBF: Aynchronous router RPC.
> 
>
> Key: HDFS-17531
> URL: https://issues.apache.org/jira/browse/HDFS-17531
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: Aynchronous router.pdf, HDFS-17531.001.patch, 
> image-2024-05-19-18-07-51-282.png
>
>
> *Description*
> Currently, the main function of the Router service is to accept client 
> requests, forward them to the corresponding downstream ns, and then 
> return the results from the downstream ns to the client. The request path is 
> as follows:
> *!image-2024-05-19-18-07-51-282.png|width=900,height=300!*
> The main threads involved in the RPC path are:
> {*}Read{*}: Gets the client request and puts it into the call queue *(1)*
> {*}Handler{*}:
> Takes a call *(2)* from the call queue, processes it, generates a new 
> call, places it in the connection thread's call list, and waits for the call 
> processing to complete *(3)*
> After being woken by the connection thread, processes the response and puts 
> it into the response queue *(5)*
> *Connection:*
> Holds the connection to the downstream ns, sends the call to the 
> downstream ns (via {*}rpcRequestThread{*}), and obtains the response from the 
> ns. Based on the call in the response, notifies the handler that the call is 
> complete *(4)*
> *Responder:*
> Takes the response from the response queue *(6)* and returns it to the client
>  
> *Shortcoming*
> Even if the *connection* thread can send more requests to downstream 
> nameservices, since *(3)* and *(4)* are synchronous, when the *handler* 
> thread adds a call to connection.calls it must wait until the 
> *connection* notifies it that the call is complete; only after the response 
> has been put into the response queue can a new call be taken from the call 
> queue and processed. Therefore, the router's concurrency is limited by the 
> number of handlers. A simple example: if the number of handlers is 1 and the 
> maximum number of calls in the connection thread is 10, then even though the 
> connection thread could send 10 requests to the downstream ns, the router can 
> still only process one request after another. 
>  
> Since router rpc performance is mainly limited by the number of handlers, 
> the most effective way to improve it today is to increase that number. But 
> letting the router create a large number of handler threads also increases 
> thread context switching and cannot fully utilize the machine.
>  
> There are usually multiple ns downstream of the router. If a handler 
> forwards a request to an ns with poor performance, the handler will wait for 
> a long time. With fewer handlers available, the router's ability to serve 
> requests for healthy ns is reduced. From the client's perspective, all the 
> downstream ns of the router appear to have degraded. We often find that the 
> call queues of the downstream ns are not deep, but the call queue of the 
> router is very deep.
>  
> Therefore, although the router's main purpose is to federate and serve 
> requests for multiple NSs, the current synchronous RPC implementation cannot 
> keep up when the router has many downstream NSs. Even if router concurrency 
> can be improved by adding handlers, it remains relatively slow: more threads 
> increase CPU context-switching time, and many handler threads sit blocked, 
> which is a waste of thread resources. When a request enters the router, 
> there is no guarantee that a runnable handler is available at that moment.
>  
> Therefore, I propose asynchronous router rpc. Please see the attached *pdf* 
> for the complete solution.
>  
> Welcome everyone to exchange and discuss!
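The blocking wait at step (3) described above can be modeled in a few lines. This is a hypothetical simplification, not Hadoop's actual classes: `maxInFlight` is an illustrative name, and the point is only that a handler parked between (3) and (4) pins exactly one in-flight call.

```java
// Hypothetical model of the synchronous RBF router handler path.
// Because each handler blocks from step (3) until step (4), it can hold
// only one call at a time, so in-flight concurrency is capped by
// min(handler count, connection call capacity).
public class SyncRouterModel {
    static int maxInFlight(int handlers, int connectionMaxCalls) {
        return Math.min(handlers, connectionMaxCalls);
    }
    public static void main(String[] args) {
        // The example from the description: 1 handler, connection holds 10 calls.
        System.out.println(maxInFlight(1, 10)); // prints 1: requests are serialized
    }
}
```

With 1 handler the connection's capacity of 10 is wasted; raising handlers to 10 helps, but only by spending 10 mostly-blocked threads.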



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17531) RBF: Aynchronous router RPC.

2024-05-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847745#comment-17847745
 ] 

ASF GitHub Bot commented on HDFS-17531:
---

KeeProMise commented on PR #6838:
URL: https://github.com/apache/hadoop/pull/6838#issuecomment-2119506196

   @ayushtkn @slfan1989 hi, thanks for your reply. I sent the discussion to 
common-...@hadoop.apache.org.




> RBF: Aynchronous router RPC.
> 
>
> Key: HDFS-17531
> URL: https://issues.apache.org/jira/browse/HDFS-17531
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: Aynchronous router.pdf, HDFS-17531.001.patch, 
> image-2024-05-19-18-07-51-282.png
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17531) RBF: Aynchronous router RPC.

2024-05-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847740#comment-17847740
 ] 

ASF GitHub Bot commented on HDFS-17531:
---

slfan1989 commented on PR #6838:
URL: https://github.com/apache/hadoop/pull/6838#issuecomment-2119459510

   @KeeProMise Thanks for the contribution! This PR is too large, and it seems 
it cannot be reviewed as is.
   
   Let’s first follow the process and discuss it on the hadoop-common mailing 
list.  We should split the PR for easier review and also provide benchmark data.




> RBF: Aynchronous router RPC.
> 
>
> Key: HDFS-17531
> URL: https://issues.apache.org/jira/browse/HDFS-17531
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: Aynchronous router.pdf, HDFS-17531.001.patch, 
> image-2024-05-19-18-07-51-282.png
>
>

[jira] [Commented] (HDFS-17531) RBF: Aynchronous router RPC.

2024-05-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847730#comment-17847730
 ] 

ASF GitHub Bot commented on HDFS-17531:
---

ayushtkn commented on PR #6838:
URL: https://github.com/apache/hadoop/pull/6838#issuecomment-2119352612

   passing by: discuss threads should be on hadoop dev mailing lists




> RBF: Aynchronous router RPC.
> 
>
> Key: HDFS-17531
> URL: https://issues.apache.org/jira/browse/HDFS-17531
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: Aynchronous router.pdf, HDFS-17531.001.patch, 
> image-2024-05-19-18-07-51-282.png
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17531) RBF: Aynchronous router RPC.

2024-05-19 Thread Jian Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Zhang updated HDFS-17531:
--
Attachment: HDFS-17531.001.patch
Status: Patch Available  (was: Open)

> RBF: Aynchronous router RPC.
> 
>
> Key: HDFS-17531
> URL: https://issues.apache.org/jira/browse/HDFS-17531
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: Aynchronous router.pdf, HDFS-17531.001.patch, 
> image-2024-05-19-18-07-51-282.png
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17531) RBF: Aynchronous router RPC.

2024-05-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847663#comment-17847663
 ] 

ASF GitHub Bot commented on HDFS-17531:
---

KeeProMise opened a new pull request, #6838:
URL: https://github.com/apache/hadoop/pull/6838

   
   
   ### Description of PR
   see: https://issues.apache.org/jira/browse/HDFS-17531
   
   ### How was this patch tested?
   TestNoNamenodesAvailableLongTime
   TestObserverWithRouter
   TestRouterFederationRename
   TestRouterFederationRenamePermission
   TestRouterQuota
   TestRouterRefreshSuperUserGroupsConfiguration
   TestRouterRpc
   TestRouterRpcMultiDestination
   TestRouterRPCMultipleDestinationMountTableResolver
   TestRouterRpcSingleNS
   TestRouterRpcStoragePolicySatisfier
   TestRouterUserMappings
   
   ### For code changes:
   
   - [ ] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> RBF: Aynchronous router RPC.
> 
>
> Key: HDFS-17531
> URL: https://issues.apache.org/jira/browse/HDFS-17531
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Major
> Attachments: Aynchronous router.pdf, image-2024-05-19-18-07-51-282.png
>
>

[jira] [Updated] (HDFS-17531) RBF: Aynchronous router RPC.

2024-05-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17531:
--
Labels: pull-request-available  (was: )

> RBF: Aynchronous router RPC.
> 
>
> Key: HDFS-17531
> URL: https://issues.apache.org/jira/browse/HDFS-17531
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: Aynchronous router.pdf, image-2024-05-19-18-07-51-282.png
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17531) RBF: Aynchronous router RPC.

2024-05-19 Thread Jian Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Zhang updated HDFS-17531:
--
Description: 
*Description*

Currently, the main function of the Router service is to accept client 
requests, forward the requests to the corresponding downstream ns, and then 
return the results of the downstream ns to the client. The link is as follows:

*!image-2024-05-19-18-07-51-282.png|width=900,height=300!*
The main threads involved in the rpc link are:
{*}Read{*}: Get the client request and put it into the call queue *(1)*
{*}Handler{*}:
Extract a call *(2)* from the call queue, process it, generate a new downstream 
call, place it in the connection thread's call list, and wait for the call 
processing to complete *(3)*
After being awakened by the connection thread, process the response and put it 
into the response queue *(5)*
*Connection:*
Hold the connection to the downstream ns, send calls from its call list to the 
downstream ns (via {*}rpcRequestThread{*}), and obtain responses from the ns. 
Based on the call identified in each response, notify the waiting handler that 
the call has completed *(4)*
*Responder:*
Retrieve each response from the response queue *(6)* and return it to the client
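The Reader -> Handler -> Connection -> Responder hand-offs numbered (1)-(6) can be sketched as a single-threaded walkthrough. Class and queue names here are illustrative, not Hadoop's actual ones, and the downstream ns reply is faked:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical single-threaded walkthrough of the router RPC pipeline
// steps (1)-(6) from the description above.
public class RpcPipelineSketch {
    public static String process(String request) {
        Queue<String> callQueue = new ArrayDeque<>();     // filled by Reader (1)
        Queue<String> responseQueue = new ArrayDeque<>(); // drained by Responder (6)

        callQueue.add(request);                  // (1) Reader enqueues the call
        String call = callQueue.poll();          // (2) Handler dequeues it
        String nsResponse = "ns-reply:" + call;  // (3)(4) Connection forwards to the
                                                 // ns and wakes the handler with it
        responseQueue.add(nsResponse);           // (5) Handler enqueues the response
        return responseQueue.poll();             // (6) Responder returns it
    }
    public static void main(String[] args) {
        System.out.println(process("getBlockLocations")); // prints ns-reply:getBlockLocations
    }
}
```

In the real router these stages run on separate threads; the sketch only makes the queue hand-offs explicit.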
 

*Shortcoming*
Even if the *connection* thread can send more requests to downstream 
nameservices, since *(3)* and *(4)* are synchronous, once the *handler* thread 
adds a call to connection.calls, it must wait until the *connection* thread 
notifies it that the call is complete; only after the response is put into the 
response queue can it take a new call from the call queue and process it. 
Therefore, the concurrency of the router is limited by the number of handlers. 
A simple example: if the number of handlers is 1 and the maximum number of 
calls in the connection thread is 10, then even though the connection thread 
could send 10 requests to the downstream ns, the router can still only process 
one request after another.
 
Since router RPC performance is mainly limited by the number of handlers, the 
most effective way to improve it today is to add handlers. But creating a 
large number of handler threads also increases thread context switching and 
cannot make full use of the machine.
 
There are usually multiple nameservices downstream of a router. If a handler 
forwards a request to a slow ns, the handler waits a long time; with fewer 
handlers available, the router's capacity to serve requests for healthy 
nameservices drops. From the client's perspective, the downstream nameservices 
of the router all appear degraded. We often find that the call queues of the 
downstream nameservices are short while the router's call queue is very long.
 
Therefore, although the router's main job is to federate and serve requests 
for multiple nameservices, the current synchronous RPC model cannot satisfy 
the scenario where a router fronts many nameservices. Even if concurrency can 
be improved by adding handlers, it remains relatively slow: more threads mean 
more CPU context-switching time, and in practice many handler threads simply 
sit blocked, which wastes thread resources. When a request enters the router, 
there is no guarantee that a handler is free to run it.
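To make the blocking concrete, here is a minimal, self-contained sketch (all class and method names are illustrative stand-ins, not the real Router code; the actual design is in the attached pdf). It contrasts a synchronous handler, which parks on the downstream result, with a callback-based asynchronous handler that can keep many calls in flight from a single thread:

```java
import java.util.concurrent.*;

public class AsyncRouterSketch {
    // Hypothetical stand-ins for the Router's response queue and connection pool.
    public static final BlockingQueue<String> responseQueue = new LinkedBlockingQueue<>();
    public static final ExecutorService connectionPool =
        Executors.newFixedThreadPool(4, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);            // let the JVM exit without explicit shutdown
            return t;
        });

    // Stand-in for forwarding a call to a downstream nameservice.
    public static String forwardToNs(String call) {
        try { Thread.sleep(5); } catch (InterruptedException ignored) { }
        return call + ":done";
    }

    // Synchronous style: the handler thread parks in get() until the
    // downstream ns replies, so throughput is capped by the handler count.
    public static String handleSync(String call) throws Exception {
        Future<String> f = connectionPool.submit(() -> forwardToNs(call));
        return f.get();
    }

    // Asynchronous style: the handler registers a completion callback and
    // returns at once, so one handler can keep many calls in flight.
    public static CompletableFuture<Void> handleAsync(String call) {
        return CompletableFuture.supplyAsync(() -> forwardToNs(call), connectionPool)
                                .thenAccept(responseQueue::add);
    }

    public static void main(String[] args) {
        CompletableFuture<?>[] inFlight = new CompletableFuture<?>[10];
        for (int i = 0; i < 10; i++) {
            inFlight[i] = handleAsync("call-" + i);  // one thread dispatches all 10
        }
        CompletableFuture.allOf(inFlight).join();
        System.out.println("responses=" + responseQueue.size());  // prints responses=10
    }
}
```

With `handleSync`, 1 handler and 10 allowed in-flight calls still process one request at a time; with `handleAsync`, the same single thread dispatches all 10 before any response returns.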
 

Therefore, I am considering asynchronous router RPC. Please see the attached 
*pdf* for the complete design.

 

Feedback and discussion are welcome!

  was:
*Description*

Currently, the main function of the Router service is to accept client 
requests, forward the requests to the corresponding downstream ns, and then 
return the results of the downstream ns to the client. The link is as follows:

*!image-2024-05-19-18-07-51-282.png|width=900,height=300!*
The main threads involved in the rpc link are:
{*}Read{*}: Get the client request and put it into the call queue *(1)*
{*}Handler{*}:
Extract call *(2)* from the call queue, process the call, generate a new call, 
place it in the call of the connection thread, and wait for the call processing 
to complete *(3)*
After being awakened by the connection thread, process the response and put it 
into the response queue *(5)*
*Connection:*
Hold the link with downstream ns, send the call from the call to the downstream 
ns (via {*}rpcRequestThread{*}), and obtain a response from ns. Based on the 
call in the response, notify the call to complete processing *(4)*
*Responder:*
Retrieve the response from the response queue *(6)* and return it to the client
 

*Shortcoming*
Even if the *connection* thread can send more requests to downstream 
nameservices, since *(3)* and *(4)* are synchronous, when the *handler* thread 
adds the call

[jira] [Updated] (HDFS-17531) RBF: Asynchronous router RPC.

2024-05-19 Thread Jian Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Zhang updated HDFS-17531:
--
Attachment: Aynchronous router.pdf

> RBF: Asynchronous router RPC.
> 
>
> Key: HDFS-17531
> URL: https://issues.apache.org/jira/browse/HDFS-17531
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Major
> Attachments: Aynchronous router.pdf, image-2024-05-19-18-07-51-282.png
>
>
> *Description*
> Currently, the main function of the Router service is to accept client 
> requests, forward the requests to the corresponding downstream ns, and then 
> return the results of the downstream ns to the client. The link is as follows:
> *!image-2024-05-19-18-07-51-282.png|width=900,height=300!*
> The main threads involved in the rpc link are:
> {*}Read{*}: Get the client request and put it into the call queue *(1)*
> {*}Handler{*}:
> Extract call *(2)* from the call queue, process the call, generate a new 
> call, place it in the call of the connection thread, and wait for the call 
> processing to complete *(3)*
> After being awakened by the connection thread, process the response and put 
> it into the response queue *(5)*
> *Connection:*
> Hold the link with downstream ns, send the call from the call to the 
> downstream ns (via {*}rpcRequestThread{*}), and obtain a response from ns. 
> Based on the call in the response, notify the call to complete processing 
> *(4)*
> *Responder:*
> Retrieve the response from the response queue *(6)* and return it to the client
>  
> *Shortcoming*
> Even though the *connection* thread could send more requests to the 
> downstream nameservices, *(3)* and *(4)* are synchronous: once the *handler* 
> thread adds a call to connection.calls, it must wait until the *connection* 
> thread notifies it that the call is complete, and only after putting the 
> response into the response queue can it take a new call from the call queue. 
> The concurrency of the router is therefore limited by the number of 
> handlers. A simple example: if there is 1 handler and the connection thread 
> allows at most 10 in-flight calls, then even though the connection thread 
> could send 10 requests to the downstream ns, the router can still only 
> process one request at a time.
>  
> Since router RPC performance is mainly limited by the number of handlers, 
> the most effective way to improve it today is to add handlers. But creating 
> a large number of handler threads also increases thread context switching 
> and cannot make full use of the machine.
>  
> There are usually multiple nameservices downstream of a router. If a handler 
> forwards a request to a slow ns, the handler waits a long time; with fewer 
> handlers available, the router's capacity to serve requests for healthy 
> nameservices drops. From the client's perspective, the downstream 
> nameservices of the router all appear degraded. We often find that the call 
> queues of the downstream nameservices are short while the router's call 
> queue is very long.
>  
> Therefore, although the router's main job is to federate and serve requests 
> for multiple nameservices, the current synchronous RPC model cannot satisfy 
> the scenario where a router fronts many nameservices. Even if concurrency 
> can be improved by adding handlers, it remains relatively slow: more threads 
> mean more CPU context-switching time, and in practice many handler threads 
> simply sit blocked, which wastes thread resources. When a request enters the 
> router, there is no guarantee that a handler is free to run it.
>  
> Therefore, I am considering asynchronous router RPC. Please see the attached 
> *pdf* for the complete design.
>  






[jira] [Updated] (HDFS-17531) RBF: Asynchronous router RPC.

2024-05-19 Thread Jian Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Zhang updated HDFS-17531:
--
Description: 
*Description*

Currently, the main function of the Router service is to accept client 
requests, forward the requests to the corresponding downstream ns, and then 
return the results of the downstream ns to the client. The link is as follows:

*!image-2024-05-19-18-07-51-282.png|width=900,height=300!*
The main threads involved in the rpc link are:
{*}Read{*}: Get the client request and put it into the call queue *(1)*
{*}Handler{*}:
Extract call *(2)* from the call queue, process the call, generate a new call, 
place it in the call of the connection thread, and wait for the call processing 
to complete *(3)*
After being awakened by the connection thread, process the response and put it 
into the response queue *(5)*
*Connection:*
Hold the link with downstream ns, send the call from the call to the downstream 
ns (via {*}rpcRequestThread{*}), and obtain a response from ns. Based on the 
call in the response, notify the call to complete processing *(4)*
*Responder:*
Retrieve the response from the response queue *(6)* and return it to the client
 

*Shortcoming*
Even though the *connection* thread could send more requests to the downstream 
nameservices, *(3)* and *(4)* are synchronous: once the *handler* thread adds 
a call to connection.calls, it must wait until the *connection* thread 
notifies it that the call is complete, and only after putting the response 
into the response queue can it take a new call from the call queue. The 
concurrency of the router is therefore limited by the number of handlers. A 
simple example: if there is 1 handler and the connection thread allows at most 
10 in-flight calls, then even though the connection thread could send 10 
requests to the downstream ns, the router can still only process one request 
at a time.
 
Since router RPC performance is mainly limited by the number of handlers, the 
most effective way to improve it today is to add handlers. But creating a 
large number of handler threads also increases thread context switching and 
cannot make full use of the machine.
 
There are usually multiple nameservices downstream of a router. If a handler 
forwards a request to a slow ns, the handler waits a long time; with fewer 
handlers available, the router's capacity to serve requests for healthy 
nameservices drops. From the client's perspective, the downstream nameservices 
of the router all appear degraded. We often find that the call queues of the 
downstream nameservices are short while the router's call queue is very long.
 
Therefore, although the router's main job is to federate and serve requests 
for multiple nameservices, the current synchronous RPC model cannot satisfy 
the scenario where a router fronts many nameservices. Even if concurrency can 
be improved by adding handlers, it remains relatively slow: more threads mean 
more CPU context-switching time, and in practice many handler threads simply 
sit blocked, which wastes thread resources. When a request enters the router, 
there is no guarantee that a handler is free to run it.
 

Therefore, I am considering asynchronous router RPC. Please see the attached 
*pdf* for the complete design.

 

  was:
*Description*

Currently, the main function of the Router service is to accept client 
requests, forward the requests to the corresponding downstream ns, and then 
return the results of the downstream ns to the client. The link is as follows:

*!image-2024-05-19-18-07-51-282.png|width=900,height=300!*

The main threads involved in the rpc link are:
{*}read{*}: Get the client request and put it into the call queue *(1)*

{*}handler{*}:
Remove the call from the call queue {*}(2){*}, process the call, generate a new 
call and put it into the calls of the connection thread, and wait for the call 
to be processed *(3)*
After being awakened by the connection thread, process the response and put the 
response into the response queue *(5)*

{*}connection{*}:
Hold the link with the downstream ns, send the call in calls to the downstream 
ns (through rpcRequestThread), and get the response from the ns, and notify the 
call processing completion according to the callid in the response *(4)*

{*}responder{*}:
Take the response out of the response queue *(6)* and return it to the client

 

*Shortcoming*

Even if the *connection* thread can send more requests to downstream 
nameservices, since *(3)* and *(4)* are synchronous, when the *handler* thread 
adds the call

[jira] [Updated] (HDFS-17531) RBF: Asynchronous router RPC.

2024-05-19 Thread Jian Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Zhang updated HDFS-17531:
--
Description: 
*Description*

Currently, the main function of the Router service is to accept client 
requests, forward the requests to the corresponding downstream ns, and then 
return the results of the downstream ns to the client. The link is as follows:

*!image-2024-05-19-18-07-51-282.png|width=900,height=300!*

The main threads involved in the rpc link are:
{*}read{*}: Get the client request and put it into the call queue *(1)*

{*}handler{*}:
Remove the call from the call queue {*}(2){*}, process the call, generate a new 
call and put it into the calls of the connection thread, and wait for the call 
to be processed *(3)*
After being awakened by the connection thread, process the response and put the 
response into the response queue *(5)*

{*}connection{*}:
Hold the link with the downstream ns, send the call in calls to the downstream 
ns (through rpcRequestThread), and get the response from the ns, and notify the 
call processing completion according to the callid in the response *(4)*

{*}responder{*}:
Take the response out of the response queue *(6)* and return it to the client

 

*Shortcoming*

Even though the *connection* thread could send more requests to the downstream 
nameservices, *(3)* and *(4)* are synchronous: once the *handler* thread adds 
a call to connection.calls, it must wait until the *connection* thread 
notifies it that the call is complete, and only after putting the response 
into the response queue can it take a new call from the call queue. The 
concurrency of the router is therefore limited by the number of handlers; a 
simple example is as follows:
 - If there is 1 handler and the connection thread allows at most 10 in-flight 
calls, then even though the connection thread could send 10 requests to the 
downstream ns, the router can still only process one request at a time.

Since router RPC performance is mainly limited by the number of handlers, the 
most effective way to improve it today is to add handlers. But creating a 
large number of handler threads also increases thread context switching and 
cannot make full use of the machine.

There are usually multiple nameservices downstream of a router. If a handler 
forwards a request to a slow ns, the handler waits a long time; with fewer 
handlers available, the router's capacity to serve requests for healthy 
nameservices drops. From the client's perspective, the downstream nameservices 
of the router all appear degraded. We often find that the call queues of the 
downstream nameservices are short while the router's call queue is very long.

Therefore, although the router's main job is to federate and serve requests 
for multiple nameservices, the current synchronous RPC model cannot satisfy 
the scenario where a router fronts many nameservices. Even if concurrency can 
be improved by adding handlers, it remains relatively slow: more threads mean 
more CPU context-switching time, and in practice many handler threads simply 
sit blocked, which wastes thread resources. When a request enters the router, 
there is no guarantee that a handler is free to run it.

 

Therefore, I am considering asynchronous router RPC. Please see the pdf for 
the complete design.

 

> RBF: Asynchronous router RPC.
> 
>
> Key: HDFS-17531
> URL: https://issues.apache.org/jira/browse/HDFS-17531
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Major
> Attachments: image-2024-05-19-18-07-51-282.png
>
>
> *Description*
> Currently, the main function of the Router service is to accept client 
> requests, forward the requests to the corresponding downstream ns, and then 
> return the results of the downstream ns to the client. The link is as follows:
> *!image-2024-05-19-18-07-51-282.png|width=900,height=300!*
> The main threads involved in the rpc link are:
> {*}read{*}: Get the client request and put it into the call queue *(1)*
> {*}handler{*}:
> Remove the call from the call queue {*}(2){*}, process the call, generate a 
> new call and put it into the calls of the connection thread, and wait for the 
> call to be processed *(3)*
> After being awakened by the connection thread, process the response and put 
> the response into the response 

[jira] [Updated] (HDFS-17531) RBF: Asynchronous router RPC.

2024-05-19 Thread Jian Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Zhang updated HDFS-17531:
--
Attachment: image-2024-05-19-18-07-51-282.png

> RBF: Asynchronous router RPC.
> 
>
> Key: HDFS-17531
> URL: https://issues.apache.org/jira/browse/HDFS-17531
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Major
> Attachments: image-2024-05-19-18-07-51-282.png
>
>







[jira] [Created] (HDFS-17531) RBF: Asynchronous router RPC.

2024-05-19 Thread Jian Zhang (Jira)
Jian Zhang created HDFS-17531:
-

 Summary: RBF: Asynchronous router RPC.
 Key: HDFS-17531
 URL: https://issues.apache.org/jira/browse/HDFS-17531
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Jian Zhang









[jira] [Created] (HDFS-17530) Asynchronous router

2024-05-19 Thread Jian Zhang (Jira)
Jian Zhang created HDFS-17530:
-

 Summary: Asynchronous router
 Key: HDFS-17530
 URL: https://issues.apache.org/jira/browse/HDFS-17530
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Jian Zhang









[jira] [Commented] (HDFS-17410) [FGL] Client RPCs that changes file attributes supports fine-grained lock

2024-05-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847610#comment-17847610
 ] 

ASF GitHub Bot commented on HDFS-17410:
---

hfutatzhanghb commented on code in PR #6634:
URL: https://github.com/apache/hadoop/pull/6634#discussion_r1605921502


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##
@@ -3654,14 +3654,15 @@ void setQuota(String src, long nsQuota, long ssQuota, 
StorageType type)
   checkSuperuserPrivilege(operationName, src);
 }
 try {
-  writeLock();
+  // Need to compute the current space usage
+  writeLock(FSNamesystemLockMode.GLOBAL);

Review Comment:
   @ZanderXu Got it, Thanks sir.





> [FGL] Client RPCs that changes file attributes supports fine-grained lock
> -
>
> Key: HDFS-17410
> URL: https://issues.apache.org/jira/browse/HDFS-17410
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> There are some client RPCs are used to change file attributes.
> This ticket is used to make these RPCs supporting fine-grained lock.
>  * setReplication
>  * getStoragePolicies
>  * setStoragePolicy
>  * unsetStoragePolicy
>  * satisfyStoragePolicy
>  * getStoragePolicy
>  * setPermission
>  * setOwner
>  * setTimes
>  * concat
>  * truncate
>  * setQuota
>  * getQuotaUsage
>  * modifyAclEntries
>  * removeAclEntries
>  * removeDefaultAcl
>  * removeAcl
>  * setAcl
>  * getAclStatus
>  * getEZForPath
>  * getEnclosingRoot
>  * listEncryptionZones
>  * reencryptEncryptionZone
>  * listReencryptionStatus
>  * setXAttr
>  * getXAttrs
>  * listXAttrs
>  * removeXAttr






[jira] [Commented] (HDFS-17410) [FGL] Client RPCs that changes file attributes supports fine-grained lock

2024-05-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847524#comment-17847524
 ] 

ASF GitHub Bot commented on HDFS-17410:
---

ZanderXu commented on code in PR #6634:
URL: https://github.com/apache/hadoop/pull/6634#discussion_r1605789539


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##
@@ -3654,14 +3654,15 @@ void setQuota(String src, long nsQuota, long ssQuota, 
StorageType type)
   checkSuperuserPrivilege(operationName, src);
 }
 try {
-  writeLock();
+  // Need to compute the current space usage
+  writeLock(FSNamesystemLockMode.GLOBAL);

Review Comment:
   `computeQuotaUsage` needs to rely on block state to get 
storagespaceConsumed, so the GLOBAL lock is used here. After HDFS-17497 is 
merged, this global lock can be replaced by the fs lock.
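The GLOBAL-vs-FS distinction in the review comment can be illustrated with a toy model (this is an illustrative simplification, not the actual FSNamesystem lock implementation: here a GLOBAL write lock is assumed to take both a namespace (FS) lock and a block-manager (BM) lock, while FS-only operations take just the namespace lock):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FineGrainedLockSketch {
    // Hypothetical lock modes, loosely modeled on FSNamesystemLockMode.
    public enum LockMode { FS, BM, GLOBAL }

    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    private final ReentrantReadWriteLock bmLock = new ReentrantReadWriteLock();

    // GLOBAL acquires both locks; FS/BM acquire only their own.
    public void writeLock(LockMode mode) {
        if (mode == LockMode.FS || mode == LockMode.GLOBAL) fsLock.writeLock().lock();
        if (mode == LockMode.BM || mode == LockMode.GLOBAL) bmLock.writeLock().lock();
    }

    // Release in reverse acquisition order.
    public void writeUnlock(LockMode mode) {
        if (mode == LockMode.BM || mode == LockMode.GLOBAL) bmLock.writeLock().unlock();
        if (mode == LockMode.FS || mode == LockMode.GLOBAL) fsLock.writeLock().unlock();
    }

    public boolean fsHeld() { return fsLock.isWriteLockedByCurrentThread(); }
    public boolean bmHeld() { return bmLock.isWriteLockedByCurrentThread(); }

    public static void main(String[] args) {
        FineGrainedLockSketch locks = new FineGrainedLockSketch();
        locks.writeLock(LockMode.GLOBAL);   // setQuota-style op: needs block state too
        System.out.println(locks.fsHeld() + " " + locks.bmHeld());  // true true
        locks.writeUnlock(LockMode.GLOBAL);
        locks.writeLock(LockMode.FS);       // FS-only op: BM stays available
        System.out.println(locks.fsHeld() + " " + locks.bmHeld());  // true false
        locks.writeUnlock(LockMode.FS);
    }
}
```

Replacing a GLOBAL acquisition with an FS-only one, as suggested for after HDFS-17497, leaves the block-manager lock free for concurrent block-side work.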





> [FGL] Client RPCs that changes file attributes supports fine-grained lock
> -
>
> Key: HDFS-17410
> URL: https://issues.apache.org/jira/browse/HDFS-17410
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> There are some client RPCs are used to change file attributes.
> This ticket is used to make these RPCs supporting fine-grained lock.
>  * setReplication
>  * getStoragePolicies
>  * setStoragePolicy
>  * unsetStoragePolicy
>  * satisfyStoragePolicy
>  * getStoragePolicy
>  * setPermission
>  * setOwner
>  * setTimes
>  * concat
>  * truncate
>  * setQuota
>  * getQuotaUsage
>  * modifyAclEntries
>  * removeAclEntries
>  * removeDefaultAcl
>  * removeAcl
>  * setAcl
>  * getAclStatus
>  * getEZForPath
>  * getEnclosingRoot
>  * listEncryptionZones
>  * reencryptEncryptionZone
>  * listReencryptionStatus
>  * setXAttr
>  * getXAttrs
>  * listXAttrs
>  * removeXAttr






[jira] [Commented] (HDFS-17529) Improve router state store cache update

2024-05-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847290#comment-17847290
 ] 

ASF GitHub Bot commented on HDFS-17529:
---

hadoop-yetus commented on PR #6833:
URL: https://github.com/apache/hadoop/pull/6833#issuecomment-2117603479

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 31s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  44m 32s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 32s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 45s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 45s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 25s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  33m 54s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 32s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 25s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  33m 28s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |  30m  5s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6833/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 40s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 158m 32s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.federation.router.security.token.TestSQLDelegationTokenSecretManagerImpl
 |
   |   | hadoop.hdfs.server.federation.store.driver.TestStateStoreMySQL |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6833/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6833 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint 
markdownlint |
   | uname | Linux 1ff0867a1e1e 5.15.0-106-generic #116-Ubuntu SMP Wed Apr 17 
09:17:56 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / ea4e518423a537eced42b86222955195aea361f6 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402

[jira] [Commented] (HDFS-16874) Improve DataNode decommission for Erasure Coding

2024-05-17 Thread Chenyu Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847246#comment-17847246
 ] 

Chenyu Zheng commented on HDFS-16874:
-

[~jingzhao] 

Do you have any new plans? I found HDFS-17515 and HDFS-17516 that may be 
relevant to this. Can I take over this and create some subtasks?

> Improve DataNode decommission for Erasure Coding
> 
>
> Key: HDFS-16874
> URL: https://issues.apache.org/jira/browse/HDFS-16874
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ec, erasure-coding
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Major
>
> There are a couple of issues with the current DataNode decommission 
> implementation when large amounts of Erasure Coding data are involved in the 
> data re-replication/reconstruction process:
>  # Slowness. In HDFS-8786 we made a decision to use re-replication for 
> DataNode decommission if the internal EC block is still available. While this 
> strategy reduces the CPU cost caused by EC reconstruction, it greatly limits 
> the overall data recovery bandwidth, since there is only one single DataNode 
> as the source. While high density HDD hosts are more and more widely used by 
> HDFS especially along with Erasure Coding for warm data use case, this 
> becomes a big pain for cluster management. In our production, to decommission 
> a DataNode with several hundred TB EC data stored might take several days. 
> HDFS-16613 provides optimization based on the existing mechanism, but more 
> fundamentally we may want to allow EC reconstruction for DataNode 
> decommission so as to achieve much larger recovery bandwidth.
>  # The semantic of the existing EC reconstruction command (the 
> BlockECReconstructionInfoProto msg sent from NN to DN) is not clear. The 
> existing reconstruction command depends on the holes in the 
> srcNodes/liveBlockIndices arrays to indicate the target internal blocks for 
> recovery, while the holes can also be caused by the fact that the 
> corresponding datanode is too busy so it cannot be used as the reconstruction 
> source. This causes the later DataNode side reconstruction may not be 
> consistent with the original intention. E.g., if the index of the missing 
> block is 6, and the datanode storing block 0 is busy, the src nodes in the 
> reconstruction command only cover blocks [1, 2, 3, 4, 5, 7, 8]. The target 
> datanode may reconstruct the internal block 0 instead of 6. HDFS-16566 is 
> working on this issue by indicating an excluding index list. More 
> fundamentally we can follow the same path but go a step further by adding an 
> optional field explicitly indicating the target block indices in the command 
> protobuf msg. With the extension the DataNode will no longer use the holes in 
> the src node array to "guess" the reconstruction targets.
> Internally we have developed and applied fixes following the above 
> directions, and have seen a significant improvement (100+ times speedup) in 
> DataNode decommission speed for EC data. The clearer semantics of the 
> reconstruction command protobuf msg also help prevent potential data 
> corruption during EC reconstruction.
> We will use this ticket to track similar fixes for the Apache releases.
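The target-index ambiguity described in the quoted issue can be illustrated with a minimal Java sketch. The helper method and values below are hypothetical illustrations, not Hadoop code: they only show why inferring targets from holes in the live-index array is ambiguous when a source DataNode is merely busy.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class EcTargetAmbiguity {
    // Old behavior (sketch): any internal block index absent from
    // liveBlockIndices is treated as a candidate reconstruction target.
    static Set<Integer> inferTargetsFromHoles(int totalBlocks, int[] liveIndices) {
        Set<Integer> targets = new TreeSet<>();
        for (int i = 0; i < totalBlocks; i++) {
            targets.add(i);
        }
        for (int idx : liveIndices) {
            targets.remove(idx);
        }
        return targets;
    }

    public static void main(String[] args) {
        // RS-6-3 has 9 internal blocks. Block 6 is actually missing; the
        // DataNode holding block 0 is merely busy, so it is excluded from
        // the source list. Both 0 and 6 now look like holes.
        int[] liveIndices = {1, 2, 3, 4, 5, 7, 8};
        Set<Integer> inferred = inferTargetsFromHoles(9, liveIndices);
        System.out.println("Inferred from holes: " + inferred); // [0, 6]

        // An explicit target list, as proposed, removes the ambiguity.
        int[] explicitTargets = {6};
        System.out.println("Explicit targets: " + Arrays.toString(explicitTargets));
    }
}
```

With hole-based inference the DataNode cannot distinguish the truly missing block 6 from the busy-but-healthy block 0, which is exactly the case the proposed explicit target field resolves.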



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17529) Improve router state store cache update

2024-05-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847230#comment-17847230
 ] 

ASF GitHub Bot commented on HDFS-17529:
---

kokonguyen191 opened a new pull request, #6833:
URL: https://github.com/apache/hadoop/pull/6833

   ### Description of PR
   
   The current implementation of the router state store update is quite 
inefficient, so much so that when routers are removed and a lot of 
NameNodeMembership records are deleted in a short burst, the deletions 
triggered router safemode in our cluster and caused a lot of trouble.
   
   This ticket contains two parts: improving the deletion process for the ZK 
state store implementation, and allowing the overwrite part and delete part of 
org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords
 to run in parallel.
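   A minimal sketch of running the two phases concurrently with 
CompletableFuture, assuming hypothetical stand-in methods for the overwrite 
and delete phases (the real overrideExpiredRecords operates on state store 
records, not strings):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelCacheUpdate {
    // Hypothetical stand-ins for the two phases of overrideExpiredRecords.
    static int overwriteExpired(List<String> records) { return records.size(); }
    static int deleteExpired(List<String> records) { return records.size(); }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<String> toOverwrite = List.of("r1", "r2");
        List<String> toDelete = List.of("r3");

        // Launch both phases concurrently instead of sequentially.
        CompletableFuture<Integer> overwrite =
            CompletableFuture.supplyAsync(() -> overwriteExpired(toOverwrite), pool);
        CompletableFuture<Integer> delete =
            CompletableFuture.supplyAsync(() -> deleteExpired(toDelete), pool);

        // Join both before the cache refresh is considered complete, so a
        // slow delete burst no longer serializes behind the overwrites.
        System.out.println("overwritten=" + overwrite.join()
            + " deleted=" + delete.join());
        pool.shutdown();
    }
}
```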
   
   ### How was this patch tested?
   UT
   
   ### For code changes:
   
   - [x] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?




> Improve router state store cache update
> ---
>
> Key: HDFS-17529
> URL: https://issues.apache.org/jira/browse/HDFS-17529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>
> The current implementation of the router state store update is quite 
> inefficient, so much so that when routers are removed and a lot of 
> NameNodeMembership records are deleted in a short burst, the deletions 
> triggered router safemode in our cluster and caused a lot of trouble.
> This ticket contains two parts: improving the deletion process for the ZK 
> state store implementation, and allowing the overwrite part and delete part of 
> org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords
>  to run in parallel.






[jira] [Updated] (HDFS-17529) Improve router state store cache update

2024-05-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17529:
--
Labels: pull-request-available  (was: )

> Improve router state store cache update
> ---
>
> Key: HDFS-17529
> URL: https://issues.apache.org/jira/browse/HDFS-17529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> The current implementation of the router state store update is quite 
> inefficient, so much so that when routers are removed and a lot of 
> NameNodeMembership records are deleted in a short burst, the deletions 
> triggered router safemode in our cluster and caused a lot of trouble.
> This ticket contains two parts: improving the deletion process for the ZK 
> state store implementation, and allowing the overwrite part and delete part of 
> org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords
>  to run in parallel.






[jira] [Updated] (HDFS-17529) Improve router state store cache update

2024-05-17 Thread Felix N (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix N updated HDFS-17529:
---
Description: 
The current implementation of the router state store update is quite 
inefficient, so much so that when routers are removed and a lot of 
NameNodeMembership records are deleted in a short burst, the deletions 
triggered router safemode in our cluster and caused a lot of trouble.

This ticket contains two parts: improving the deletion process for the ZK 
state store implementation, and allowing the overwrite part and delete part of 
org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords
 to run in parallel.

  was:
The current implementation of the router state store update is quite 
inefficient, so much so that when routers are removed and a lot of 
NameNodeMembership records are deleted in a short burst, the deletions 
triggered router safemode in our cluster and caused a lot of trouble.

This ticket contains two parts: improving the deletion process for the ZK 
state store implementation, and allowing the overwrite part and delete part of


> Improve router state store cache update
> ---
>
> Key: HDFS-17529
> URL: https://issues.apache.org/jira/browse/HDFS-17529
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, rbf
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>
> The current implementation of the router state store update is quite 
> inefficient, so much so that when routers are removed and a lot of 
> NameNodeMembership records are deleted in a short burst, the deletions 
> triggered router safemode in our cluster and caused a lot of trouble.
> This ticket contains two parts: improving the deletion process for the ZK 
> state store implementation, and allowing the overwrite part and delete part of 
> org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords
>  to run in parallel.





