[jira] [Commented] (HDFS-17459) [FGL] Summarize this feature

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837979#comment-17837979
 ] 

ASF GitHub Bot commented on HDFS-17459:
---

ferhui commented on code in PR #6737:
URL: https://github.com/apache/hadoop/pull/6737#discussion_r1568281919


##
hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/NamenodeFGL.md:
##
@@ -0,0 +1,201 @@
+
+
+HDFS Namenode Fine-grained Locking
+==
+
+ [FGL] Summarize this feature 
> -
>
> Key: HDFS-17459
> URL: https://issues.apache.org/jira/browse/HDFS-17459
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> Write a doc to summarize this feature so we can merge it into the trunk.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17459) [FGL] Summarize this feature

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837977#comment-17837977
 ] 

ASF GitHub Bot commented on HDFS-17459:
---

ferhui commented on code in PR #6737:
URL: https://github.com/apache/hadoop/pull/6737#discussion_r1568281027


##
hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/NamenodeFGL.md:
##
@@ -0,0 +1,201 @@
+
+
+HDFS Namenode Fine-grained Locking
+==
+
+ [FGL] Summarize this feature 
> -
>
> Key: HDFS-17459
> URL: https://issues.apache.org/jira/browse/HDFS-17459
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> Write a doc to summarize this feature so we can merge it into the trunk.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17471) Correct the percentage of file I/O events.

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838014#comment-17838014
 ] 

ASF GitHub Bot commented on HDFS-17471:
---

hadoop-yetus commented on PR #6742:
URL: https://github.com/apache/hadoop/pull/6742#issuecomment-2060605281

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   7m  6s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 35s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 40s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 44s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 45s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  7s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 43s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 14s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 31s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 31s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  4s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 42s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 59s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 198m 47s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 30s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 295m 10s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6742/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6742 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 5859d9b23df2 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / cb17014faf1a2d66d75971584b365a0d7d60ce47 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6742/1/testReport/ |
   | Max. process+thread count | 4361 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6742/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 

[jira] [Commented] (HDFS-17457) [FGL] UTs support fine-grained locking

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838046#comment-17838046
 ] 

ASF GitHub Bot commented on HDFS-17457:
---

hadoop-yetus commented on PR #6741:
URL: https://github.com/apache/hadoop/pull/6741#issuecomment-2060740027

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   7m 28s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 40 new or modified test files.  |
   |||| _ HDFS-17384 Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m  9s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  19m 46s |  |  HDFS-17384 passed  |
   | +1 :green_heart: |  compile  |   8m 49s |  |  HDFS-17384 passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   8m  3s |  |  HDFS-17384 passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   2m  8s |  |  HDFS-17384 passed  |
   | +1 :green_heart: |  mvnsite  |   2m  4s |  |  HDFS-17384 passed  |
   | +1 :green_heart: |  javadoc  |   1m 51s |  |  HDFS-17384 passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 14s |  |  HDFS-17384 passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | -1 :x: |  spotbugs  |   0m 54s | 
[/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6741/1/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf-warnings.html)
 |  hadoop-hdfs-project/hadoop-hdfs-rbf in HDFS-17384 has 1 extant spotbugs 
warnings.  |
   | +1 :green_heart: |  shadedclient  |  19m 59s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m 31s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   8m 31s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m 20s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   8m 20s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   2m  4s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6741/1/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 1 new + 356 unchanged - 1 fixed = 357 total (was 
357)  |
   | +1 :green_heart: |  mvnsite  |   1m 58s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 46s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   2m  7s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 45s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 32s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 203m 14s |  |  hadoop-hdfs in the patch 
passed.  |
   | -1 :x: |  unit  |  29m 42s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6741/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-rbf in the patch failed.  |
   | +1 :green_heart: |  unit  |   0m 38s |  |  hadoop-fs2img in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 48s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 381m  9s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | 
hadoop.hdfs.server.federation.router.TestRouterRpcMultiDestination |
   |   | hadoop.hdfs.server.federation.router.TestRouterRpc |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6741/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6741 |
   | Optional Tests | 

[jira] [Commented] (HDFS-17459) [FGL] Summarize this feature

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837981#comment-17837981
 ] 

ASF GitHub Bot commented on HDFS-17459:
---

ZanderXu commented on code in PR #6737:
URL: https://github.com/apache/hadoop/pull/6737#discussion_r1568286270


##
hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/NamenodeFGL.md:
##
@@ -0,0 +1,201 @@
+
+
+HDFS Namenode Fine-grained Locking
+==
+
+ [FGL] Summarize this feature 
> -
>
> Key: HDFS-17459
> URL: https://issues.apache.org/jira/browse/HDFS-17459
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> Write a doc to summarize this feature so we can merge it into the trunk.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17459) [FGL] Summarize this feature

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838002#comment-17838002
 ] 

ASF GitHub Bot commented on HDFS-17459:
---

kokonguyen191 commented on code in PR #6737:
URL: https://github.com/apache/hadoop/pull/6737#discussion_r1568338790


##
hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/NamenodeFGL.md:
##
@@ -0,0 +1,201 @@
+
+
+HDFS Namenode Fine-grained Locking
+==
+
+ [FGL] Summarize this feature 
> -
>
> Key: HDFS-17459
> URL: https://issues.apache.org/jira/browse/HDFS-17459
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> Write a doc to summarize this feature so we can merge it into the trunk.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17459) [FGL] Summarize this feature

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838004#comment-17838004
 ] 

ASF GitHub Bot commented on HDFS-17459:
---

kokonguyen191 commented on code in PR #6737:
URL: https://github.com/apache/hadoop/pull/6737#discussion_r1568339380


##
hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/NamenodeFGL.md:
##
@@ -0,0 +1,201 @@
+
+
+HDFS Namenode Fine-grained Locking
+==
+
+ [FGL] Summarize this feature 
> -
>
> Key: HDFS-17459
> URL: https://issues.apache.org/jira/browse/HDFS-17459
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> Write a doc to summarize this feature so we can merge it into the trunk.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17475) Add a command to check if files are readable

2024-04-17 Thread Felix N (Jira)
Felix N created HDFS-17475:
--

 Summary: Add a command to check if files are readable
 Key: HDFS-17475
 URL: https://issues.apache.org/jira/browse/HDFS-17475
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: Felix N
Assignee: Felix N
 Fix For: 3.5.0


Sometimes a job can fail because of a single unreadable file down the line, 
caused by missing replicas, dead DNs, or other reasons. This command should 
allow users to check whether files are readable by checking for metadata on 
DNs, without executing the full read pipeline for each file.
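
For illustration only: a client can already approximate part of this check from 
namenode metadata using public FileSystem APIs, e.g. by verifying that every 
block reports at least one live, non-corrupt location (minimal sketch below; 
the class name and structure are illustrative). The proposed command would go 
further and also verify replica metadata on the DNs themselves.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative sketch: flags a file as suspect if any block reports no live,
// non-corrupt location in namenode metadata. No block data is read.
public class ReadabilityCheckSketch {
    public static boolean looksReadable(FileSystem fs, Path file) throws Exception {
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            if (block.isCorrupt() || block.getHosts().length == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        System.out.println(args[0] + " readable: " + looksReadable(fs, new Path(args[0])));
    }
}
{code}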



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15413) DFSStripedInputStream throws exception when datanodes close idle connections

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838153#comment-17838153
 ] 

ASF GitHub Bot commented on HDFS-15413:
---

haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1568756254


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
##
@@ -233,41 +235,63 @@ private ByteBufferStrategy[] 
getReadStrategies(StripingChunk chunk) {
 
   private int readToBuffer(BlockReader blockReader,
   DatanodeInfo currentNode, ByteBufferStrategy strategy,
-  ExtendedBlock currentBlock) throws IOException {
+  LocatedBlock currentBlock, int chunkIndex, long offsetInBlock)
+  throws IOException {
 final int targetLength = strategy.getTargetLength();
-int length = 0;
-try {
-  while (length < targetLength) {
-int ret = strategy.readFromBlock(blockReader);
-if (ret < 0) {
-  throw new IOException("Unexpected EOS from the reader");
+int curAttempts = 0;
+while (curAttempts < readDNMaxAttempts) {

Review Comment:
   How about updating this to `while (true)`? Then lines 286~288 can be removed.
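
   For illustration, a sketch of that shape (simplified; the ChecksumException
   branch and the reader close/reconnect details from the patch are elided):

   ```java
   int curAttempts = 0;
   while (true) {
     curAttempts++;
     int length = 0;
     try {
       while (length < targetLength) {
         int ret = strategy.readFromBlock(blockReader);
         if (ret < 0) {
           throw new IOException("Unexpected EOS from the reader");
         }
         length += ret;
       }
       return length;
     } catch (IOException e) {
       strategy.getReadBuffer().clear();
       // Rethrowing here replaces the separate check after the loop,
       // i.e. the lines this comment suggests removing.
       if (curAttempts >= readDNMaxAttempts) {
         throw e;
       }
       // ... close the old reader and reconnect as in the patch,
       // rethrowing e if createBlockReader() fails ...
     }
   }
   ```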





> DFSStripedInputStream throws exception when datanodes close idle connections
> 
>
> Key: HDFS-15413
> URL: https://issues.apache.org/jira/browse/HDFS-15413
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, erasure-coding, hdfs-client
>Affects Versions: 3.1.3
> Environment: - Hadoop 3.1.3
> - erasure coding with ISA-L and RS-3-2-1024k scheme
> - running in kubernetes
> - dfs.client.socket-timeout = 10000
> - dfs.datanode.socket.write.timeout = 10000
>Reporter: Andrey Elenskiy
>Priority: Critical
>  Labels: pull-request-available
> Attachments: out.log
>
>
> We've run into an issue with compactions failing in HBase when erasure coding 
> is enabled on a table directory. After digging further I was able to narrow it 
> down to the seek + read logic and to reproduce the issue with the hdfs client 
> only:
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.FSDataInputStream;
> public class ReaderRaw {
> public static void main(final String[] args) throws Exception {
> Path p = new Path(args[0]);
> int bufLen = Integer.parseInt(args[1]);
> int sleepDuration = Integer.parseInt(args[2]);
> int countBeforeSleep = Integer.parseInt(args[3]);
> int countAfterSleep = Integer.parseInt(args[4]);
> Configuration conf = new Configuration();
> FSDataInputStream istream = FileSystem.get(conf).open(p);
> byte[] buf = new byte[bufLen];
> int readTotal = 0;
> int count = 0;
> try {
>   while (true) {
> istream.seek(readTotal);
> int bytesRemaining = bufLen;
> int bufOffset = 0;
> while (bytesRemaining > 0) {
>   int nread = istream.read(buf, 0, bufLen);
>   if (nread < 0) {
>   throw new Exception("nread is less than zero");
>   }
>   readTotal += nread;
>   bufOffset += nread;
>   bytesRemaining -= nread;
> }
> count++;
> if (count == countBeforeSleep) {
> System.out.println("sleeping for " + sleepDuration + " 
> milliseconds");
> Thread.sleep(sleepDuration);
> System.out.println("resuming");
> }
> if (count == countBeforeSleep + countAfterSleep) {
> System.out.println("done");
> break;
> }
>   }
> } catch (Exception e) {
> System.out.println("exception on read " + count + " read total " 
> + readTotal);
> throw e;
> }
> }
> }
> {code}
> The issue appears to be due to the fact that datanodes close the connection 
> of the EC client if it doesn't fetch the next packet for longer than 
> dfs.client.socket-timeout. The EC client doesn't retry and instead assumes 
> that those datanodes went away, resulting in a "missing blocks" exception.
> I was able to consistently reproduce with the following arguments:
> {noformat}
> bufLen = 1000000 (just below 1MB, which is the size of the stripe) 
> sleepDuration = (dfs.client.socket-timeout + 1) * 1000 (in our case 11000)
> countBeforeSleep = 1
> countAfterSleep = 7
> {noformat}
> I've attached the entire log output of running the snippet above against an 
> erasure-coded file with the RS-3-2-1024k policy. And here are the logs from 
> the datanodes disconnecting the client:
> datanode 1:
> 

[jira] [Commented] (HDFS-17475) Add a command to check if files are readable

2024-04-17 Thread Felix N (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838119#comment-17838119
 ] 

Felix N commented on HDFS-17475:


Hi [~ayushtkn], the requirements for this feature did indeed come from our 
production users. While fsck can check whether blocks are missing, AFAIK a 
successful fsck doesn't guarantee that all blocks are readable. This feature 
aims to provide a way to verify whether a large number of files are readable 
without going through the full read pipeline for each file.

> Add a command to check if files are readable
> 
>
> Key: HDFS-17475
> URL: https://issues.apache.org/jira/browse/HDFS-17475
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Felix N
>Assignee: Felix N
>Priority: Minor
>  Labels: pull-request-available
>
> Sometimes a job can fail due to one unreadable file down the line due to 
> missing replicas or dead DNs or other reason. This command should allow users 
> to check whether files are readable by checking for metadata on DNs without 
> executing full read pipelines of the files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17473) [FGL] Make quota related operations thread-safe

2024-04-17 Thread ZanderXu (Jira)
ZanderXu created HDFS-17473:
---

 Summary: [FGL] Make quota related operations thread-safe 
 Key: HDFS-17473
 URL: https://issues.apache.org/jira/browse/HDFS-17473
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: ZanderXu
Assignee: ZanderXu


Concurrent operations on the directory tree may cause quota updates and 
verification to not be thread-safe.

For example: 
 # Suppose there is a directory _/a/b_ and a quota is set on inodes _a_ and _b_
 # There are some directories and files under {_}/a/b{_}, such as 
{_}/a/b/c/d1{_} and _/a/b/d/f1.txt_
 # Suppose there is a create operation under _/a/b/c/d1_ and an addBlock 
operation on _/a/b/d/f1.txt_
 # These two operations can be handled concurrently by the namenode
 # They will update the quota on inode _a_ concurrently, since both operations 
only hold the read lock of inodes _a_ and {_}b{_}
 # So we should make quota-related operations thread-safe

There are two solutions for making quota-related operations thread-safe.

Solution one: hold the write lock of the first iNode with a quota set during 
resolvePath
 * Directly hold the write lock of iNode _a_ so that all operations involving 
subtree _/a_ can be handled safely.
 * Because this lowers concurrency, the maximum improvement cannot be achieved.
 * But the implementation is simple and straightforward.

Solution two: lock all QuotaFeatures during quota verification or update (a 
sketch follows below)
 * Still hold the read locks of iNodes _a_ and _b_.
 * Lock all QuotaFeatures involved in the operation when validating or 
updating quotas.
 * The maximum improvement can be achieved.
 * But the implementation is a little more complex:
 ** Add a lock to each QuotaFeature
 ** Acquire the locks of all involved QuotaFeatures
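
A minimal sketch of solution two, assuming a hypothetical per-feature lock; the 
class and method names below are illustrative, not the actual HDFS QuotaFeature 
API. Taking the locks in path order (root first) keeps concurrent updaters 
deadlock-free:

{code:java}
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for QuotaFeature with an added lock (illustration only).
class LockedQuotaFeature {
    final ReentrantLock lock = new ReentrantLock();
    long namespaceUsed;
    long storagespaceUsed;
}

class QuotaUpdateSketch {
    /**
     * Apply a usage delta to every quota feature on the resolved path while the
     * inodes themselves are only read-locked. The per-feature locks are acquired
     * in path order (root first), so two concurrent updates cannot deadlock.
     */
    static void addUsage(List<LockedQuotaFeature> featuresOnPath,
                         long nsDelta, long ssDelta) {
        for (LockedQuotaFeature f : featuresOnPath) {
            f.lock.lock();
        }
        try {
            for (LockedQuotaFeature f : featuresOnPath) {
                f.namespaceUsed += nsDelta;
                f.storagespaceUsed += ssDelta;
            }
        } finally {
            for (LockedQuotaFeature f : featuresOnPath) {
                f.lock.unlock();
            }
        }
    }
}
{code}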



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17475) Add a command to check if files are readable

2024-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17475:
--
Labels: pull-request-available  (was: )

> Add a command to check if files are readable
> 
>
> Key: HDFS-17475
> URL: https://issues.apache.org/jira/browse/HDFS-17475
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Felix N
>Assignee: Felix N
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Sometimes a job can fail because of a single unreadable file down the line, 
> caused by missing replicas, dead DNs, or other reasons. This command should 
> allow users to check whether files are readable by checking for metadata on 
> DNs, without executing the full read pipeline for each file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15413) DFSStripedInputStream throws exception when datanodes close idle connections

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838091#comment-17838091
 ] 

ASF GitHub Bot commented on HDFS-15413:
---

Neilxzn commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2060941447

   > @Neilxzn Hi, this patch is very useful, would you mind further fixing this 
PR?
   
   Sorry for my late reply.  I have updated the patch based on the suggestions 
above. Please review it again. @haiyang1987 @zhangshuyan0 




> DFSStripedInputStream throws exception when datanodes close idle connections
> 
>
> Key: HDFS-15413
> URL: https://issues.apache.org/jira/browse/HDFS-15413
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, erasure-coding, hdfs-client
>Affects Versions: 3.1.3
> Environment: - Hadoop 3.1.3
> - erasure coding with ISA-L and RS-3-2-1024k scheme
> - running in kubernetes
> - dfs.client.socket-timeout = 10000
> - dfs.datanode.socket.write.timeout = 10000
>Reporter: Andrey Elenskiy
>Priority: Critical
>  Labels: pull-request-available
> Attachments: out.log
>
>
> We've run into an issue with compactions failing in HBase when erasure coding 
> is enabled on a table directory. After digging further I was able to narrow it 
> down to the seek + read logic and to reproduce the issue with the hdfs client 
> only:
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.FSDataInputStream;
> public class ReaderRaw {
> public static void main(final String[] args) throws Exception {
> Path p = new Path(args[0]);
> int bufLen = Integer.parseInt(args[1]);
> int sleepDuration = Integer.parseInt(args[2]);
> int countBeforeSleep = Integer.parseInt(args[3]);
> int countAfterSleep = Integer.parseInt(args[4]);
> Configuration conf = new Configuration();
> FSDataInputStream istream = FileSystem.get(conf).open(p);
> byte[] buf = new byte[bufLen];
> int readTotal = 0;
> int count = 0;
> try {
>   while (true) {
> istream.seek(readTotal);
> int bytesRemaining = bufLen;
> int bufOffset = 0;
> while (bytesRemaining > 0) {
>   int nread = istream.read(buf, 0, bufLen);
>   if (nread < 0) {
>   throw new Exception("nread is less than zero");
>   }
>   readTotal += nread;
>   bufOffset += nread;
>   bytesRemaining -= nread;
> }
> count++;
> if (count == countBeforeSleep) {
> System.out.println("sleeping for " + sleepDuration + " 
> milliseconds");
> Thread.sleep(sleepDuration);
> System.out.println("resuming");
> }
> if (count == countBeforeSleep + countAfterSleep) {
> System.out.println("done");
> break;
> }
>   }
> } catch (Exception e) {
> System.out.println("exception on read " + count + " read total " 
> + readTotal);
> throw e;
> }
> }
> }
> {code}
> The issue appears to be due to the fact that datanodes close the connection 
> of the EC client if it doesn't fetch the next packet for longer than 
> dfs.client.socket-timeout. The EC client doesn't retry and instead assumes 
> that those datanodes went away, resulting in a "missing blocks" exception.
> I was able to consistently reproduce with the following arguments:
> {noformat}
> bufLen = 1000000 (just below 1MB, which is the size of the stripe) 
> sleepDuration = (dfs.client.socket-timeout + 1) * 1000 (in our case 11000)
> countBeforeSleep = 1
> countAfterSleep = 7
> {noformat}
> I've attached the entire log output of running the snippet above against an 
> erasure-coded file with the RS-3-2-1024k policy. And here are the logs from 
> the datanodes disconnecting the client:
> datanode 1:
> {noformat}
> 2020-06-15 19:06:20,697 INFO datanode.DataNode: Likely the client has stopped 
> reading, disconnecting it (datanode-v11-0-hadoop.hadoop:9866:DataXceiver 
> error processing READ_BLOCK operation  src: /10.128.23.40:53748 dst: 
> /10.128.14.46:9866); java.net.SocketTimeoutException: 10000 millis timeout 
> while waiting for channel to be ready for write. ch : 
> java.nio.channels.SocketChannel[connected local=/10.128.14.46:9866 
> remote=/10.128.23.40:53748]
> {noformat}
> datanode 2:
> {noformat}
> 2020-06-15 19:06:20,341 INFO datanode.DataNode: Likely the client has stopped 
> reading, disconnecting it (datanode-v11-1-hadoop.hadoop:9866:DataXceiver 
> error processing READ_BLOCK operation  src: 

[jira] [Commented] (HDFS-17472) [FGL] gcDeletedSnapshot and getDelegationToken support FGL

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838056#comment-17838056
 ] 

ASF GitHub Bot commented on HDFS-17472:
---

hadoop-yetus commented on PR #6743:
URL: https://github.com/apache/hadoop/pull/6743#issuecomment-2060784984

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |  11m 46s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
   |||| _ HDFS-17384 Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  45m  5s |  |  HDFS-17384 passed  |
   | +1 :green_heart: |  compile  |   1m 22s |  |  HDFS-17384 passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  HDFS-17384 passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m 13s |  |  HDFS-17384 passed  |
   | +1 :green_heart: |  mvnsite  |   1m 22s |  |  HDFS-17384 passed  |
   | +1 :green_heart: |  javadoc  |   1m  7s |  |  HDFS-17384 passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 45s |  |  HDFS-17384 passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 19s |  |  HDFS-17384 passed  |
   | +1 :green_heart: |  shadedclient  |  35m 23s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  5s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   1m  5s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 58s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 54s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 34s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 14s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  35m  9s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 228m 40s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 47s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 380m 46s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6743/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6743 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 8d4af09480ea 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | HDFS-17384 / 7f82bcf7a0ba76485b6125df7cd40a73f42c1f37 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6743/1/testReport/ |
   | Max. process+thread count | 3511 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6743/1/console |
   | versions | git=2.25.1 maven=3.6.3 

[jira] [Commented] (HDFS-17475) Add a command to check if files are readable

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838075#comment-17838075
 ] 

ASF GitHub Bot commented on HDFS-17475:
---

kokonguyen191 opened a new pull request, #6745:
URL: https://github.com/apache/hadoop/pull/6745

   ### Description of PR
   
   Sometimes a job can fail because of a single unreadable file down the line, 
caused by missing replicas, dead DNs, or other reasons. This command allows 
users to check whether files are readable by checking for metadata on DNs, 
without executing the full read pipeline for each file.
   
   ### How was this patch tested?
   
   Unit tests, local deployment, production. Also tested for performance.
   
   ### For code changes:
   
   - [x] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   




> Add a command to check if files are readable
> 
>
> Key: HDFS-17475
> URL: https://issues.apache.org/jira/browse/HDFS-17475
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Felix N
>Assignee: Felix N
>Priority: Minor
> Fix For: 3.5.0
>
>
> Sometimes a job can fail because of a single unreadable file down the line, 
> caused by missing replicas, dead DNs, or other reasons. This command should 
> allow users to check whether files are readable by checking for metadata on 
> DNs, without executing the full read pipeline for each file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15413) DFSStripedInputStream throws exception when datanodes close idle connections

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838145#comment-17838145
 ] 

ASF GitHub Bot commented on HDFS-15413:
---

haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1568748150


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
##
@@ -233,41 +235,63 @@ private ByteBufferStrategy[] 
getReadStrategies(StripingChunk chunk) {
 
   private int readToBuffer(BlockReader blockReader,
   DatanodeInfo currentNode, ByteBufferStrategy strategy,
-  ExtendedBlock currentBlock) throws IOException {
+  LocatedBlock currentBlock, int chunkIndex, long offsetInBlock)
+  throws IOException {
 final int targetLength = strategy.getTargetLength();
-int length = 0;
-try {
-  while (length < targetLength) {
-int ret = strategy.readFromBlock(blockReader);
-if (ret < 0) {
-  throw new IOException("Unexpected EOS from the reader");
+int curAttempts = 0;
+while (curAttempts < readDNMaxAttempts) {
+  curAttempts++;
+  int length = 0;
+  try {
+while (length < targetLength) {
+  int ret = strategy.readFromBlock(blockReader);
+  if (ret < 0) {
+throw new IOException("Unexpected EOS from the reader");
+  }
+  length += ret;
+}
+return length;
+  } catch (ChecksumException ce) {
+DFSClient.LOG.warn("Found Checksum error for "
++ currentBlock + " from " + currentNode
++ " at " + ce.getPos());
+//Clear buffer to make next decode success
+strategy.getReadBuffer().clear();
+// we want to remember which block replicas we have tried
+corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), 
currentNode);
+throw ce;
+  } catch (IOException e) {
+//Clear buffer to make next decode success
+strategy.getReadBuffer().clear();
+if (curAttempts < readDNMaxAttempts) {
+  if (readerInfos[chunkIndex].reader != null) {
+readerInfos[chunkIndex].reader.close();
+  }
+  if (dfsStripedInputStream.createBlockReader(currentBlock,
+  offsetInBlock, targetBlocks,
+  readerInfos, chunkIndex, readTo)) {
+blockReader = readerInfos[chunkIndex].reader;
+String msg = "Reconnect to " + currentNode.getInfoAddr()
++ " for block " + currentBlock.getBlock();
+DFSClient.LOG.warn(msg);
+continue;
+  }
 }
-length += ret;
+DFSClient.LOG.warn("Exception while reading from "

Review Comment:
   Can this also use the `warn("{}", arg)` format?
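
   For illustration, the parameterized form could look like the following (the
   arguments are indicative of the truncated statement above, not the exact
   patch; SLF4J still prints the stack trace of a trailing Throwable):

   ```java
   DFSClient.LOG.warn("Exception while reading from {} from {}",
       currentBlock, currentNode, e);
   ```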





> DFSStripedInputStream throws exception when datanodes close idle connections
> 
>
> Key: HDFS-15413
> URL: https://issues.apache.org/jira/browse/HDFS-15413
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, erasure-coding, hdfs-client
>Affects Versions: 3.1.3
> Environment: - Hadoop 3.1.3
> - erasure coding with ISA-L and RS-3-2-1024k scheme
> - running in kubernetes
> - dfs.client.socket-timeout = 10000
> - dfs.datanode.socket.write.timeout = 10000
>Reporter: Andrey Elenskiy
>Priority: Critical
>  Labels: pull-request-available
> Attachments: out.log
>
>
> We've run into an issue with compactions failing in HBase when erasure coding 
> is enabled on a table directory. After digging further I was able to narrow it 
> down to the seek + read logic and to reproduce the issue with the hdfs client 
> only:
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.FSDataInputStream;
> public class ReaderRaw {
> public static void main(final String[] args) throws Exception {
> Path p = new Path(args[0]);
> int bufLen = Integer.parseInt(args[1]);
> int sleepDuration = Integer.parseInt(args[2]);
> int countBeforeSleep = Integer.parseInt(args[3]);
> int countAfterSleep = Integer.parseInt(args[4]);
> Configuration conf = new Configuration();
> FSDataInputStream istream = FileSystem.get(conf).open(p);
> byte[] buf = new byte[bufLen];
> int readTotal = 0;
> int count = 0;
> try {
>   while (true) {
> istream.seek(readTotal);
> int bytesRemaining = bufLen;
> int bufOffset = 0;
> while (bytesRemaining > 0) {
>   int nread = istream.read(buf, 0, bufLen);
>   if (nread < 0) {
>   throw new Exception("nread is less than zero");

[jira] [Commented] (HDFS-17459) [FGL] Summarize this feature

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838066#comment-17838066
 ] 

ASF GitHub Bot commented on HDFS-17459:
---

hadoop-yetus commented on PR #6737:
URL: https://github.com/apache/hadoop/pull/6737#issuecomment-2060836090

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m  6s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   |||| _ HDFS-17384 Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  52m 21s |  |  HDFS-17384 passed  |
   | +1 :green_heart: |  mvnsite  |   1m 37s |  |  HDFS-17384 passed  |
   | +1 :green_heart: |  shadedclient  |  94m 20s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 23s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  mvnsite  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  41m 17s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 40s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 144m 46s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6737/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6737 |
   | Optional Tests | dupname asflicense mvnsite codespell detsecrets 
markdownlint |
   | uname | Linux 9b370944cfc2 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | HDFS-17384 / 1ac8f4f7e1a76dc31492f3b27e8fb3f61b3b992f |
   | Max. process+thread count | 527 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6737/3/console |
   | versions | git=2.25.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> [FGL] Summarize this feature 
> -
>
> Key: HDFS-17459
> URL: https://issues.apache.org/jira/browse/HDFS-17459
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
> Write a doc to summarize this feature so we can merge it into the trunk.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17475) Add a command to check if files are readable

2024-04-17 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838082#comment-17838082
 ] 

Ayush Saxena commented on HDFS-17475:
-

I am not sure this is required. We have fsck, which already does a lot of this, 
and as far as I can see there is no production utility here; it is just for 
debugging purposes. So I think fsck should cover it, or for debugging you can 
just fetch the file directly...

> Add a command to check if files are readable
> 
>
> Key: HDFS-17475
> URL: https://issues.apache.org/jira/browse/HDFS-17475
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Felix N
>Assignee: Felix N
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Sometimes a job can fail because of a single unreadable file down the line, 
> caused by missing replicas, dead DNs, or other reasons. This command should 
> allow users to check whether files are readable by checking for metadata on 
> DNs, without executing the full read pipeline for each file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17475) Add a command to check if files are readable

2024-04-17 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-17475:

Fix Version/s: (was: 3.5.0)

> Add a command to check if files are readable
> 
>
> Key: HDFS-17475
> URL: https://issues.apache.org/jira/browse/HDFS-17475
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Felix N
>Assignee: Felix N
>Priority: Minor
>  Labels: pull-request-available
>
> Sometimes a job can fail because of a single unreadable file down the line, 
> caused by missing replicas, dead DNs, or other reasons. This command should 
> allow users to check whether files are readable by checking for metadata on 
> DNs, without executing the full read pipeline for each file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17469) Audit log for reportBadBlocks RPC

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838064#comment-17838064
 ] 

ASF GitHub Bot commented on HDFS-17469:
---

hadoop-yetus commented on PR #6731:
URL: https://github.com/apache/hadoop/pull/6731#issuecomment-2060830075

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   7m 34s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 48s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 40s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 45s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 10s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 45s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 15s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 32s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 43s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m  3s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 204m 16s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 30s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 301m 30s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6731/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6731 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 8781f77f186c 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 17b822eb25e449b13a908eebf6c7a15628356b8c |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6731/2/testReport/ |
   | Max. process+thread count | 4630 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6731/2/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Audit log for reportBadBlocks RPC
> 

[jira] [Commented] (HDFS-17367) Add PercentUsed for Different StorageTypes in JMX

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838104#comment-17838104
 ] 

ASF GitHub Bot commented on HDFS-17367:
---

hadoop-yetus commented on PR #6735:
URL: https://github.com/apache/hadoop/pull/6735#issuecomment-2060990212

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 30s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 45s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  32m  9s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  17m 31s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |  16m 12s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   4m 26s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   3m 23s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 30s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 55s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   6m  2s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  36m 44s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 33s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  9s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 47s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |  16m 47s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 26s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |  16m 26s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   4m 21s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   3m 20s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m 26s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 56s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   6m 26s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  36m 43s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 46s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  unit  | 231m 12s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   1m 13s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 484m 35s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6735/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6735 |
   | Optional Tests | dupname asflicense mvnsite codespell detsecrets 
markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs 
checkstyle |
   | uname | Linux 5e5a43553ecc 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 9cc6066b5881092a26c8fa70472b08b1a00d76c6 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6735/3/testReport/ |
   | Max. process+thread count | 4025 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common 

[jira] [Commented] (HDFS-15413) DFSStripedInputStream throws exception when datanodes close idle connections

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838143#comment-17838143
 ] 

ASF GitHub Bot commented on HDFS-15413:
---

haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1568745996


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
##
@@ -233,41 +235,63 @@ private ByteBufferStrategy[] getReadStrategies(StripingChunk chunk) {
 
   private int readToBuffer(BlockReader blockReader,
       DatanodeInfo currentNode, ByteBufferStrategy strategy,
-      ExtendedBlock currentBlock) throws IOException {
+      LocatedBlock currentBlock, int chunkIndex, long offsetInBlock)
+      throws IOException {
     final int targetLength = strategy.getTargetLength();
-    int length = 0;
-    try {
-      while (length < targetLength) {
-        int ret = strategy.readFromBlock(blockReader);
-        if (ret < 0) {
-          throw new IOException("Unexpected EOS from the reader");
+    int curAttempts = 0;
+    while (curAttempts < readDNMaxAttempts) {
+      curAttempts++;
+      int length = 0;
+      try {
+        while (length < targetLength) {
+          int ret = strategy.readFromBlock(blockReader);
+          if (ret < 0) {
+            throw new IOException("Unexpected EOS from the reader");
+          }
+          length += ret;
+        }
+        return length;
+      } catch (ChecksumException ce) {
+        DFSClient.LOG.warn("Found Checksum error for "
+            + currentBlock + " from " + currentNode
+            + " at " + ce.getPos());
+        //Clear buffer to make next decode success
+        strategy.getReadBuffer().clear();
+        // we want to remember which block replicas we have tried
+        corruptedBlocks.addCorruptedBlock(currentBlock.getBlock(), currentNode);
+        throw ce;
+      } catch (IOException e) {
+        //Clear buffer to make next decode success
+        strategy.getReadBuffer().clear();
+        if (curAttempts < readDNMaxAttempts) {
+          if (readerInfos[chunkIndex].reader != null) {
+            readerInfos[chunkIndex].reader.close();
+          }
+          if (dfsStripedInputStream.createBlockReader(currentBlock,
+              offsetInBlock, targetBlocks,
+              readerInfos, chunkIndex, readTo)) {
+            blockReader = readerInfos[chunkIndex].reader;
+            String msg = "Reconnect to " + currentNode.getInfoAddr()
+                + " for block " + currentBlock.getBlock();
+            DFSClient.LOG.warn(msg);

Review Comment:
   Consider using parameterized logging here; it avoids building the message
   string when the log level is disabled:
   ```
   DFSClient.LOG.warn("Reconnect to {} for block {}", currentNode.getInfoAddr(),
       currentBlock.getBlock());
   ```





> DFSStripedInputStream throws exception when datanodes close idle connections
> 
>
> Key: HDFS-15413
> URL: https://issues.apache.org/jira/browse/HDFS-15413
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, erasure-coding, hdfs-client
>Affects Versions: 3.1.3
> Environment: - Hadoop 3.1.3
> - erasure coding with ISA-L and RS-3-2-1024k scheme
> - running in kubernetes
> - dfs.client.socket-timeout = 1
> - dfs.datanode.socket.write.timeout = 1
>Reporter: Andrey Elenskiy
>Priority: Critical
>  Labels: pull-request-available
> Attachments: out.log
>
>
> We've run into an issue with compactions failing in HBase when erasure coding 
> is enabled on a table directory. After digging further I was able to narrow 
> it down to a seek + read logic and able to reproduce the issue with hdfs 
> client only:
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.FSDataInputStream;
> public class ReaderRaw {
> public static void main(final String[] args) throws Exception {
> Path p = new Path(args[0]);
> int bufLen = Integer.parseInt(args[1]);
> int sleepDuration = Integer.parseInt(args[2]);
> int countBeforeSleep = Integer.parseInt(args[3]);
> int countAfterSleep = Integer.parseInt(args[4]);
> Configuration conf = new Configuration();
> FSDataInputStream istream = FileSystem.get(conf).open(p);
> byte[] buf = new byte[bufLen];
> int readTotal = 0;
> int count = 0;
> try {
>   while (true) {
> istream.seek(readTotal);
> int bytesRemaining = bufLen;
> int bufOffset = 0;
> while (bytesRemaining > 0) {
>   int nread = istream.read(buf, 0, bufLen);
>   if (nread < 0) {
>   throw new Exception("nread is less than zero");
>   }
>  

[jira] [Commented] (HDFS-15413) DFSStripedInputStream throws exception when datanodes close idle connections

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838146#comment-17838146
 ] 

ASF GitHub Bot commented on HDFS-15413:
---

haiyang1987 commented on code in PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#discussion_r1568748476


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java:
##
@@ -233,41 +235,63 @@ private ByteBufferStrategy[] getReadStrategies(StripingChunk chunk) {
 
   private int readToBuffer(BlockReader blockReader,
       DatanodeInfo currentNode, ByteBufferStrategy strategy,
-      ExtendedBlock currentBlock) throws IOException {
+      LocatedBlock currentBlock, int chunkIndex, long offsetInBlock)
+      throws IOException {
     final int targetLength = strategy.getTargetLength();
-    int length = 0;
-    try {
-      while (length < targetLength) {
-        int ret = strategy.readFromBlock(blockReader);
-        if (ret < 0) {
-          throw new IOException("Unexpected EOS from the reader");
+    int curAttempts = 0;
+    while (curAttempts < readDNMaxAttempts) {
+      curAttempts++;
+      int length = 0;
+      try {
+        while (length < targetLength) {
+          int ret = strategy.readFromBlock(blockReader);
+          if (ret < 0) {
+            throw new IOException("Unexpected EOS from the reader");
+          }
+          length += ret;
+        }
+        return length;
+      } catch (ChecksumException ce) {
+        DFSClient.LOG.warn("Found Checksum error for "

Review Comment:
   Same here: prefer parameterized logging for this warn call as well.





> DFSStripedInputStream throws exception when datanodes close idle connections
> 
>
> Key: HDFS-15413
> URL: https://issues.apache.org/jira/browse/HDFS-15413
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, erasure-coding, hdfs-client
>Affects Versions: 3.1.3
> Environment: - Hadoop 3.1.3
> - erasure coding with ISA-L and RS-3-2-1024k scheme
> - running in kubernetes
> - dfs.client.socket-timeout = 1
> - dfs.datanode.socket.write.timeout = 1
>Reporter: Andrey Elenskiy
>Priority: Critical
>  Labels: pull-request-available
> Attachments: out.log
>
>
> We've run into an issue with compactions failing in HBase when erasure coding 
> is enabled on a table directory. After digging further I was able to narrow 
> it down to a seek + read logic and able to reproduce the issue with hdfs 
> client only:
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.FSDataInputStream;
> public class ReaderRaw {
> public static void main(final String[] args) throws Exception {
> Path p = new Path(args[0]);
> int bufLen = Integer.parseInt(args[1]);
> int sleepDuration = Integer.parseInt(args[2]);
> int countBeforeSleep = Integer.parseInt(args[3]);
> int countAfterSleep = Integer.parseInt(args[4]);
> Configuration conf = new Configuration();
> FSDataInputStream istream = FileSystem.get(conf).open(p);
> byte[] buf = new byte[bufLen];
> int readTotal = 0;
> int count = 0;
> try {
>   while (true) {
> istream.seek(readTotal);
> int bytesRemaining = bufLen;
> int bufOffset = 0;
> while (bytesRemaining > 0) {
>   int nread = istream.read(buf, 0, bufLen);
>   if (nread < 0) {
>   throw new Exception("nread is less than zero");
>   }
>   readTotal += nread;
>   bufOffset += nread;
>   bytesRemaining -= nread;
> }
> count++;
> if (count == countBeforeSleep) {
> System.out.println("sleeping for " + sleepDuration + " 
> milliseconds");
> Thread.sleep(sleepDuration);
> System.out.println("resuming");
> }
> if (count == countBeforeSleep + countAfterSleep) {
> System.out.println("done");
> break;
> }
>   }
> } catch (Exception e) {
> System.out.println("exception on read " + count + " read total " 
> + readTotal);
> throw e;
> }
> }
> }
> {code}
> The issue appears to be due to the fact that datanodes close the connection 
> of EC client if it doesn't fetch next packet for longer than 
> dfs.client.socket-timeout. The EC client doesn't retry and instead assumes 
> that those datanodes went away resulting in "missing blocks" exception.
> I was able to consistently reproduce with the following arguments:
> {noformat}
> bufLen = 100 (just below 1MB which is the size of the 

[jira] [Commented] (HDFS-17454) Fix namenode fsck swallows the exception stacktrace, this can help us to troubleshooting log.

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838256#comment-17838256
 ] 

ASF GitHub Bot commented on HDFS-17454:
---

hadoop-yetus commented on PR #6709:
URL: https://github.com/apache/hadoop/pull/6709#issuecomment-2061544322

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 48s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  48m 39s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 24s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m 13s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 23s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 19s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  40m 56s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  9s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   1m  9s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  1s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6709/8/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 67 unchanged - 
0 fixed = 69 total (was 67)  |
   | +1 :green_heart: |  mvnsite  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 37s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 16s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  42m 45s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 264m 20s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6709/8/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | -1 :x: |  asflicense  |   0m 48s | 
[/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6709/8/artifact/out/results-asflicense.txt)
 |  The patch generated 1 ASF License warnings.  |
   |  |   | 422m 36s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.namenode.TestFsck |
   |   | hadoop.hdfs.TestRollingUpgrade |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6709/8/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6709 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux c4383051ab8b 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / bf31b0654b294a3987fdfa7f014dcf7d8bde15ea |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 

[jira] [Commented] (HDFS-15413) DFSStripedInputStream throws exception when datanodes close idle connections

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838261#comment-17838261
 ] 

ASF GitHub Bot commented on HDFS-15413:
---

hadoop-yetus commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2061596554

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 21s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  13m 56s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  20m 11s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   2m 56s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   2m 53s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 47s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 24s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 31s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 17s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 19s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  21m 32s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 21s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m  3s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 51s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   2m 51s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 44s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   2m 44s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/8/artifact/out/blanks-eol.txt)
 |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   0m 34s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/8/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 1 new + 45 unchanged - 0 fixed = 
46 total (was 45)  |
   | +1 :green_heart: |  mvnsite  |   1m  8s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 54s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 28s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 10s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 13s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 54s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 209m 14s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/8/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 30s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 318m 25s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestDFSStripedInputStreamWithTimeout |
   |   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5829/8/artifact/out/Dockerfile
 |
   | GITHUB PR | 

[jira] [Commented] (HDFS-17367) Add PercentUsed for Different StorageTypes in JMX

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838259#comment-17838259
 ] 

ASF GitHub Bot commented on HDFS-17367:
---

hadoop-yetus commented on PR #6735:
URL: https://github.com/apache/hadoop/pull/6735#issuecomment-2061571160

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  17m 43s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 38s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  32m 44s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  17m 48s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |  16m 11s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   4m 23s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   3m 21s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 29s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 51s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   5m 57s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  36m 10s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 33s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 10s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 52s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |  16m 52s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 15s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |  16m 15s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   4m 15s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   3m 16s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m 28s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   2m 52s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   6m 26s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  37m 37s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m  3s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  unit  | 288m 44s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   1m 11s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 559m  8s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6735/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6735 |
   | Optional Tests | dupname asflicense mvnsite codespell detsecrets 
markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs 
checkstyle |
   | uname | Linux 58e6c00c723a 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 71c5c8543faf6f7acac12e23f9d9b1d86c8d44aa |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6735/4/testReport/ |
   | Max. process+thread count | 3152 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common 

[jira] [Updated] (HDFS-17476) fix: False positive "Observer Node is too far behind" due to long overflow.

2024-04-17 Thread Jian Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Zhang updated HDFS-17476:
--
Attachment: HDFS-17476.patch
Status: Patch Available  (was: Open)

> fix: False positive "Observer Node is too far behind" due to long overflow.
> ---
>
> Key: HDFS-17476
> URL: https://issues.apache.org/jira/browse/HDFS-17476
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Critical
> Attachments: HDFS-17476.patch
>
>
> In GlobalStateIdContext#receiveRequestState(), if clientStateId is a small
> negative number, clientStateId - serverStateId may, due to long overflow,
> be greater than
> ESTIMATED_TRANSACTIONS_PER_SECOND
>                   * TimeUnit.MILLISECONDS.toSeconds(clientWaitTime)
>                   * ESTIMATED_SERVER_TIME_MULTIPLIER,
> resulting in false positives that the Observer Node is too far behind.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17476) fix: False positive "Observer Node is too far behind" due to long overflow.

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838301#comment-17838301
 ] 

ASF GitHub Bot commented on HDFS-17476:
---

KeeProMise opened a new pull request, #6747:
URL: https://github.com/apache/hadoop/pull/6747

   
   
   
   ### Description of PR
   
   See also: https://issues.apache.org/jira/browse/HDFS-17476
   In GlobalStateIdContext#receiveRequestState(), if clientStateId is a small
   negative number, clientStateId - serverStateId may, due to long overflow,
   be greater than

   ESTIMATED_TRANSACTIONS_PER_SECOND
     * TimeUnit.MILLISECONDS.toSeconds(clientWaitTime)
     * ESTIMATED_SERVER_TIME_MULTIPLIER,

   resulting in false positives that the Observer Node is too far behind.
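
   For illustration, here is a minimal, self-contained sketch of an
   overflow-safe version of this check. The constant values and method names
   below are placeholders, not the exact code in GlobalStateIdContext:

   ```java
   import java.util.concurrent.TimeUnit;

   public class StateIdCheckSketch {
     // Illustrative values; the real constants live in GlobalStateIdContext.
     static final long ESTIMATED_TRANSACTIONS_PER_SECOND = 10_000L;
     static final long ESTIMATED_SERVER_TIME_MULTIPLIER = 3L;

     // Checking clientStateId > serverStateId first means the subtraction
     // below cannot wrap around when clientStateId is a large negative
     // number such as Long.MIN_VALUE.
     static boolean isClientTooFarAhead(long clientStateId, long serverStateId,
         long clientWaitTimeMs) {
       if (clientStateId <= serverStateId) {
         return false;
       }
       long threshold = ESTIMATED_TRANSACTIONS_PER_SECOND
           * TimeUnit.MILLISECONDS.toSeconds(clientWaitTimeMs)
           * ESTIMATED_SERVER_TIME_MULTIPLIER;
       return clientStateId - serverStateId > threshold;
     }

     public static void main(String[] args) {
       // Naively, Long.MIN_VALUE - 100 overflows to a huge positive value and
       // would trip the threshold; the guarded check correctly returns false.
       System.out.println(isClientTooFarAhead(Long.MIN_VALUE, 100L, 10_000L));
     }
   }
   ```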
   
   ### How was this patch tested?
   
   
   ### For code changes:
   
   - [ ] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> fix: False positive "Observer Node is too far behind" due to long overflow.
> ---
>
> Key: HDFS-17476
> URL: https://issues.apache.org/jira/browse/HDFS-17476
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Critical
> Attachments: HDFS-17476.patch
>
>
> In GlobalStateIdContext#receiveRequestState(), if clientStateId is a small
> negative number, clientStateId - serverStateId may, due to long overflow,
> be greater than
> ESTIMATED_TRANSACTIONS_PER_SECOND
>                   * TimeUnit.MILLISECONDS.toSeconds(clientWaitTime)
>                   * ESTIMATED_SERVER_TIME_MULTIPLIER,
> resulting in false positives that the Observer Node is too far behind.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17476) fix: False positive "Observer Node is too far behind" due to long overflow.

2024-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17476:
--
Labels: pull-request-available  (was: )

> fix: False positive "Observer Node is too far behind" due to long overflow.
> ---
>
> Key: HDFS-17476
> URL: https://issues.apache.org/jira/browse/HDFS-17476
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HDFS-17476.patch
>
>
> In GlobalStateIdContext#receiveRequestState(), if clientStateId is a small
> negative number, clientStateId - serverStateId may, due to long overflow,
> be greater than
> ESTIMATED_TRANSACTIONS_PER_SECOND
>                   * TimeUnit.MILLISECONDS.toSeconds(clientWaitTime)
>                   * ESTIMATED_SERVER_TIME_MULTIPLIER,
> resulting in false positives that the Observer Node is too far behind.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17476) fix: False positive "Observer Node is too far behind" due to long overflow.

2024-04-17 Thread Jian Zhang (Jira)
Jian Zhang created HDFS-17476:
-

 Summary: fix: False positive "Observer Node is too far behind" due 
to long overflow.
 Key: HDFS-17476
 URL: https://issues.apache.org/jira/browse/HDFS-17476
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Jian Zhang
Assignee: Jian Zhang


In GlobalStateIdContext#receiveRequestState(), if clientStateId is a small
negative number, clientStateId - serverStateId may, due to long overflow, be
greater than

ESTIMATED_TRANSACTIONS_PER_SECOND
                  * TimeUnit.MILLISECONDS.toSeconds(clientWaitTime)
                  * ESTIMATED_SERVER_TIME_MULTIPLIER,

resulting in false positives that the Observer Node is too far behind.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17475) Add a command to check if files are readable

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838309#comment-17838309
 ] 

ASF GitHub Bot commented on HDFS-17475:
---

hadoop-yetus commented on PR #6745:
URL: https://github.com/apache/hadoop/pull/6745#issuecomment-2061806363

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 49s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  49m 33s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 30s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   1m 25s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m 17s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 34s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 11s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 45s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 41s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  40m 37s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 24s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 16s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  8s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6745/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 7 new + 13 unchanged - 
0 fixed = 20 total (was 13)  |
   | +1 :green_heart: |  mvnsite  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 44s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | -1 :x: |  spotbugs  |   3m 44s | 
[/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6745/1/artifact/out/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html)
 |  hadoop-hdfs-project/hadoop-hdfs generated 4 new + 0 unchanged - 0 fixed = 4 
total (was 0)  |
   | +1 :green_heart: |  shadedclient  |  39m 45s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 285m 17s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6745/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 46s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 442m 41s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | SpotBugs | module:hadoop-hdfs-project/hadoop-hdfs |
   |  |  Found reliance on default encoding in 
org.apache.hadoop.hdfs.tools.DebugAdmin$VerifyReadableCommand.handleArgs(String,
 String, String, String):in 
org.apache.hadoop.hdfs.tools.DebugAdmin$VerifyReadableCommand.handleArgs(String,
 String, String, String): new java.io.InputStreamReader(InputStream)  At 
DebugAdmin.java:[line 735] |
   |  |  Found reliance on default encoding in 
org.apache.hadoop.hdfs.tools.DebugAdmin$VerifyReadableCommand.handleArgs(String,
 String, String, String):in 
org.apache.hadoop.hdfs.tools.DebugAdmin$VerifyReadableCommand.handleArgs(String,
 String, String, String): new java.io.OutputStreamWriter(OutputStream)  At 

[jira] [Created] (HDFS-17477) IncrementalBlockReport race condition additional edge cases

2024-04-17 Thread Danny Becker (Jira)
Danny Becker created HDFS-17477:
---

 Summary: IncrementalBlockReport race condition additional edge 
cases
 Key: HDFS-17477
 URL: https://issues.apache.org/jira/browse/HDFS-17477
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: auto-failover, ha, namenode
Affects Versions: 3.3.6, 3.3.4, 3.3.5
Reporter: Danny Becker


HDFS-17453 fixes a race condition between IncrementalBlockReports (IBR) and the 
Edit Log Tailer which can cause the Standby NameNode (SNN) to incorrectly mark 
blocks as corrupt when it transitions to Active. There are a few edge cases 
that HDFS-17453 does not cover.

For Example:
1. SNN1 loads the edits for b1gs1 and b1gs2.
2. DN1 reports b1gs1 to SNN1, so it gets queued for later processing.
3. DN1 reports b1gs2 to SNN1 so it gets added to the blocks map.
4. SNN1 transitions to Active (ANN1).
5. ANN1 processes the pending DN message queue and marks DN1->b1gs1 as corrupt 
because it was still in the queue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-17477) IncrementalBlockReport race condition additional edge cases

2024-04-17 Thread Danny Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Becker reassigned HDFS-17477:
---

Assignee: Danny Becker

> IncrementalBlockReport race condition additional edge cases
> ---
>
> Key: HDFS-17477
> URL: https://issues.apache.org/jira/browse/HDFS-17477
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: auto-failover, ha, namenode
>Affects Versions: 3.3.5, 3.3.4, 3.3.6
>Reporter: Danny Becker
>Assignee: Danny Becker
>Priority: Major
>
> HDFS-17453 fixes a race condition between IncrementalBlockReports (IBR) and 
> the Edit Log Tailer which can cause the Standby NameNode (SNN) to incorrectly 
> mark blocks as corrupt when it transitions to Active. There are a few edge 
> cases that HDFS-17453 does not cover.
> For Example:
> 1. SNN1 loads the edits for b1gs1 and b1gs2.
> 2. DN1 reports b1gs1 to SNN1, so it gets queued for later processing.
> 3. DN1 reports b1gs2 to SNN1 so it gets added to the blocks map.
> 4. SNN1 transitions to Active (ANN1).
> 5. ANN1 processes the pending DN message queue and marks DN1->b1gs1 as 
> corrupt because it was still in the queue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17477) IncrementalBlockReport race condition additional edge cases

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838401#comment-17838401
 ] 

ASF GitHub Bot commented on HDFS-17477:
---

dannytbecker opened a new pull request, #6748:
URL: https://github.com/apache/hadoop/pull/6748

   
   
   ### Description of PR
    Summary
   [HDFS-17453](https://issues.apache.org/jira/browse/HDFS-17453) fixes a race 
condition between IncrementalBlockReports (IBR) and the Edit Log Tailer which 
can cause the Standby NameNode (SNN) to incorrectly mark blocks as corrupt when 
it transitions to Active. There are a few edge cases that 
[HDFS-17453](https://issues.apache.org/jira/browse/HDFS-17453) does not cover.
   
   For Example:
   1. SNN1 loads the edits for b1gs1 and b1gs2.
   2. DN1 reports b1gs1 to SNN1, so it gets queued for later processing.
   3. DN1 reports b1gs2 to SNN1 so it gets added to the blocks map.
   4. SNN1 transitions to Active (ANN1).
   5. ANN1 processes the pending DN message queue and marks DN1->b1gs1 as 
corrupt because it was still in the queue.
   
    Changes
   Processing a report for a DN-block pair should always remove any queued
   messages for that pair from the pendingDNMessage queue. This prevents older
   IBRs from leaking and marking blocks corrupt when the standby NN becomes
   active; see the sketch after the lists below.
   
   **Before**:
   - Process IBR
 - If the reported block's genstamp is not future or past, then update the 
blocks map
 - If the reported block's genstamp is from the future or the past, then 
keep only the latest IBR in the pendingDNMessage queue.
   
   **After**:
   - Process IBR
 - Remove all queued messages for the reported block-DN pair from the
pendingDNMessage queue.
 - If the reported block's genstamp is not future or past, then update the 
blocks map.
 - If the reported block's genstamp is from the future or the past then 
queue it.
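
   A minimal, self-contained sketch of the "After" flow above (the class and
   method names are illustrative; the real logic lives in the NameNode's
   pending DataNode message handling, which this does not copy):

   ```java
   import java.util.ArrayDeque;
   import java.util.Deque;
   import java.util.HashMap;
   import java.util.Map;

   class IbrQueueSketch {
     // Key: "<datanodeUuid>:<blockId>"; value: queued genstamps awaiting replay.
     private final Map<String, Deque<Long>> pending = new HashMap<>();

     void process(String dnUuid, long blockId, long reportedGs, long expectedGs) {
       String key = dnUuid + ":" + blockId;
       // "After" behaviour: drop every previously queued message for this
       // DN-block pair first, so an older IBR can never be replayed after a
       // newer report has already been processed.
       pending.remove(key);
       if (reportedGs == expectedGs) {
         // Genstamp matches: update the blocks map here (elided).
       } else {
         // Future or past genstamp: queue only this latest report.
         pending.computeIfAbsent(key, k -> new ArrayDeque<>()).add(reportedGs);
       }
     }
   }
   ```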
   
   ### How was this patch tested?
   Added unit tests and updated unit tests added in 
[HDFS-17453](https://issues.apache.org/jira/browse/HDFS-17453)
   
   ### For code changes:
   
   - [X] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> IncrementalBlockReport race condition additional edge cases
> ---
>
> Key: HDFS-17477
> URL: https://issues.apache.org/jira/browse/HDFS-17477
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: auto-failover, ha, namenode
>Affects Versions: 3.3.5, 3.3.4, 3.3.6
>Reporter: Danny Becker
>Assignee: Danny Becker
>Priority: Major
>
> HDFS-17453 fixes a race condition between IncrementalBlockReports (IBR) and 
> the Edit Log Tailer which can cause the Standby NameNode (SNN) to incorrectly 
> mark blocks as corrupt when it transitions to Active. There are a few edge 
> cases that HDFS-17453 does not cover.
> For Example:
> 1. SNN1 loads the edits for b1gs1 and b1gs2.
> 2. DN1 reports b1gs1 to SNN1, so it gets queued for later processing.
> 3. DN1 reports b1gs2 to SNN1 so it gets added to the blocks map.
> 4. SNN1 transitions to Active (ANN1).
> 5. ANN1 processes the pending DN message queue and marks DN1->b1gs1 as 
> corrupt because it was still in the queue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17477) IncrementalBlockReport race condition additional edge cases

2024-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17477:
--
Labels: pull-request-available  (was: )

> IncrementalBlockReport race condition additional edge cases
> ---
>
> Key: HDFS-17477
> URL: https://issues.apache.org/jira/browse/HDFS-17477
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: auto-failover, ha, namenode
>Affects Versions: 3.3.5, 3.3.4, 3.3.6
>Reporter: Danny Becker
>Assignee: Danny Becker
>Priority: Major
>  Labels: pull-request-available
>
> HDFS-17453 fixes a race condition between IncrementalBlockReports (IBR) and 
> the Edit Log Tailer which can cause the Standby NameNode (SNN) to incorrectly 
> mark blocks as corrupt when it transitions to Active. There are a few edge 
> cases that HDFS-17453 does not cover.
> For Example:
> 1. SNN1 loads the edits for b1gs1 and b1gs2.
> 2. DN1 reports b1gs1 to SNN1, so it gets queued for later processing.
> 3. DN1 reports b1gs2 to SNN1 so it gets added to the blocks map.
> 4. SNN1 transitions to Active (ANN1).
> 5. ANN1 processes the pending DN message queue and marks DN1->b1gs1 as 
> corrupt because it was still in the queue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17478) FSPermissionChecker to avoid obtaining a new AccessControlEnforcer instance before each authz call

2024-04-17 Thread Madhan Neethiraj (Jira)
Madhan Neethiraj created HDFS-17478:
---

 Summary: FSPermissionChecker to avoid obtaining a new 
AccessControlEnforcer instance before each authz call
 Key: HDFS-17478
 URL: https://issues.apache.org/jira/browse/HDFS-17478
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namanode
Reporter: Madhan Neethiraj


An instance of AccessControlEnforcer is obtained from the registered 
INodeAttributeProvider before every call made to authorizer. This can be 
avoided by initializing the AccessControlEnforcer instance during construction 
of FsPermissionChecker and using it in every subsequent call to the authorizer. 
This will eliminate the unnecessary overhead in highly performance sensitive 
authz code path.
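
As a rough sketch of this change (the interfaces below are simplified
stand-ins for INodeAttributeProvider and its AccessControlEnforcer; the real
HDFS signatures differ):

{code:java}
interface AccessControlEnforcer {
  void checkPermission(String user, String path);
}

interface AttributeProvider {
  AccessControlEnforcer getExternalAccessControlEnforcer(AccessControlEnforcer dflt);
}

class PermissionCheckerSketch {
  // Resolved once at construction time instead of once per authorization call.
  private final AccessControlEnforcer enforcer;

  PermissionCheckerSketch(AttributeProvider provider, AccessControlEnforcer dflt) {
    this.enforcer = (provider != null)
        ? provider.getExternalAccessControlEnforcer(dflt)
        : dflt;
  }

  void check(String user, String path) {
    enforcer.checkPermission(user, path); // reuse the cached enforcer
  }
}
{code}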



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-17478) FSPermissionChecker to avoid obtaining a new AccessControlEnforcer instance before each authz call

2024-04-17 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reassigned HDFS-17478:
--

Assignee: Madhan Neethiraj

> FSPermissionChecker to avoid obtaining a new AccessControlEnforcer instance 
> before each authz call
> --
>
> Key: HDFS-17478
> URL: https://issues.apache.org/jira/browse/HDFS-17478
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: Madhan Neethiraj
>Assignee: Madhan Neethiraj
>Priority: Major
> Attachments: HDFS-17478.patch
>
>
> An instance of AccessControlEnforcer is obtained from the registered 
> INodeAttributeProvider before every call made to authorizer. This can be 
> avoided by initializing the AccessControlEnforcer instance during 
> construction of FsPermissionChecker and using it in every subsequent call to 
> the authorizer. This will eliminate the unnecessary overhead in highly 
> performance sensitive authz code path.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17478) FSPermissionChecker to avoid obtaining a new AccessControlEnforcer instance before each authz call

2024-04-17 Thread Madhan Neethiraj (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Madhan Neethiraj updated HDFS-17478:

Description: 
An instance of AccessControlEnforcer is obtained from the registered 
INodeAttributeProvider before every call made to authorizer. This can be 
avoided by initializing the AccessControlEnforcer instance during construction 
of FsPermissionChecker and using it in every subsequent call to the authorizer. 
This will eliminate the unnecessary overhead in highly performance sensitive 
authz code path.

 

CC: [~abhay], [~arp], [~swagle]

  was:An instance of AccessControlEnforcer is obtained from the registered 
INodeAttributeProvider before every call made to authorizer. This can be 
avoided by initializing the AccessControlEnforcer instance during construction 
of FsPermissionChecker and using it in every subsequent call to the authorizer. 
This will eliminate the unnecessary overhead in highly performance sensitive 
authz code path.


> FSPermissionChecker to avoid obtaining a new AccessControlEnforcer instance 
> before each authz call
> --
>
> Key: HDFS-17478
> URL: https://issues.apache.org/jira/browse/HDFS-17478
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: Madhan Neethiraj
>Assignee: Madhan Neethiraj
>Priority: Major
> Attachments: HDFS-17478.patch
>
>
> An instance of AccessControlEnforcer is obtained from the registered 
> INodeAttributeProvider before every call made to authorizer. This can be 
> avoided by initializing the AccessControlEnforcer instance during 
> construction of FsPermissionChecker and using it in every subsequent call to 
> the authorizer. This will eliminate the unnecessary overhead in highly 
> performance sensitive authz code path.
>  
> CC: [~abhay], [~arp], [~swagle]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17454) Fix namenode fsck swallows the exception stacktrace, this can help us to troubleshooting log.

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838378#comment-17838378
 ] 

ASF GitHub Bot commented on HDFS-17454:
---

hadoop-yetus commented on PR #6709:
URL: https://github.com/apache/hadoop/pull/6709#issuecomment-2062280747

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 46s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  49m 55s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m 11s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 24s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 42s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 17s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  41m 15s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 16s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  8s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   1m  8s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  1s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6709/9/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 67 unchanged - 
0 fixed = 69 total (was 67)  |
   | +1 :green_heart: |  mvnsite  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 18s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  41m 10s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 160m  6s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6709/9/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 11s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 318m 33s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.namenode.sps.TestStoragePolicySatisfierWithStripedFile |
   |   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
   |   | hadoop.hdfs.server.namenode.snapshot.TestSnapshotMetrics |
   |   | hadoop.hdfs.server.namenode.TestFsck |
   |   | hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot |
   |   | hadoop.hdfs.server.namenode.snapshot.TestSnapshot |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6709/9/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6709 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 678b1c9a7002 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh 

[jira] [Commented] (HDFS-17476) fix: False positive "Observer Node is too far behind" due to long overflow.

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838386#comment-17838386
 ] 

ASF GitHub Bot commented on HDFS-17476:
---

hadoop-yetus commented on PR #6747:
URL: https://github.com/apache/hadoop/pull/6747#issuecomment-2062518278

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 19s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 51s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 43s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 43s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 40s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 45s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 46s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 28s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 31s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 42s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m  0s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 206m 50s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6747/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 31s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 296m 59s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6747/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6747 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 65c6ad4a45da 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 4208a984b11713148b6b6eba9f898d3d16c41fdd |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6747/1/testReport/ |
   | Max. process+thread count | 4062 (vs. ulimit of 

[jira] [Updated] (HDFS-17478) FSPermissionChecker to avoid obtaining a new AccessControlEnforcer instance before each authz call

2024-04-17 Thread Madhan Neethiraj (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Madhan Neethiraj updated HDFS-17478:

Attachment: HDFS-17478.patch
Status: Patch Available  (was: Open)

> FSPermissionChecker to avoid obtaining a new AccessControlEnforcer instance 
> before each authz call
> --
>
> Key: HDFS-17478
> URL: https://issues.apache.org/jira/browse/HDFS-17478
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Madhan Neethiraj
>Priority: Major
> Attachments: HDFS-17478.patch
>
>
> An instance of AccessControlEnforcer is obtained from the registered 
> INodeAttributeProvider before every call made to the authorizer. This can be 
> avoided by initializing the AccessControlEnforcer instance during the 
> construction of FsPermissionChecker and using it in every subsequent call to 
> the authorizer. This will eliminate unnecessary overhead in this highly 
> performance-sensitive authz code path.
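
A minimal sketch of the change described above (not the actual patch), with illustrative stand-in types rather than the real Hadoop classes:

{code:java}
// Illustrative stand-ins for the types named above; the real interfaces
// live in org.apache.hadoop.hdfs.server.namenode.INodeAttributeProvider.
interface AccessControlEnforcer {
  void checkPermission(String path, String user) throws SecurityException;
}

interface AttributeProvider {
  AccessControlEnforcer getExternalAccessControlEnforcer(AccessControlEnforcer dflt);
}

class PermissionCheckerSketch {
  private final AccessControlEnforcer enforcer;

  PermissionCheckerSketch(AttributeProvider provider, AccessControlEnforcer dflt) {
    // Look up the enforcer once at construction time; previously this
    // lookup happened before every single authorization call.
    this.enforcer = provider.getExternalAccessControlEnforcer(dflt);
  }

  void check(String path, String user) {
    // The hot authz path reuses the cached instance.
    enforcer.checkPermission(path, user);
  }
}
{code}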



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15413) DFSStripedInputStream throws exception when datanodes close idle connections

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838419#comment-17838419
 ] 

ASF GitHub Bot commented on HDFS-15413:
---

haiyang1987 commented on PR #5829:
URL: https://github.com/apache/hadoop/pull/5829#issuecomment-2062844297

   The UT `hadoop.hdfs.TestDFSStripedInputStreamWithTimeout` run failed.
   




> DFSStripedInputStream throws exception when datanodes close idle connections
> 
>
> Key: HDFS-15413
> URL: https://issues.apache.org/jira/browse/HDFS-15413
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, erasure-coding, hdfs-client
>Affects Versions: 3.1.3
> Environment: - Hadoop 3.1.3
> - erasure coding with ISA-L and RS-3-2-1024k scheme
> - running in kubernetes
> - dfs.client.socket-timeout = 1
> - dfs.datanode.socket.write.timeout = 1
>Reporter: Andrey Elenskiy
>Priority: Critical
>  Labels: pull-request-available
> Attachments: out.log
>
>
> We've run into an issue with compactions failing in HBase when erasure coding 
> is enabled on a table directory. After digging further, I was able to narrow 
> it down to the seek + read logic and to reproduce the issue with the hdfs 
> client only:
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.FSDataInputStream;
>
> public class ReaderRaw {
>     public static void main(final String[] args) throws Exception {
>         Path p = new Path(args[0]);
>         int bufLen = Integer.parseInt(args[1]);
>         int sleepDuration = Integer.parseInt(args[2]);
>         int countBeforeSleep = Integer.parseInt(args[3]);
>         int countAfterSleep = Integer.parseInt(args[4]);
>         Configuration conf = new Configuration();
>         FSDataInputStream istream = FileSystem.get(conf).open(p);
>         byte[] buf = new byte[bufLen];
>         int readTotal = 0;
>         int count = 0;
>         try {
>             while (true) {
>                 istream.seek(readTotal);
>                 int bytesRemaining = bufLen;
>                 int bufOffset = 0;
>                 // fill the buffer completely before the next seek
>                 while (bytesRemaining > 0) {
>                     int nread = istream.read(buf, bufOffset, bytesRemaining);
>                     if (nread < 0) {
>                         throw new Exception("nread is less than zero");
>                     }
>                     readTotal += nread;
>                     bufOffset += nread;
>                     bytesRemaining -= nread;
>                 }
>                 count++;
>                 if (count == countBeforeSleep) {
>                     System.out.println("sleeping for " + sleepDuration + " milliseconds");
>                     Thread.sleep(sleepDuration);
>                     System.out.println("resuming");
>                 }
>                 if (count == countBeforeSleep + countAfterSleep) {
>                     System.out.println("done");
>                     break;
>                 }
>             }
>         } catch (Exception e) {
>             System.out.println("exception on read " + count + " read total " + readTotal);
>             throw e;
>         }
>     }
> }
> {code}
> The issue appears to be due to the fact that datanodes close the connection 
> of the EC client if it doesn't fetch the next packet for longer than 
> dfs.client.socket-timeout. The EC client doesn't retry and instead assumes 
> that those datanodes went away, resulting in a "missing blocks" exception.
> I was able to consistently reproduce it with the following arguments:
> {noformat}
> bufLen = 100 (just below 1MB which is the size of the stripe) 
> sleepDuration = (dfs.client.socket-timeout + 1) * 1000 (in our case 11000)
> countBeforeSleep = 1
> countAfterSleep = 7
> {noformat}
> I've attached the entire log output of running the snippet above against an 
> erasure-coded file with the RS-3-2-1024k policy. And here are the logs from the 
> datanodes disconnecting the client:
> datanode 1:
> {noformat}
> 2020-06-15 19:06:20,697 INFO datanode.DataNode: Likely the client has stopped 
> reading, disconnecting it (datanode-v11-0-hadoop.hadoop:9866:DataXceiver 
> error processing READ_BLOCK operation  src: /10.128.23.40:53748 dst: 
> /10.128.14.46:9866); java.net.SocketTimeoutException: 1 millis timeout 
> while waiting for channel to be ready for write. ch : 
> java.nio.channels.SocketChannel[connected local=/10.128.14.46:9866 
> remote=/10.128.23.40:53748]
> {noformat}
> datanode 2:
> {noformat}
> 2020-06-15 19:06:20,341 INFO datanode.DataNode: Likely the client has stopped 
> reading, disconnecting it (datanode-v11-1-hadoop.hadoop:9866:DataXceiver 
> error processing READ_BLOCK operation  src: /10.128.23.40:48772 dst: 
> /10.128.9.42:9866); java.net.SocketTimeoutException: 1 millis timeout 
> while waiting for channel to be ready for 

[jira] [Commented] (HDFS-17478) FSPermissionChecker to avoid obtaining a new AccessControlEnforcer instance before each authz call

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838426#comment-17838426
 ] 

ASF GitHub Bot commented on HDFS-17478:
---

mneethiraj opened a new pull request, #6749:
URL: https://github.com/apache/hadoop/pull/6749

   ### Description of PR
   Updated FsPermissionChecker to initialize accessControlEnforcer with 
INodeAttributeProvider.getExternalAccessControlEnforcer() and use this instance 
to authorize accesses, instead of calling 
INodeAttributeProvider.getExternalAccessControlEnforcer() for every 
authorization.
   
   ### How was this patch tested?
   
   
   ### For code changes:
   
   - [x] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?




> FSPermissionChecker to avoid obtaining a new AccessControlEnforcer instance 
> before each authz call
> --
>
> Key: HDFS-17478
> URL: https://issues.apache.org/jira/browse/HDFS-17478
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: Madhan Neethiraj
>Assignee: Madhan Neethiraj
>Priority: Major
> Attachments: HDFS-17478.patch
>
>
> An instance of AccessControlEnforcer is obtained from the registered 
> INodeAttributeProvider before every call made to the authorizer. This can be 
> avoided by initializing the AccessControlEnforcer instance during the 
> construction of FsPermissionChecker and using it in every subsequent call to 
> the authorizer. This will eliminate unnecessary overhead in this highly 
> performance-sensitive authz code path.
>  
> CC: [~abhay], [~arp], [~swagle]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17457) [FGL] UTs support fine-grained locking

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838433#comment-17838433
 ] 

ASF GitHub Bot commented on HDFS-17457:
---

ferhui commented on PR #6741:
URL: https://github.com/apache/hadoop/pull/6741#issuecomment-2062890927

   BTW, can you also check the checkstyle issue?




> [FGL] UTs support fine-grained locking
> --
>
> Key: HDFS-17457
> URL: https://issues.apache.org/jira/browse/HDFS-17457
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> [FGL] UTs support fine-grained locking



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17457) [FGL] UTs support fine-grained locking

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838432#comment-17838432
 ] 

ASF GitHub Bot commented on HDFS-17457:
---

ferhui commented on code in PR #6741:
URL: https://github.com/apache/hadoop/pull/6741#discussion_r1569812780


##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestNameNodePrunesMissingStorages.java:
##
@@ -22,6 +22,7 @@
 import java.util.ArrayList;
 import java.util.Collection;
 
+import org.apache.hadoop.hdfs.server.namenode.fgl.FSNamesystemLockMode;

Review Comment:
   And here.



##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java:
##
@@ -582,13 +583,13 @@ private void banner(String string) {
   }
 
   private void doMetasave(NameNode nn2) {
-nn2.getNamesystem().writeLock();
+nn2.getNamesystem().writeLock(FSNamesystemLockMode.GLOBAL);

Review Comment:
   Seems this should be the BM lock here?
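   (Presumably something like `nn2.getNamesystem().writeLock(FSNamesystemLockMode.BM);`, assuming the FGL branch's FSNamesystemLockMode exposes a BM mode for block-manager-only operations.)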



##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAddBlockRetry.java:
##
@@ -24,6 +24,8 @@
 import static org.junit.Assert.assertTrue;
 import java.io.IOException;
 import java.util.EnumSet;
+
+import org.apache.hadoop.hdfs.server.namenode.fgl.FSNamesystemLockMode;

Review Comment:
   Can you move it near the other hdfs packages (org.apache.hadoop.hdfs.server.xx)?



##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeleteRace.java:
##
@@ -29,6 +29,7 @@
 import java.util.concurrent.Semaphore;
 
 import org.apache.hadoop.fs.Options;
+import org.apache.hadoop.hdfs.server.namenode.fgl.FSNamesystemLockMode;

Review Comment:
   here



##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogRace.java:
##
@@ -41,6 +41,8 @@
 import java.util.concurrent.atomic.AtomicReference;
 
 import java.util.function.Supplier;
+
+import org.apache.hadoop.hdfs.server.namenode.fgl.FSNamesystemLockMode;

Review Comment:
   and here





> [FGL] UTs support fine-grained locking
> --
>
> Key: HDFS-17457
> URL: https://issues.apache.org/jira/browse/HDFS-17457
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> [FGL] UTs support fine-grained locking



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17476) fix: False positive "Observer Node is too far behind" due to long overflow.

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838436#comment-17838436
 ] 

ASF GitHub Bot commented on HDFS-17476:
---

KeeProMise commented on PR #6747:
URL: https://github.com/apache/hadoop/pull/6747#issuecomment-2062902143

   I have rerun the failed UT checks locally multiple times, and they all passed.




> fix: False positive "Observer Node is too far behind" due to long overflow.
> ---
>
> Key: HDFS-17476
> URL: https://issues.apache.org/jira/browse/HDFS-17476
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HDFS-17476.patch
>
>
> In the code GlobalStateIdContext#receiveRequestState(), if clientStateId is a 
> small negative number, clientStateId - serverStateId may, due to overflow, be 
> greater than (ESTIMATED_TRANSACTIONS_PER_SECOND
>                   * TimeUnit.MILLISECONDS.toSeconds(clientWaitTime)
>                   * ESTIMATED_SERVER_TIME_MULTIPLIER),
> resulting in false positives that the Observer Node is too far behind.
>  
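
A minimal, self-contained sketch of the overflow described above; the constant values and state ids are illustrative, not the actual ones in GlobalStateIdContext:

{code:java}
import java.util.concurrent.TimeUnit;

public class OverflowSketch {
  // Illustrative values; the real constants live in GlobalStateIdContext.
  static final long ESTIMATED_TRANSACTIONS_PER_SECOND = 10_000L;
  static final long ESTIMATED_SERVER_TIME_MULTIPLIER = 5L;

  public static void main(String[] args) {
    long serverStateId = 1_000_000L;
    long clientStateId = Long.MIN_VALUE + 10;  // a "small negative number"
    long clientWaitTime = 1_000L;              // millis

    long threshold = ESTIMATED_TRANSACTIONS_PER_SECOND
        * TimeUnit.MILLISECONDS.toSeconds(clientWaitTime)
        * ESTIMATED_SERVER_TIME_MULTIPLIER;

    // The plain subtraction wraps around to a huge positive number,
    // so the check wrongly concludes the Observer is too far behind.
    long diff = clientStateId - serverStateId;
    System.out.println(diff > threshold);      // prints true: the false positive

    // An overflow-checked subtraction exposes the wrap-around instead.
    try {
      System.out.println(Math.subtractExact(clientStateId, serverStateId) > threshold);
    } catch (ArithmeticException e) {
      System.out.println("overflow detected; do not flag the Observer");
    }
  }
}
{code}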



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-17479) [FGL] Snapshot related operations still use global lock

2024-04-17 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu reassigned HDFS-17479:
---

Assignee: ZanderXu

> [FGL] Snapshot related operations still use global lock
> ---
>
> Key: HDFS-17479
> URL: https://issues.apache.org/jira/browse/HDFS-17479
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
> Attachments: image-2024-04-18-11-00-02-011.png, 
> image-2024-04-18-11-00-12-451.png
>
>
> The snapshot feature is very useful in certain scenarios. As far as I 
> know, very few companies use this feature in production environments. The 
> implementation is complex, and it is difficult to support FGL with only 
> minor modifications.
> So we can still use the Global lock to make snapshot-related operations 
> thread-safe.
>  
> Snapshots have several access patterns; let's analyze them and find a way to still 
> use GlobalLock.
> !image-2024-04-18-11-00-12-451.png|width=288,height=219!
> The above picture shows a simple case: we can access the iNode foo through 
> the following paths:
>  # /abc/foo
>  # /abc/.snapshot/s1/foo
> If we want to delete the iNode foo, we need to lock /abc and 
> /abc/.snapshot/s1 (DirectoryWithSnapshotFeature on iNode abc).
> If we want to change the permission of the iNode foo, we need to lock /abc/foo 
> and /abc/.snapshot/s1/foo (DirectoryWithSnapshotFeature on the iNode foo).
>  
> For this case, we can directly acquire the global lock when resolving the 
> IIPs for the input path if there is an iNode that has 
> DirectorySnapshottableFeature.
> !image-2024-04-18-11-00-02-011.png|width=368,height=383!
> After /abc/foo is renamed to /xyz/bar, the access paths will change, as 
> the above picture shows. We can access this bar through the following paths:
>  # /abc/.snapshot/s1/bar
>  # /xyz/bar
> For /abc/.snapshot/s1/bar, since the iNode abc has 
> DirectorySnapshottableFeature, we can identify it and acquire the global 
> lock.
> For /xyz/bar, we can identify it through the Reference flag, since the iNode bar 
> is a DstReference node.
>  
> So we can use DirectorySnapshottableFeature and Reference to determine whether we 
> need to acquire the Global lock when resolving the IIPs for the input path.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17476) fix: False positive "Observer Node is too far behind" due to long overflow.

2024-04-17 Thread Jian Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Zhang updated HDFS-17476:
--
Attachment: image-2024-04-18-10-57-10-481.png

> fix: False positive "Observer Node is too far behind" due to long overflow.
> ---
>
> Key: HDFS-17476
> URL: https://issues.apache.org/jira/browse/HDFS-17476
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HDFS-17476.patch, image-2024-04-18-10-57-10-481.png
>
>
> In the code GlobalStateIdContext#receiveRequestState(), if clientStateId is a 
> small negative number, clientStateId - serverStateId may, due to overflow, be 
> greater than (ESTIMATED_TRANSACTIONS_PER_SECOND
>                   * TimeUnit.MILLISECONDS.toSeconds(clientWaitTime)
>                   * ESTIMATED_SERVER_TIME_MULTIPLIER),
> resulting in false positives that the Observer Node is too far behind.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17476) fix: False positive "Observer Node is too far behind" due to long overflow.

2024-04-17 Thread Jian Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Zhang updated HDFS-17476:
--
Description: 
In the code GlobalStateIdContext#receiveRequestState(), if clientStateId is a 
small negative number, clientStateId - serverStateId may, due to overflow, be 
greater than

(ESTIMATED_TRANSACTIONS_PER_SECOND
                  * TimeUnit.MILLISECONDS.toSeconds(clientWaitTime)
                  * ESTIMATED_SERVER_TIME_MULTIPLIER),

resulting in false positives that the Observer Node is too far behind.

!image-2024-04-18-10-57-10-481.png|width=742,height=110!

  was:
In the code GlobalStateIdContext#receiveRequestState(), if clientStateId is a 
small negative number, clientStateId - serverStateId may, due to overflow, be 
greater than

(ESTIMATED_TRANSACTIONS_PER_SECOND
                  * TimeUnit.MILLISECONDS.toSeconds(clientWaitTime)
                  * ESTIMATED_SERVER_TIME_MULTIPLIER),

resulting in false positives that the Observer Node is too far behind.

 


> fix: False positive "Observer Node is too far behind" due to long overflow.
> ---
>
> Key: HDFS-17476
> URL: https://issues.apache.org/jira/browse/HDFS-17476
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HDFS-17476.patch, image-2024-04-18-10-57-10-481.png
>
>
> In the code GlobalStateIdContext#receiveRequestState(), if clientStateId is a 
> small negative number, clientStateId - serverStateId may, due to overflow, be 
> greater than (ESTIMATED_TRANSACTIONS_PER_SECOND
>                   * TimeUnit.MILLISECONDS.toSeconds(clientWaitTime)
>                   * ESTIMATED_SERVER_TIME_MULTIPLIER),
> resulting in false positives that the Observer Node is too far behind.
> !image-2024-04-18-10-57-10-481.png|width=742,height=110!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17479) [FGL] Snapshot related operations still use global lock

2024-04-17 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17479:

Attachment: image-2024-04-18-11-00-02-011.png

> [FGL] Snapshot related operations still use global lock
> ---
>
> Key: HDFS-17479
> URL: https://issues.apache.org/jira/browse/HDFS-17479
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Priority: Major
> Attachments: image-2024-04-18-11-00-02-011.png, 
> image-2024-04-18-11-00-12-451.png
>
>
> The snapshot feature is very useful in certain scenarios. As far as I 
> know, very few companies use this feature in production environments. The 
> implementation is complex, and it is difficult to support FGL with only 
> minor modifications.
> So we can still use the Global lock to make snapshot-related operations 
> thread-safe.
>  
> Snapshots have several access patterns; let's analyze them and find a way to still 
> use GlobalLock.
>  
> !image-2024-04-18-10-31-34-458.png|width=288,height=219!
> The above picture shows a simple case: we can access the iNode foo through 
> the following paths:
>  # /abc/foo
>  # /abc/.snapshot/s1/foo
> If we want to delete the iNode foo, we need to lock /abc and 
> /abc/.snapshot/s1 (DirectoryWithSnapshotFeature on iNode abc).
> If we want to change the permission of the iNode foo, we need to lock /abc/foo 
> and /abc/.snapshot/s1/foo (DirectoryWithSnapshotFeature on the iNode foo).
>  
> For this case, we can directly acquire the global lock when resolving the 
> IIPs for the input path if there is an iNode that has 
> DirectorySnapshottableFeature.
> !image-2024-04-18-10-48-08-773.png|width=368,height=383!
> After /abc/foo is renamed to /xyz/bar, the access paths will change, as 
> the above picture shows. We can access this bar through the following paths:
>  # /abc/.snapshot/s1/bar
>  # /xyz/bar
> For /abc/.snapshot/s1/bar, since the iNode abc has 
> DirectorySnapshottableFeature, we can identify it and acquire the global 
> lock.
> For /xyz/bar, we can identify it through the Reference flag, since the iNode bar 
> is a DstReference node.
>  
> So we can use DirectorySnapshottableFeature and Reference to determine whether we 
> need to acquire the Global lock when resolving the IIPs for the input path.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17478) FSPermissionChecker to avoid obtaining a new AccessControlEnforcer instance before each authz call

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838446#comment-17838446
 ] 

ASF GitHub Bot commented on HDFS-17478:
---

hadoop-yetus commented on PR #6749:
URL: https://github.com/apache/hadoop/pull/6749#issuecomment-2062949405

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 22s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | -1 :x: |  mvninstall  |   0m 21s | 
[/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6749/2/artifact/out/branch-mvninstall-root.txt)
 |  root in trunk failed.  |
   | -1 :x: |  compile  |   0m 22s | 
[/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6749/2/artifact/out/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt)
 |  hadoop-hdfs in trunk failed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.  |
   | -1 :x: |  compile  |   0m  8s | 
[/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6749/2/artifact/out/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt)
 |  hadoop-hdfs in trunk failed with JDK Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.  |
   | -0 :warning: |  checkstyle  |   0m 19s | 
[/buildtool-branch-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6749/2/artifact/out/buildtool-branch-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  The patch fails to run checkstyle in hadoop-hdfs  |
   | -1 :x: |  mvnsite  |   0m 23s | 
[/branch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6749/2/artifact/out/branch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in trunk failed.  |
   | -1 :x: |  javadoc  |   0m 22s | 
[/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6749/2/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt)
 |  hadoop-hdfs in trunk failed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.  |
   | -1 :x: |  javadoc  |   0m 23s | 
[/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6749/2/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.txt)
 |  hadoop-hdfs in trunk failed with JDK Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06.  |
   | -1 :x: |  spotbugs  |   0m 22s | 
[/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6749/2/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in trunk failed.  |
   | +1 :green_heart: |  shadedclient  |   2m 32s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | -1 :x: |  mvninstall  |   0m 22s | 
[/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6749/2/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch failed.  |
   | -1 :x: |  compile  |   0m 22s | 
[/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6749/2/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt)
 |  hadoop-hdfs in the patch failed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.  |
   | -1 :x: |  javac  |   0m 22s | 

[jira] [Resolved] (HDFS-17472) [FGL] gcDeletedSnapshot and getDelegationToken support FGL

2024-04-17 Thread Hui Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Fei resolved HDFS-17472.

Resolution: Fixed

> [FGL] gcDeletedSnapshot and getDelegationToken support FGL
> --
>
> Key: HDFS-17472
> URL: https://issues.apache.org/jira/browse/HDFS-17472
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> [FGL] gcDeletedSnapshot and getDelegationToken support FGL



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17472) [FGL] gcDeletedSnapshot and getDelegationToken support FGL

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838415#comment-17838415
 ] 

ASF GitHub Bot commented on HDFS-17472:
---

ferhui commented on PR #6743:
URL: https://github.com/apache/hadoop/pull/6743#issuecomment-2062834549

   Thanks for the contribution. Merged.




> [FGL] gcDeletedSnapshot and getDelegationToken support FGL
> --
>
> Key: HDFS-17472
> URL: https://issues.apache.org/jira/browse/HDFS-17472
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> [FGL] gcDeletedSnapshot and getDelegationToken support FGL



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17478) FSPermissionChecker to avoid obtaining a new AccessControlEnforcer instance before each authz call

2024-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17478:
--
Labels: pull-request-available  (was: )

> FSPermissionChecker to avoid obtaining a new AccessControlEnforcer instance 
> before each authz call
> --
>
> Key: HDFS-17478
> URL: https://issues.apache.org/jira/browse/HDFS-17478
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Madhan Neethiraj
>Assignee: Madhan Neethiraj
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-17478.patch
>
>
> An instance of AccessControlEnforcer is obtained from the registered 
> INodeAttributeProvider before every call made to the authorizer. This can be 
> avoided by initializing the AccessControlEnforcer instance during the 
> construction of FsPermissionChecker and using it in every subsequent call to 
> the authorizer. This will eliminate unnecessary overhead in this highly 
> performance-sensitive authz code path.
>  
> CC: [~abhay], [~arp], [~swagle]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17479) [FGL] Snapshot related operations still use global lock

2024-04-17 Thread ZanderXu (Jira)
ZanderXu created HDFS-17479:
---

 Summary: [FGL] Snapshot related operations still use global lock
 Key: HDFS-17479
 URL: https://issues.apache.org/jira/browse/HDFS-17479
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: ZanderXu


The snapshot feature is very useful in certain scenarios. As far as I 
know, very few companies use this feature in production environments. The 
implementation is complex, and it is difficult to support FGL with only minor 
modifications.

So we can still use the Global lock to make snapshot-related operations 
thread-safe.

 

Snapshots have several access patterns; let's analyze them and find a way to still 
use GlobalLock.

 

!image-2024-04-18-10-31-34-458.png|width=288,height=219!

The above picture shows a simple case: we can access the iNode foo through the 
following paths:
 # /abc/foo
 # /abc/.snapshot/s1/foo

If we want to delete the iNode foo, we need to lock /abc and /abc/.snapshot/s1 
(DirectoryWithSnapshotFeature on iNode abc).

If we want to change the permission of the iNode foo, we need to lock /abc/foo and 
/abc/.snapshot/s1/foo (DirectoryWithSnapshotFeature on the iNode foo).

 

For this case, we can directly acquire the global lock when resolving the IIPs 
for the input path if there is an iNode that has DirectorySnapshottableFeature.

!image-2024-04-18-10-48-08-773.png|width=368,height=383!

After /abc/foo is renamed to /xyz/bar, the access paths will change, as 
the above picture shows. We can access this bar through the following paths:
 # /abc/.snapshot/s1/bar
 # /xyz/bar

For /abc/.snapshot/s1/bar, since the iNode abc has 
DirectorySnapshottableFeature, we can identify it and acquire the global lock.

For /xyz/bar, we can identify it through the Reference flag, since the iNode bar is 
a DstReference node.

 

So we can use DirectorySnapshottableFeature and Reference to determine whether we 
need to acquire the Global lock when resolving the IIPs for the input path.
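
A minimal sketch of that decision, with illustrative types rather than the actual INode API:

{code:java}
public final class SnapshotLockModeSketch {
  enum LockMode { FINE_GRAINED, GLOBAL }

  // Illustrative stand-in for the namenode's INode.
  interface INode {
    boolean isSnapshottable();  // carries DirectorySnapshottableFeature
    boolean isDstReference();   // a DstReference node created by a rename
  }

  // Walk the resolved inodes-in-path (IIP) and fall back to the global
  // lock as soon as a snapshottable directory or a DstReference appears.
  static LockMode chooseLockMode(INode[] inodesInPath) {
    for (INode inode : inodesInPath) {
      if (inode != null && (inode.isSnapshottable() || inode.isDstReference())) {
        return LockMode.GLOBAL;
      }
    }
    return LockMode.FINE_GRAINED;
  }
}
{code}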

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17480) [FGL] GetListing RPC supports fine-grained locking

2024-04-17 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17480:

Description: 
GetListing is a very commonly used RPC among end-users, so we should consider how 
GetListing can support FGL.

For example, suppose the directory /a/b/c contains some children, such as d1, d2, 
d3, f1, f2, f3.

Normally, we should hold the write lock of iNode c when listing /a/b/c to make sure 
that no other threads are updating the children of iNode c. But if the 
listing path is /, the entire directory tree will be locked, which will have a 
great impact.

 

There are two solutions to fix this problem:

Solution 1:
 * Hold the read lock of iNode c
 * Loop through all children
 ** Hold the read lock of each child and return its file status

The result may contain some stale file statuses, because the looped children may 
be updated by other threads before the result of getListing is returned to the 
client.

 

Solution 2:
 * Hold the write lock of both the parent and the current node when updating the current node
 ** Hold the write lock of iNode c and d1 when updating d1
 * Hold the read lock of iNode c
 * Loop through all children

This solution increases the scope of the lock, since the parent's write lock 
is usually not required.

 

I prefer the first solution, since the namenode always returns results in batches; 
changes may have occurred between batches.

By the way, GetContentSummary will use solution one.

  was:
GetListing is a very commonly used RPC among end-users, so we should consider how 
GetListing can support FGL.

For example, suppose the directory /a/b/c contains some children, such as d1, d2, 
d3, f1, f2, f3.

Normally, we should hold the write lock of iNode c when listing /a/b/c to make sure 
that no other threads are updating the children of iNode c. But if the 
listing path is /, the entire directory tree will be locked, which will have a 
great impact.

 

There are two solutions to fix this problem:

Solution 1:
 * Hold the read lock of iNode c
 * Loop through all children
 ** Hold the read lock of each child and return its file status

The result may contain some stale file statuses, because the looped children may 
be updated by other threads before the result of getListing is returned to the 
client.

 

Solution 2:
 * Hold the write lock of both the parent and the current node when updating the current node
 ** Hold the write lock of iNode c and d1 when updating d1
 * Hold the read lock of iNode c
 * Loop through all children

This solution increases the scope of the lock, since the parent's write lock 
is usually not required.

 

I prefer the first solution, since the namenode always returns results in batches; 
changes may have occurred between batches.


> [FGL] GetListing RPC supports fine-grained locking
> --
>
> Key: HDFS-17480
> URL: https://issues.apache.org/jira/browse/HDFS-17480
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>
> GetListing is a very commonly used RPC among end-users, so we should consider how 
> GetListing can support FGL.
> For example, suppose the directory /a/b/c contains some children, such as d1, 
> d2, d3, f1, f2, f3.
> Normally, we should hold the write lock of iNode c when listing /a/b/c to make 
> sure that no other threads are updating the children of iNode c. But if 
> the listing path is /, the entire directory tree will be locked, which will 
> have a great impact.
>  
> There are two solutions to fix this problem:
> Solution 1:
>  * Hold the read lock of iNode c
>  * Loop through all children
>  ** Hold the read lock of each child and return its file status
> The result may contain some stale file statuses, because the looped children 
> may be updated by other threads before the result of getListing is returned to 
> the client.
>  
> Solution 2:
>  * Hold the write lock of both the parent and the current node when updating the 
> current node
>  ** Hold the write lock of iNode c and d1 when updating d1
>  * Hold the read lock of iNode c
>  * Loop through all children
> This solution increases the scope of the lock, since the parent's write lock 
> is usually not required.
>  
> I prefer the first solution, since the namenode always returns results in 
> batches; changes may have occurred between batches.
> By the way, GetContentSummary will use solution one.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17480) [FGL] GetListing RPC supports fine-grained locking

2024-04-17 Thread ZanderXu (Jira)
ZanderXu created HDFS-17480:
---

 Summary: [FGL] GetListing RPC supports fine-grained locking
 Key: HDFS-17480
 URL: https://issues.apache.org/jira/browse/HDFS-17480
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: ZanderXu
Assignee: ZanderXu


GetListing is a very commonly used RPC among end-users, so we should consider how 
GetListing can support FGL.

For example, suppose the directory /a/b/c contains some children, such as d1, d2, 
d3, f1, f2, f3.

Normally, we should hold the write lock of iNode c when listing /a/b/c to make sure 
that no other threads are updating the children of iNode c. But if the 
listing path is /, the entire directory tree will be locked, which will have a 
great impact.

 

There are two solutions to fix this problem:

Solution 1:
 * Hold the read lock of iNode c
 * Loop through all children
 ** Hold the read lock of each child and return its file status

The result may contain some stale file statuses, because the looped children may 
be updated by other threads before the result of getListing is returned to the 
client.

 

Solution 2:
 * Hold the write lock of both the parent and the current node when updating the current node
 ** Hold the write lock of iNode c and d1 when updating d1
 * Hold the read lock of iNode c
 * Loop through all children

This solution increases the scope of the lock, since the parent's write lock 
is usually not required.

 

I prefer the first solution, since the namenode always returns results in batches; 
changes may have occurred between batches.
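
A minimal sketch of solution 1, with illustrative lock and node types rather than the actual namenode classes:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;

public final class ListingSketch {
  // Illustrative stand-in for an iNode guarded by a read/write lock.
  interface Node {
    ReadWriteLock lock();
    List<Node> children();
    String fileStatus();  // a copy of the child's current status
  }

  static List<String> getListing(Node dir) {
    List<String> result = new ArrayList<>();
    dir.lock().readLock().lock();          // read lock on iNode c only
    try {
      for (Node child : dir.children()) {
        child.lock().readLock().lock();    // briefly read-lock each child
        try {
          result.add(child.fileStatus());  // may be stale by the time the
        } finally {                        // batch reaches the client
          child.lock().readLock().unlock();
        }
      }
    } finally {
      dir.lock().readLock().unlock();
    }
    return result;
  }
}
{code}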



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17472) [FGL] gcDeletedSnapshot and getDelegationToken support FGL

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838414#comment-17838414
 ] 

ASF GitHub Bot commented on HDFS-17472:
---

ferhui merged PR #6743:
URL: https://github.com/apache/hadoop/pull/6743




> [FGL] gcDeletedSnapshot and getDelegationToken support FGL
> --
>
> Key: HDFS-17472
> URL: https://issues.apache.org/jira/browse/HDFS-17472
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> [FGL] gcDeletedSnapshot and getDelegationToken support FGL



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17479) [FGL] Snapshot related operations still use global lock

2024-04-17 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17479:

Attachment: image-2024-04-18-11-00-12-451.png

> [FGL] Snapshot related operations still use global lock
> ---
>
> Key: HDFS-17479
> URL: https://issues.apache.org/jira/browse/HDFS-17479
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Priority: Major
> Attachments: image-2024-04-18-11-00-02-011.png, 
> image-2024-04-18-11-00-12-451.png
>
>
> The snapshot feature is very useful in certain scenarios. As far as I 
> know, very few companies use this feature in production environments. The 
> implementation is complex, and it is difficult to support FGL with only 
> minor modifications.
> So we can still use the Global lock to make snapshot-related operations 
> thread-safe.
>  
> Snapshots have several access patterns; let's analyze them and find a way to still 
> use GlobalLock.
>  
> !image-2024-04-18-10-31-34-458.png|width=288,height=219!
> The above picture shows a simple case: we can access the iNode foo through 
> the following paths:
>  # /abc/foo
>  # /abc/.snapshot/s1/foo
> If we want to delete the iNode foo, we need to lock /abc and 
> /abc/.snapshot/s1 (DirectoryWithSnapshotFeature on iNode abc).
> If we want to change the permission of the iNode foo, we need to lock /abc/foo 
> and /abc/.snapshot/s1/foo (DirectoryWithSnapshotFeature on the iNode foo).
>  
> For this case, we can directly acquire the global lock when resolving the 
> IIPs for the input path if there is an iNode that has 
> DirectorySnapshottableFeature.
> !image-2024-04-18-10-48-08-773.png|width=368,height=383!
> After /abc/foo is renamed to /xyz/bar, the access paths will change, as 
> the above picture shows. We can access this bar through the following paths:
>  # /abc/.snapshot/s1/bar
>  # /xyz/bar
> For /abc/.snapshot/s1/bar, since the iNode abc has 
> DirectorySnapshottableFeature, we can identify it and acquire the global 
> lock.
> For /xyz/bar, we can identify it through the Reference flag, since the iNode bar 
> is a DstReference node.
>  
> So we can use DirectorySnapshottableFeature and Reference to determine whether we 
> need to acquire the Global lock when resolving the IIPs for the input path.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17479) [FGL] Snapshot related operations still use global lock

2024-04-17 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17479:

Description: 
The snapshot feature is very useful in certain scenarios. As far as I 
know, very few companies use this feature in production environments. The 
implementation is complex, and it is difficult to support FGL with only minor 
modifications.

So we can still use the Global lock to make snapshot-related operations 
thread-safe.

 

Snapshots have several access patterns; let's analyze them and find a way to still 
use GlobalLock.

!image-2024-04-18-11-00-12-451.png|width=288,height=219!

The above picture shows a simple case: we can access the iNode foo through the 
following paths:
 # /abc/foo
 # /abc/.snapshot/s1/foo

If we want to delete the iNode foo, we need to lock /abc and /abc/.snapshot/s1 
(DirectoryWithSnapshotFeature on iNode abc).

If we want to change the permission of the iNode foo, we need to lock /abc/foo and 
/abc/.snapshot/s1/foo (DirectoryWithSnapshotFeature on the iNode foo).

 

For this case, we can directly acquire the global lock when resolving the IIPs 
for the input path if there is an iNode that has DirectorySnapshottableFeature.

!image-2024-04-18-11-00-02-011.png|width=368,height=383!

After /abc/foo is renamed to /xyz/bar, the access paths will change, as 
the above picture shows. We can access this bar through the following paths:
 # /abc/.snapshot/s1/bar
 # /xyz/bar

For /abc/.snapshot/s1/bar, since the iNode abc has 
DirectorySnapshottableFeature, we can identify it and acquire the global 
lock.

For /xyz/bar, we can identify it through the Reference flag, since the iNode bar is 
a DstReference node.

 

So we can use DirectorySnapshottableFeature and Reference to determine whether we 
need to acquire the Global lock when resolving the IIPs for the input path.

 

  was:
The snapshot feature is very useful in certain scenarios. As far as I 
know, very few companies use this feature in production environments. The 
implementation is complex, and it is difficult to support FGL with only minor 
modifications.

So we can still use the Global lock to make snapshot-related operations 
thread-safe.

 

Snapshots have several access patterns; let's analyze them and find a way to still 
use GlobalLock.

 

!image-2024-04-18-10-31-34-458.png|width=288,height=219!

The above picture shows a simple case: we can access the iNode foo through the 
following paths:
 # /abc/foo
 # /abc/.snapshot/s1/foo

If we want to delete the iNode foo, we need to lock /abc and /abc/.snapshot/s1 
(DirectoryWithSnapshotFeature on iNode abc).

If we want to change the permission of the iNode foo, we need to lock /abc/foo and 
/abc/.snapshot/s1/foo (DirectoryWithSnapshotFeature on the iNode foo).

 

For this case, we can directly acquire the global lock when resolving the IIPs 
for the input path if there is an iNode that has DirectorySnapshottableFeature.

!image-2024-04-18-10-48-08-773.png|width=368,height=383!

After /abc/foo is renamed to /xyz/bar, the access paths will change, as 
the above picture shows. We can access this bar through the following paths:
 # /abc/.snapshot/s1/bar
 # /xyz/bar

For /abc/.snapshot/s1/bar, since the iNode abc has 
DirectorySnapshottableFeature, we can identify it and acquire the global 
lock.

For /xyz/bar, we can identify it through the Reference flag, since the iNode bar is 
a DstReference node.

 

So we can use DirectorySnapshottableFeature and Reference to determine whether we 
need to acquire the Global lock when resolving the IIPs for the input path.

 


> [FGL] Snapshot related operations still use global lock
> ---
>
> Key: HDFS-17479
> URL: https://issues.apache.org/jira/browse/HDFS-17479
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Priority: Major
> Attachments: image-2024-04-18-11-00-02-011.png, 
> image-2024-04-18-11-00-12-451.png
>
>
> The snapshot feature is very useful in certain scenarios. As far as I 
> know, very few companies use this feature in production environments. The 
> implementation is complex, and it is difficult to support FGL with only 
> minor modifications.
> So we can still use the Global lock to make snapshot-related operations 
> thread-safe.
>  
> Snapshots have several access patterns; let's analyze them and find a way to still 
> use GlobalLock.
> !image-2024-04-18-11-00-12-451.png|width=288,height=219!
> The above picture shows a simple case: we can access the iNode foo through 
> the following paths:
>  # /abc/foo
>  # /abc/.snapshot/s1/foo
> If we want to delete the iNode foo, we need to lock /abc and 
> /abc/.snapshot/s1 (DirectoryWithSnapshotFeature on iNode abc).
> If we want to change the permission of the iNode foo, we need to lock /abc/foo 
> and /abc/.snapshot/s1/foo (DirectoryWithSnapshotFeature 

[jira] [Commented] (HDFS-17477) IncrementalBlockReport race condition additional edge cases

2024-04-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838469#comment-17838469
 ] 

ASF GitHub Bot commented on HDFS-17477:
---

hadoop-yetus commented on PR #6748:
URL: https://github.com/apache/hadoop/pull/6748#issuecomment-2063042088

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 54s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  1s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  44m 38s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 24s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m 13s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 25s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  7s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 39s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 16s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  35m 58s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 13s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  5s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   1m  5s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  1s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6748/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 86 unchanged - 
0 fixed = 88 total (was 86)  |
   | +1 :green_heart: |  mvnsite  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 52s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 34s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 16s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  35m 42s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 279m 38s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6748/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 46s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 421m 54s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestLargeBlockReport |
   |   | hadoop.hdfs.server.namenode.ha.TestDNFencing |
   |   | hadoop.hdfs.server.blockmanagement.TestPendingDataNodeMessages |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6748/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6748 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux e26fa349236a 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / f374cae62cb6e250ba650be7f731d21ed888a6aa |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions |