[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-02-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17822089#comment-17822089
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

tomscut commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1970911161

   @zhangshuyan0 Hi, I closed the issue HDFS-17358.




> EC: infinite lease recovery caused by the length of RWR equals to zero.
> ---
>
> Key: HDFS-17358
> URL: https://issues.apache.org/jira/browse/HDFS-17358
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Recently, a strange case occurred on our EC production cluster.
> The symptom: the NameNode performs infinite lease recovery on some EC
> files (~80K+), and those files can never be closed.
>  
> After digging into the logs and the related code, we found that the root
> cause is the following code in `BlockRecoveryWorker$RecoveryTaskStriped#recover`:
> {code:java}
>           // we met info.getNumBytes() == 0 here!
>           if (info != null &&
>               info.getGenerationStamp() >= block.getGenerationStamp() &&
>               info.getNumBytes() > 0) {
>             final BlockRecord existing = syncBlocks.get(blockId);
>             if (existing == null ||
>                 info.getNumBytes() > existing.rInfo.getNumBytes()) {
>               // if we have >1 replicas for the same internal block, we
>               // simply choose the one with larger length.
>               // TODO: better usage of redundant replicas
>               syncBlocks.put(blockId, new BlockRecord(id, proxyDN, info));
>             }
>           }
>           // the exception is thrown here!
>           checkLocations(syncBlocks.size());
> {code}
> The relevant logs are as follows:
> {code:java}
> java.io.IOException: 
> BP-1157541496-10.104.10.198-1702548776421:blk_-9223372036808032688_2938828 
> has no enough internal blocks, unable to start recovery. Locations=[...] 
> {code}
> {code:java}
> 2024-01-23 12:48:16,171 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: 
> initReplicaRecovery: blk_-9223372036808032686_2938828, recoveryId=27615365, 
> replica=ReplicaUnderRecovery, blk_-9223372036808032686_2938828, RUR 
> getNumBytes() = 0 getBytesOnDisk() = 0 getVisibleLength()= -1 getVolume() = 
> /data25/hadoop/hdfs/datanode getBlockURI() = 
> file:/data25/hadoop/hdfs/datanode/current/BP-1157541496-x.x.x.x-1702548776421/current/rbw/blk_-9223372036808032686
>  recoveryId=27529675 original=ReplicaWaitingToBeRecovered, 
> blk_-9223372036808032686_2938828, RWR getNumBytes() = 0 getBytesOnDisk() = 0 
> getVisibleLength()= -1 getVolume() = /data25/hadoop/hdfs/datanode 
> getBlockURI() = 
> file:/data25/hadoop/hdfs/datanode/current/BP-1157541496-10.104.10.198-1702548776421/current/rbw/blk_-9223372036808032686
> {code}
> Because the length of the RWR replica is zero, the length of the
> ReplicaRecoveryInfo returned by the code below is also zero, so it is
> never put into syncBlocks, and checkLocations therefore throws the
> exception shown above.
> {code:java}
>           ReplicaRecoveryInfo info = callInitReplicaRecovery(proxyDN,
>               new RecoveringBlock(internalBlk, null, recoveryId));
> {code}
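To illustrate why an all-zero-length set of replicas leaves syncBlocks empty and trips checkLocations, here is a minimal standalone sketch of the filtering logic described above. The class and method names (ZeroLengthRwrDemo, usableLocations) are invented for this example and are not the actual Hadoop classes; the loop body only mirrors the shape of the guard in RecoveryTaskStriped#recover.

```java
import java.util.HashMap;
import java.util.Map;

public class ZeroLengthRwrDemo {

    // Mirrors the filter in the snippet above: keep, per internal block id,
    // the longest replica whose reported length is greater than zero.
    static int usableLocations(long[][] replicas) {
        Map<Long, Long> syncBlocks = new HashMap<>();
        for (long[] r : replicas) {
            long blockId = r[0];
            long numBytes = r[1];
            if (numBytes > 0) {                      // the getNumBytes() > 0 guard
                Long existing = syncBlocks.get(blockId);
                if (existing == null || numBytes > existing) {
                    syncBlocks.put(blockId, numBytes);
                }
            }
        }
        return syncBlocks.size();
    }

    public static void main(String[] args) {
        // Every candidate replica is a zero-length RWR, as in the logs above.
        long[][] allZero = { {1L, 0L}, {2L, 0L}, {3L, 0L} };
        if (usableLocations(allZero) == 0) {
            // Corresponds to checkLocations() throwing IOException, after
            // which the NameNode schedules lease recovery again, forever.
            System.out.println("recovery aborted: 0 usable locations");
        }
    }
}
```

With zero-length replicas filtered out, no location ever reaches syncBlocks, the size check fails on every attempt, and recovery never completes.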



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-02-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820935#comment-17820935
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

zhangshuyan0 commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1965670563

   Committed to trunk. Thanks for your contributions! @hfutatzhanghb 
@haiyang1987 @tomscut 







[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-02-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820933#comment-17820933
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

zhangshuyan0 merged PR #6509:
URL: https://github.com/apache/hadoop/pull/6509







[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-02-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820928#comment-17820928
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hfutatzhanghb commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1965657242

   The failed UTs all passed in my local environment.







[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-02-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820797#comment-17820797
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1964732195

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |  17m 15s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  47m 37s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 24s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 22s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 21s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  40m 28s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  6s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m  6s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  1s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 55s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 21s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  40m  8s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 253m 32s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/29/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 43s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 423m 12s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.protocol.TestBlockListAsLongs |
   |   | hadoop.hdfs.tools.TestDFSAdmin |
   |   | hadoop.hdfs.server.datanode.TestLargeBlockReport |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/29/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux c38cac31b717 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / b0b1ef7756ba5e37d23eddb91821a20bd1520dad |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/29/testReport/ |
   | Max. process+thread count | 3048 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 

[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-02-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820704#comment-17820704
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1964163458

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 21s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  46m 12s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 38s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 46s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  4s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 40s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 28s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 57s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 42s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 32s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 197m 23s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/28/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 32s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 299m  5s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestLargeBlockReport |
   |   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   |   | hadoop.hdfs.protocol.TestBlockListAsLongs |
   |   | hadoop.hdfs.tools.TestDFSAdmin |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/28/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 1895a134d100 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 5c79b1385226049a124730171225d34960ac505b |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/28/testReport/ |
   | Max. process+thread count | 4003 (vs. ulimit of 5500) |
   | modules | 

[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820611#comment-17820611
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1963495064

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 21s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  46m 52s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 43s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 40s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 45s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  4s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 45s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 45s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 27s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/27/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 5 unchanged - 
0 fixed = 6 total (was 5)  |
   | +1 :green_heart: |  mvnsite  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 31s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 57s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 41s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 12s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 203m 14s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/27/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 32s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 305m 10s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestLargeBlockReport |
   |   | hadoop.hdfs.protocol.TestBlockListAsLongs |
   |   | hadoop.hdfs.tools.TestDFSAdmin |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/27/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux f527b079541a 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / c7e39505156b4a7744c57737165c29be8ec1a0e1 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 

[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820533#comment-17820533
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

tomscut commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1963271347

   Good catch! The changes look good to me. Wait for the Jenkins.




> EC: infinite lease recovery caused by the length of RWR equals to zero.
> ---
>
> Key: HDFS-17358
> URL: https://issues.apache.org/jira/browse/HDFS-17358
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
>
> Recently, there is a strange case happened on our ec production cluster.
> The phenomenon is as below described: NameNode does infinite recovery lease 
> of some ec files(~80K+) and those files could never be closed.
>  
> After digging into logs and releated code, we found the root cause is below 
> codes in method `BlockRecoveryWorker$RecoveryTaskStriped#recover`:
> {code:java}
>           // we met info.getNumBytes==0 here! 
>   if (info != null &&
>               info.getGenerationStamp() >= block.getGenerationStamp() &&
>               info.getNumBytes() > 0) {
>             final BlockRecord existing = syncBlocks.get(blockId);
>             if (existing == null ||
>                 info.getNumBytes() > existing.rInfo.getNumBytes()) {
>               // if we have >1 replicas for the same internal block, we
>               // simply choose the one with larger length.
>               // TODO: better usage of redundant replicas
>               syncBlocks.put(blockId, new BlockRecord(id, proxyDN, info));
>             }
>           }
>   // throw exception here!
>           checkLocations(syncBlocks.size());
> {code}
> The related logs are as below:
> {code:java}
> java.io.IOException: 
> BP-1157541496-10.104.10.198-1702548776421:blk_-9223372036808032688_2938828 
> has no enough internal blocks, unable to start recovery. Locations=[...] 
> {code}
> {code:java}
> 2024-01-23 12:48:16,171 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: 
> initReplicaRecovery: blk_-9223372036808032686_2938828, recoveryId=27615365, 
> replica=ReplicaUnderRecovery, blk_-9223372036808032686_2938828, RUR 
> getNumBytes() = 0 getBytesOnDisk() = 0 getVisibleLength()= -1 getVolume() = 
> /data25/hadoop/hdfs/datanode getBlockURI() = 
> file:/data25/hadoop/hdfs/datanode/current/BP-1157541496-x.x.x.x-1702548776421/current/rbw/blk_-9223372036808032686
>  recoveryId=27529675 original=ReplicaWaitingToBeRecovered, 
> blk_-9223372036808032686_2938828, RWR getNumBytes() = 0 getBytesOnDisk() = 0 
> getVisibleLength()= -1 getVolume() = /data25/hadoop/hdfs/datanode 
> getBlockURI() = 
> file:/data25/hadoop/hdfs/datanode/current/BP-1157541496-10.104.10.198-1702548776421/current/rbw/blk_-9223372036808032686
> {code}
> Because the length of the RWR replica is zero, the length of the object
> returned by the code below is also zero, so it cannot be put into
> syncBlocks; checkLocations then throws the exception above.
> {code:java}
>           ReplicaRecoveryInfo info = callInitReplicaRecovery(proxyDN,
>               new RecoveringBlock(internalBlk, null, recoveryId)); {code}
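The failure mode described above can be sketched in isolation. The following is a hypothetical, simplified model (not the actual Hadoop class; `RecoverySketch`, `usableReplicas`, and all values are illustrative): replicas whose reported length is zero, such as RWR replicas, never enter `syncBlocks`, so once too many internal blocks are zero-length, fewer than `numDataUnits` locations survive, `checkLocations` throws, and the NameNode retries recovery indefinitely.

```java
// Hypothetical sketch of the replica filter in
// RecoveryTaskStriped#recover; class and method names are illustrative.
public class RecoverySketch {

  // Counts replicas that would be admitted into syncBlocks:
  // generation stamp must be current and length must be positive.
  static int usableReplicas(long[] numBytes, long[] genStamps, long blockGs) {
    int count = 0;
    for (int i = 0; i < numBytes.length; i++) {
      if (genStamps[i] >= blockGs && numBytes[i] > 0) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // RS-6-3 layout: 9 internal blocks, 4 of them zero-length RWRs.
    long[] lengths = {1 << 20, 1 << 20, 1 << 20, 1 << 20, 1 << 20, 0, 0, 0, 0};
    long[] gs = {2938828, 2938828, 2938828, 2938828, 2938828,
                 2938828, 2938828, 2938828, 2938828};
    int numDataUnits = 6;

    int usable = usableReplicas(lengths, gs, 2938828L);
    System.out.println("usable=" + usable + ", needed=" + numDataUnits);
    // With only 5 usable replicas but 6 data units required,
    // checkLocations would throw and recovery would restart forever.
  }
}
```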



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-02-25 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820528#comment-17820528
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hfutatzhanghb commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1963222489

   @zhangshuyan0 @haiyang1987 @tasanuma @tomscut Sir, I have updated the unit test, please check it again~










[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-02-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820437#comment-17820437
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

haiyang1987 commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1962817997

The CI result is here:
https://ci-hadoop.apache.org/blue/organizations/jenkins/hadoop-multibranch/detail/PR-6509/26/tests

   The failed tests appear to be unrelated.










[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-02-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820435#comment-17820435
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

haiyang1987 commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1962817322

   LGTM +1.










[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-02-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819465#comment-17819465
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hfutatzhanghb commented on code in PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#discussion_r1498581949


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java:
##
@@ -436,9 +442,17 @@ protected void recover() throws IOException {
              "datanode={})", block, internalBlk, id, e);
        }
      }
-      checkLocations(syncBlocks.size());

-      final long safeLength = getSafeLength(syncBlocks);
+      final long safeLength;
+      if (dnNotHaveReplicaCnt + zeroLenReplicaCnt <= locs.length - ecPolicy.getNumDataUnits()) {
+        checkLocations(syncBlocks.size());
+        safeLength = getSafeLength(syncBlocks);
+      } else {
+        safeLength = 0;
+        LOG.warn("Block recovery: More than {} datanodes do not have the replica of block {}." +

Review Comment:
   It seems useless here; I have removed it. Originally, it meant more than (9 - 6) datanodes ...
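For context, the condition in the diff can be checked numerically. This is a hypothetical sketch (the class and method `RecoveryAbortCheck.canProceed` and the sample counts are illustrative, not Hadoop code); the values assume the RS-6-3 policy behind the "(9 - 6)" above:

```java
// Hypothetical sketch of the patch's abort condition; names and
// sample counts are illustrative, not actual Hadoop code.
public class RecoveryAbortCheck {

  // Recovery may proceed only if the number of missing plus zero-length
  // replicas does not exceed the parity budget (totalLocs - numDataUnits).
  static boolean canProceed(int dnNotHaveReplicaCnt, int zeroLenReplicaCnt,
                            int totalLocs, int numDataUnits) {
    return dnNotHaveReplicaCnt + zeroLenReplicaCnt <= totalLocs - numDataUnits;
  }

  public static void main(String[] args) {
    int totalLocs = 9;     // RS-6-3: 6 data + 3 parity internal blocks
    int numDataUnits = 6;  // ecPolicy.getNumDataUnits()

    // Up to 9 - 6 = 3 replicas may be missing or zero-length.
    System.out.println(canProceed(2, 1, totalLocs, numDataUnits)); // true
    // 2 missing + 2 zero-length = 4 > 3: set safeLength = 0 instead of
    // throwing, so lease recovery can finish and the file can be closed.
    System.out.println(canProceed(2, 2, totalLocs, numDataUnits)); // false
  }
}
```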











[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-02-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819464#comment-17819464
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

zhangshuyan0 commented on code in PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#discussion_r1498579433


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java:
##
@@ -436,9 +442,17 @@ protected void recover() throws IOException {
              "datanode={})", block, internalBlk, id, e);
        }
      }
-      checkLocations(syncBlocks.size());

-      final long safeLength = getSafeLength(syncBlocks);
+      final long safeLength;
+      if (dnNotHaveReplicaCnt + zeroLenReplicaCnt <= locs.length - ecPolicy.getNumDataUnits()) {
+        checkLocations(syncBlocks.size());
+        safeLength = getSafeLength(syncBlocks);
+      } else {
+        safeLength = 0;
+        LOG.warn("Block recovery: More than {} datanodes do not have the replica of block {}." +

Review Comment:
   What does this "More than" mean?











[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-02-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819462#comment-17819462
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hfutatzhanghb commented on code in PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#discussion_r1498576377


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java:
##
@@ -436,9 +442,17 @@ protected void recover() throws IOException {
              "datanode={})", block, internalBlk, id, e);
        }
      }
-      checkLocations(syncBlocks.size());

-      final long safeLength = getSafeLength(syncBlocks);
+      final long safeLength;
+      if (dnNotHaveReplicaCnt + zeroLenReplicaCnt <= locs.length - ecPolicy.getNumDataUnits()) {
+        checkLocations(syncBlocks.size());
+        safeLength = getSafeLength(syncBlocks);
+      } else {
+        safeLength = 0;
+        LOG.warn("Block recovery: More than {} datanodes do not have the replica of block {}." +

Review Comment:
   Fixed it. Thanks sir.











[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-02-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819458#comment-17819458
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

zhangshuyan0 commented on code in PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#discussion_r1498565609


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java:
##
@@ -436,9 +442,17 @@ protected void recover() throws IOException {
              "datanode={})", block, internalBlk, id, e);
        }
      }
-      checkLocations(syncBlocks.size());

-      final long safeLength = getSafeLength(syncBlocks);
+      final long safeLength;
+      if (dnNotHaveReplicaCnt + zeroLenReplicaCnt <= locs.length - ecPolicy.getNumDataUnits()) {
+        checkLocations(syncBlocks.size());
+        safeLength = getSafeLength(syncBlocks);
+      } else {
+        safeLength = 0;
+        LOG.warn("Block recovery: More than {} datanodes do not have the replica of block {}." +

Review Comment:
   Suggest printing out the value of `zeroLenReplicaCnt` as well.











[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-02-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819110#comment-17819110
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hfutatzhanghb opened a new pull request, #6509:
URL: https://github.com/apache/hadoop/pull/6509

   ### Description of PR
   Refer to HDFS-17358.
   
   Recently, a strange case happened on our EC production cluster.
   
   The symptom: the NameNode performs infinite lease recovery on some EC files (~80K+), and those files can never be closed.
   
   After digging into the logs and related code, we found the root cause in the following code in `BlockRecoveryWorker$RecoveryTaskStriped#recover`:
   
   ```java
   // we met info.getNumBytes==0 here!
   if (info != null &&
       info.getGenerationStamp() >= block.getGenerationStamp() &&
       info.getNumBytes() > 0) {
     final BlockRecord existing = syncBlocks.get(blockId);
     if (existing == null ||
         info.getNumBytes() > existing.rInfo.getNumBytes()) {
       // if we have >1 replicas for the same internal block, we
       // simply choose the one with larger length.
       // TODO: better usage of redundant replicas
       syncBlocks.put(blockId, new BlockRecord(id, proxyDN, info));
     }
   }

   // throw exception here!
   checkLocations(syncBlocks.size());
   ```
   
   The related logs are as below:
   
   >java.io.IOException: 
BP-1157541496-10.104.10.198-1702548776421:blk_-9223372036808032688_2938828 has 
no enough internal blocks, unable to start recovery. Locations=[...] 
   
   >2024-01-23 12:48:16,171 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: 
initReplicaRecovery: blk_-9223372036808032686_2938828, recoveryId=27615365, 
replica=ReplicaUnderRecovery, blk_-9223372036808032686_2938828, RUR 
getNumBytes() = 0 getBytesOnDisk() = 0 getVisibleLength()= -1 getVolume() = 
/data25/hadoop/hdfs/datanode getBlockURI() = 
file:/data25/hadoop/hdfs/datanode/current/BP-1157541496-x.x.x.x-1702548776421/current/rbw/blk_-9223372036808032686
 recoveryId=27529675 original=ReplicaWaitingToBeRecovered, 
blk_-9223372036808032686_2938828, RWR getNumBytes() = 0 getBytesOnDisk() = 0 
getVisibleLength()= -1 getVolume() = /data25/hadoop/hdfs/datanode getBlockURI() 
= 
file:/data25/hadoop/hdfs/datanode/current/BP-1157541496-10.104.10.198-1702548776421/current/rbw/blk_-9223372036808032686
   
   Because the length of the RWR replica is zero, the length of the object returned by the code below is also zero, so it cannot be put into syncBlocks; checkLocations then throws the exception.
   
   >ReplicaRecoveryInfo info = callInitReplicaRecovery(proxyDN,new 
RecoveringBlock(internalBlk, null, recoveryId)); 





[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-02-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819109#comment-17819109
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hfutatzhanghb closed pull request #6509: HDFS-17358. EC: infinite lease 
recovery caused by the length of RWR equals to zero.
URL: https://github.com/apache/hadoop/pull/6509







--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-02-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818704#comment-17818704
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hfutatzhanghb commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1953735541

   @zhangshuyan0 @tasanuma @Hexiaoqiao The code is ready for review. Could you please review this PR when you have time? Thanks a lot.










[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-02-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813546#comment-17813546
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hfutatzhanghb commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1923279796

   > I haven't looked at the code in detail, but the unit test seems to be 
failing.
   
   Hi, sir. I haven't finished the UT yet, so I converted this PR to a draft. Will update the new UT soon.
   










[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-02-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17813515#comment-17813515
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

tasanuma commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1923134680

   I haven't looked at the code in detail, but the unit test seems to be 
failing.










[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812813#comment-17812813
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1919549958

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------:|:--------:|:---------:|
   | +0 :ok: |  reexec  |   1m  5s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  41m 30s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 23s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 17s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 11s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 26s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  4s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 26s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  35m 26s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  8s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m  8s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 51s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 35s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  36m  5s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 295m 58s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/16/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 47s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 434m 23s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestLeaseRecoveryStriped |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/16/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 60562d747c28 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / d81842d10403df7aed7716d23b9c6483cd644c42 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/16/testReport/ |
   | Max. process+thread count | 2905 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 

[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812812#comment-17812812
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1919545451

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------:|:--------:|:---------:|
   | +0 :ok: |  reexec  |   0m 59s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  41m 33s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 18s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m  9s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 27s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  5s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 32s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  35m  8s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 11s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 58s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 52s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 31s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  35m  9s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 294m 12s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/17/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 51s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 430m 53s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestLeaseRecoveryStriped |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/17/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 40fde980cf1b 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / d81842d10403df7aed7716d23b9c6483cd644c42 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/17/testReport/ |
   | Max. process+thread count | 2945 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 

[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812808#comment-17812808
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1919530517

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------:|:--------:|:---------:|
   | +0 :ok: |  reexec  |   0m 53s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  47m 44s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 39s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 24s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 16s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 37s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 12s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 40s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 33s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  40m 23s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  7s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m  7s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  1s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 25s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  41m 22s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 261m 49s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/18/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 53s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 417m 42s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
   |   | hadoop.hdfs.TestLeaseRecoveryStriped |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/18/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 3edaa27eaa6c 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / d81842d10403df7aed7716d23b9c6483cd644c42 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/18/testReport/ |
   | Max. process+thread count | 2720 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 

[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812777#comment-17812777
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1919434400

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 46s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | -1 :x: |  mvninstall  |  24m 19s | 
[/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/15/artifact/out/branch-mvninstall-root.txt)
 |  root in trunk failed.  |
   | +1 :green_heart: |  compile  |   1m 22s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 11s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 10s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 21s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 18s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  39m 57s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 16s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  7s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m  7s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/15/artifact/out/blanks-eol.txt)
 |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 55s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 19s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  41m  8s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 250m 47s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/15/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 41s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 380m 27s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
   |   | hadoop.hdfs.TestLeaseRecoveryStriped |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/15/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 3a595a9d821c 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / c2ffdba975f184542e2c2e56beadb9030e14635c |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 

[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812721#comment-17812721
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1919217088

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 20s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 36s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 35s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 44s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 39s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  3s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 43s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 22s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 31s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 31s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/14/artifact/out/blanks-eol.txt)
 |  The patch has 5 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   0m 29s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 42s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 21s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 193m 57s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 27s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 279m 38s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/14/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux c9ab5eb11ff5 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 5abded25439d3883bfef4e4596cd7e976541e6ae |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/14/testReport/ |
   | Max. process+thread count | 4804 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/14/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 

[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812629#comment-17812629
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hfutatzhanghb commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1918776304

   @Hexiaoqiao @zhangshuyan0 @haiyang1987 @tasanuma Sir, I have uploaded a 
simple unit test. Please help review this code when you are free. Thanks!




> EC: infinite lease recovery caused by the length of RWR equals to zero.
> ---
>
> Key: HDFS-17358
> URL: https://issues.apache.org/jira/browse/HDFS-17358
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
>
> Recently, a strange case occurred on our EC production cluster.
> The phenomenon: the NameNode performed infinite lease recovery on some EC 
> files (~80K+), and those files could never be closed.
>  
> After digging into the logs and related code, we found the root cause in the 
> following code in method `BlockRecoveryWorker$RecoveryTaskStriped#recover`:
> {code:java}
>           // we met info.getNumBytes==0 here! 
>   if (info != null &&
>               info.getGenerationStamp() >= block.getGenerationStamp() &&
>               info.getNumBytes() > 0) {
>             final BlockRecord existing = syncBlocks.get(blockId);
>             if (existing == null ||
>                 info.getNumBytes() > existing.rInfo.getNumBytes()) {
>               // if we have >1 replicas for the same internal block, we
>               // simply choose the one with larger length.
>               // TODO: better usage of redundant replicas
>               syncBlocks.put(blockId, new BlockRecord(id, proxyDN, info));
>             }
>           }
>   // throw exception here!
>           checkLocations(syncBlocks.size());
> {code}
> The related logs are as below:
> {code:java}
> java.io.IOException: 
> BP-1157541496-10.104.10.198-1702548776421:blk_-9223372036808032688_2938828 
> has no enough internal blocks, unable to start recovery. Locations=[...] 
> {code}
> {code:java}
> 2024-01-23 12:48:16,171 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: 
> initReplicaRecovery: blk_-9223372036808032686_2938828, recoveryId=27615365, 
> replica=ReplicaUnderRecovery, blk_-9223372036808032686_2938828, RUR 
> getNumBytes() = 0 getBytesOnDisk() = 0 getVisibleLength()= -1 getVolume() = 
> /data25/hadoop/hdfs/datanode getBlockURI() = 
> file:/data25/hadoop/hdfs/datanode/current/BP-1157541496-x.x.x.x-1702548776421/current/rbw/blk_-9223372036808032686
>  recoveryId=27529675 original=ReplicaWaitingToBeRecovered, 
> blk_-9223372036808032686_2938828, RWR getNumBytes() = 0 getBytesOnDisk() = 0 
> getVisibleLength()= -1 getVolume() = /data25/hadoop/hdfs/datanode 
> getBlockURI() = 
> file:/data25/hadoop/hdfs/datanode/current/BP-1157541496-10.104.10.198-1702548776421/current/rbw/blk_-9223372036808032686
> {code}
> Because the length of the RWR replica is zero, the length of the object 
> returned by the code below is also zero, so it is never put into syncBlocks, 
> and the checkLocations method then throws the exception.
> {code:java}
>           ReplicaRecoveryInfo info = callInitReplicaRecovery(proxyDN,
>               new RecoveringBlock(internalBlk, null, recoveryId)); {code}
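
For illustration, the replica-selection filter quoted above can be sketched as a 
standalone snippet. The class and field names below (`RecoverySketch`, 
`ReplicaInfo`) are simplified stand-ins for the actual Hadoop types, not the real 
API; the point is only to show why a zero-length RWR replica never enters 
`syncBlocks`, so the location check fails on every recovery attempt:

```java
import java.util.HashMap;
import java.util.Map;

public class RecoverySketch {
    // Minimal stand-in for ReplicaRecoveryInfo: only the two fields the filter reads.
    static class ReplicaInfo {
        final long genStamp;
        final long numBytes;
        ReplicaInfo(long genStamp, long numBytes) {
            this.genStamp = genStamp;
            this.numBytes = numBytes;
        }
    }

    // Mirrors the condition in RecoveryTaskStriped#recover: a replica whose
    // reported length is zero is silently skipped, even if it is the only one.
    static Map<Long, ReplicaInfo> collect(Map<Long, ReplicaInfo> replies,
                                          long blockGenStamp) {
        Map<Long, ReplicaInfo> syncBlocks = new HashMap<>();
        for (Map.Entry<Long, ReplicaInfo> e : replies.entrySet()) {
            ReplicaInfo info = e.getValue();
            if (info != null && info.genStamp >= blockGenStamp
                && info.numBytes > 0) {
                ReplicaInfo existing = syncBlocks.get(e.getKey());
                // With >1 replicas for the same internal block, keep the longer one.
                if (existing == null || info.numBytes > existing.numBytes) {
                    syncBlocks.put(e.getKey(), info);
                }
            }
        }
        return syncBlocks;
    }

    public static void main(String[] args) {
        Map<Long, ReplicaInfo> replies = new HashMap<>();
        // The RWR from the log: valid generation stamp, but numBytes == 0.
        replies.put(1L, new ReplicaInfo(100, 0));
        Map<Long, ReplicaInfo> sync = collect(replies, 100);
        // syncBlocks stays empty, so checkLocations(syncBlocks.size()) would
        // throw, the NameNode re-queues the recovery, and the cycle repeats.
        System.out.println(sync.size()); // prints 0
    }
}
```

In this sketch the recovery can never make progress: every attempt replays the 
same zero-length reply, the map stays empty, and the lease is never released.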



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812426#comment-17812426
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1917563938

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 22s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  34m 45s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 40s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 43s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 39s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  6s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 52s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 42s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 35s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/12/artifact/out/blanks-eol.txt)
 |  The patch has 5 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   0m 27s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 44s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 51s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 199m 34s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/12/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 28s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 291m 26s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/12/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux ab7ea8abd688 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 4619de20d50f9ad1380d3ee407d5a615e94539b5 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/12/testReport/ |
   | Max. process+thread count | 4069 

[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812335#comment-17812335
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1916966515

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 36s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  48m 10s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 28s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 16s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 18s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 33s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 11s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 47s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  40m 55s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 11s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m 11s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/10/artifact/out/blanks-eol.txt)
 |  The patch has 4 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   1m  8s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 23s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 28s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 16s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  34m 18s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 223m 16s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/10/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 41s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 372m 36s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/10/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux c0a7f19e0fa7 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 6c1081a5fb506f9a96078569ee11a6a242bf7552 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/10/testReport/ |
   | Max. process+thread count | 3981 

[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812326#comment-17812326
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1916911462

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 20s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 27s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 36s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 40s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 39s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  1s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 44s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m  0s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 32s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 32s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/11/artifact/out/blanks-eol.txt)
 |  The patch has 4 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   0m 28s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 42s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m  3s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 194m  6s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/11/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 27s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 278m 56s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/11/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 1dac899ed063 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 64d91bcf6cefbda4b6f89517b06da48f2b3fefc5 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/11/testReport/ |
   | Max. process+thread count | 4207 

[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812264#comment-17812264
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1916570884

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 49s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  47m 39s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 23s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 13s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 22s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  |  trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 19s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  39m 51s |  |  branch has no errors when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 16s |  |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  6s |  |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m  6s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/9/artifact/out/blanks-eol.txt) |  The patch has 4 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   1m  1s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/9/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 5 unchanged - 0 fixed = 6 total (was 5)  |
   | +1 :green_heart: |  mvnsite  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 55s |  |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 27s |  |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 18s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  39m 54s |  |  patch has no errors when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 253m 49s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/9/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 43s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 405m 54s |  |  |

   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestLeaseRecoveryStriped |
   |   | hadoop.hdfs.server.datanode.TestDirectoryScanner |

   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/9/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 1f5d84414741 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 96ecedc969d91205d9cd40f9045a1b8d3538926b |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | 

[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812259#comment-17812259
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1916558695

   :broken_heart: **-1 overall**
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 49s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  46m 24s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 23s |  |  trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 11s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 23s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 19s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  39m 29s |  |  branch has no errors when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 10s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  7s |  |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m  7s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/7/artifact/out/blanks-eol.txt) |  The patch has 4 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   1m  1s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/7/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 5 unchanged - 0 fixed = 6 total (was 5)  |
   | +1 :green_heart: |  mvnsite  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 55s |  |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 27s |  |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 22s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  39m 46s |  |  patch has no errors when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 249m 58s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/7/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 42s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 400m 47s |  |  |

   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestLeaseRecoveryStriped |
   |   | hadoop.hdfs.TestRollingUpgrade |

   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/7/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 26ad0358d483 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 96ecedc969d91205d9cd40f9045a1b8d3538926b |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 

[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812248#comment-17812248
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1916474196

   :broken_heart: **-1 overall**
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 32s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  41m 47s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m  9s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 22s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  5s |  |  trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 38s |  |  trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 14s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  34m 39s |  |  branch has no errors when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m  8s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 12s |  |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  4s |  |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m  4s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/6/artifact/out/blanks-eol.txt) |  The patch has 4 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   0m 56s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/6/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 5 unchanged - 0 fixed = 6 total (was 5)  |
   | +1 :green_heart: |  mvnsite  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 53s |  |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 14s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  34m 14s |  |  patch has no errors when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 225m  0s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 42s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 359m 49s |  |  |

   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestLeaseRecoveryStriped |

   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/6/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux f19d056e8500 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 96ecedc969d91205d9cd40f9045a1b8d3538926b |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 

[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812209#comment-17812209
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1916349542

   :broken_heart: **-1 overall**
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 20s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 55s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 37s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 44s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 41s |  |  trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  3s |  |  trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 42s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 25s |  |  branch has no errors when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 32s |  |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 32s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/8/artifact/out/blanks-eol.txt) |  The patch has 4 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   0m 28s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/8/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 5 unchanged - 0 fixed = 6 total (was 5)  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 39s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m  3s |  |  patch has no errors when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 205m 23s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/8/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 28s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 291m 41s |  |  |

   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   |   | hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy |
   |   | hadoop.hdfs.TestLeaseRecoveryStriped |

   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/8/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux af6ba03f974e 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 96ecedc969d91205d9cd40f9045a1b8d3538926b |
   | 

[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812203#comment-17812203
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1916338179

   :broken_heart: **-1 overall**
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 20s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 57s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 38s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 41s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 41s |  |  trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  1s |  |  trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 48s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 32s |  |  branch has no errors when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 32s |  |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 32s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/5/artifact/out/blanks-eol.txt) |  The patch has 4 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   0m 28s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/5/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 5 unchanged - 0 fixed = 6 total (was 5)  |
   | +1 :green_heart: |  mvnsite  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 31s |  |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 57s |  |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 40s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 17s |  |  patch has no errors when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 200m 33s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 27s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 287m  9s |  |  |

   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   |   | hadoop.hdfs.TestLeaseRecoveryStriped |
   |   | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |

   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/5/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux fac5502bec56 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 96ecedc969d91205d9cd40f9045a1b8d3538926b |
   | 

[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811978#comment-17811978
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1915073995

   :broken_heart: **-1 overall**
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 20s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  30m 48s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 36s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 42s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 39s |  |  trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  4s |  |  trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 39s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 17s |  |  branch has no errors when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 27s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  1s |  |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 43s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 14s |  |  patch has no errors when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 194m 45s |  |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 27s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 279m 24s |  |  |

   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/4/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 290d3c2b0daf 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 5f436eabe8d43636da54f4768957f6b18879480c |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/4/testReport/ |
   | Max. process+thread count | 4196 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/4/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was 

[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811969#comment-17811969
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1915049414

   :broken_heart: **-1 overall**
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  17m 43s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  46m 28s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 23s |  |  trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 11s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 22s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 36s |  |  trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 18s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  39m 43s |  |  branch has no errors when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  8s |  |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m  8s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 54s |  |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 25s |  |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 22s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  39m 40s |  |  patch has no errors when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 247m 48s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 41s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 415m 11s |  |  |

   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestRollingUpgrade |

   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/3/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 0574605d9246 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 8ef1574cef17dd6446e232e4f863c2c93498f39f |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/3/testReport/ |
   | Max. process+thread count | 2846 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 

[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811894#comment-17811894
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1914767780

   :broken_heart: **-1 overall**
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 23s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 50s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 37s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 43s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 41s |  |  trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  3s |  |  trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 46s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 42s |  |  branch has no errors when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 44s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 20s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 199m 56s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 28s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 287m 32s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 33a97bbb8d75 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 8ef1574cef17dd6446e232e4f863c2c93498f39f |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/2/testReport/ |
   | Max. process+thread count | 3981 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/2/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was 

[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811839#comment-17811839
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

haiyang1987 commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1914526790

   > > Contributor
   > 
   > Hi, sir. IIUC, it is safe. The reason is as follows; the original code is:
   > 
   > ```java
   >   ExtendedBlock newBlock = new ExtendedBlock(bpid, block.getBlockId(),
   >   safeLength, recoveryId);
   >   DatanodeProtocolClientSideTranslatorPB nn = 
getActiveNamenodeForBP(bpid);
   > 
   >   nn.commitBlockSynchronization(block, newBlock.getGenerationStamp(),
   >   newBlock.getNumBytes(), true, false, newLocs, newStorages);
   > ```
   > 
   > We initialize an ExtendedBlock object named newBlock with safeLength equal 
to 0, then pass `newBlock.getNumBytes()` to the commitBlockSynchronization 
method as the new length.
   
   Yeah, what I mean is: if safeLength is 0, can we directly invoke the logic 
that deletes this block?
   ```java
   if (safeLength == 0) {
     nn.commitBlockSynchronization(block, newBlock.getGenerationStamp(),
         newBlock.getNumBytes(), true, true, newLocs, newStorages);
     LOG.info("After block recovery, the length of new block is 0. " +
         "Will remove this block: {} from file.", newBlock);
     return;
   }
   ```




> EC: infinite lease recovery caused by the length of RWR equals to zero.
> ---
>
> Key: HDFS-17358
> URL: https://issues.apache.org/jira/browse/HDFS-17358
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
>
> Recently, a strange case happened on our EC production cluster.
> The phenomenon is as described below: the NameNode performs infinite lease 
> recovery on some EC files (~80K+) and those files can never be closed.
>  
> After digging into the logs and related code, we found that the root cause is 
> the code below in the method `BlockRecoveryWorker$RecoveryTaskStriped#recover`:
> {code:java}
>           // we met info.getNumBytes==0 here! 
>   if (info != null &&
>               info.getGenerationStamp() >= block.getGenerationStamp() &&
>               info.getNumBytes() > 0) {
>             final BlockRecord existing = syncBlocks.get(blockId);
>             if (existing == null ||
>                 info.getNumBytes() > existing.rInfo.getNumBytes()) {
>               // if we have >1 replicas for the same internal block, we
>               // simply choose the one with larger length.
>               // TODO: better usage of redundant replicas
>               syncBlocks.put(blockId, new BlockRecord(id, proxyDN, info));
>             }
>           }
>   // throw exception here!
>           checkLocations(syncBlocks.size());
> {code}
> The related logs are as below:
> {code:java}
> java.io.IOException: 
> BP-1157541496-10.104.10.198-1702548776421:blk_-9223372036808032688_2938828 
> has no enough internal blocks, unable to start recovery. Locations=[...] 
> {code}
> {code:java}
> 2024-01-23 12:48:16,171 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: 
> initReplicaRecovery: blk_-9223372036808032686_2938828, recoveryId=27615365, 
> replica=ReplicaUnderRecovery, blk_-9223372036808032686_2938828, RUR 
> getNumBytes() = 0 getBytesOnDisk() = 0 getVisibleLength()= -1 getVolume() = 
> /data25/hadoop/hdfs/datanode getBlockURI() = 
> file:/data25/hadoop/hdfs/datanode/current/BP-1157541496-x.x.x.x-1702548776421/current/rbw/blk_-9223372036808032686
>  recoveryId=27529675 original=ReplicaWaitingToBeRecovered, 
> blk_-9223372036808032686_2938828, RWR getNumBytes() = 0 getBytesOnDisk() = 0 
> getVisibleLength()= -1 getVolume() = /data25/hadoop/hdfs/datanode 
> getBlockURI() = 
> file:/data25/hadoop/hdfs/datanode/current/BP-1157541496-10.104.10.198-1702548776421/current/rbw/blk_-9223372036808032686
> {code}
> Because the length of the RWR is zero, the length of the object returned by 
> the code below is also zero, so we can't put it into syncBlocks, and the 
> checkLocations method then throws an exception.
> {code:java}
>           ReplicaRecoveryInfo info = callInitReplicaRecovery(proxyDN,
>               new RecoveringBlock(internalBlk, null, recoveryId)); {code}
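The filtering-then-counting behaviour described in the report can be modeled in a few lines. This is an illustrative editor's sketch, not the Hadoop implementation: the replica tuples, the data-unit count of 6 (RS-6-3 assumed), and both function names are inventions for the example; only the `num_bytes > 0` filter and the location-count check mirror the quoted Java.

```python
def collect_sync_blocks(replica_infos, block_gen_stamp):
    """Keep, per internal block id, the longest replica that passes the guard."""
    sync_blocks = {}
    for block_id, gen_stamp, num_bytes in replica_infos:
        # A zero-length RWR replica fails the num_bytes > 0 check and is dropped.
        if gen_stamp >= block_gen_stamp and num_bytes > 0:
            if num_bytes > sync_blocks.get(block_id, 0):
                sync_blocks[block_id] = num_bytes
    return sync_blocks


def check_locations(location_count, data_blk_num=6):
    # Mirrors checkLocations(): fewer internal blocks than data units aborts
    # the recovery attempt, which the NameNode then schedules again, so
    # all-zero-length replicas yield the observed infinite recovery loop.
    if location_count < data_blk_num:
        raise IOError("has no enough internal blocks, unable to start recovery")
```

With every replica reporting zero bytes, `collect_sync_blocks` returns an empty map, and `check_locations(0)` raises on every retry.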



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811838#comment-17811838
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hfutatzhanghb commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1914526737

   > 
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java#L441
   > 
   > ```
   > final long safeLength = getSafeLength(syncBlocks);
   > LOG.debug("Recovering block {}, length={}, safeLength={}, syncList={}", 
block,block.getNumBytes(), safeLength, syncBlocks);
   > ```
   > 
   > if info.getNumBytes==0 ,then safeLength maybe as 0? we can determine that 
if safeLength is 0 can remove this block?
   
   Hi, sir. Thanks for your review. Yes, if info.getNumBytes() == 0, then 
safeLength is 0. I agree with using safeLength as the condition because it is 
more readable.
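The relationship between zero-length replicas and safeLength can be sketched roughly as follows. This is an illustrative model, not the actual StripedBlockUtil/getSafeLength implementation; the data-unit count of 6 (RS-6-3 assumed) and the per-unit bound are simplifying assumptions for the example.

```python
DATA_BLK_NUM = 6  # assumption: RS-6-3 erasure coding policy

def safe_length(internal_lengths, data_blk_num=DATA_BLK_NUM):
    """Simplified safe-length estimate for a striped block group.

    Zero-length replicas (the RWR case from the logs) contribute nothing,
    so if every internal block reports 0 bytes the safe length is 0.
    """
    usable = sorted((n for n in internal_lengths if n > 0), reverse=True)
    if len(usable) < data_blk_num:
        return 0
    # Simplified: the data_blk_num-th largest internal length bounds the
    # region that is recoverable on all data units.
    return usable[data_blk_num - 1] * data_blk_num
```

Under this model, nine replicas of length 0 give a safe length of 0, which is the condition the proposed deleteblock path would key on.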







[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811836#comment-17811836
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hfutatzhanghb commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1914520297

   > Contributor
   
   Hi, sir. IIUC, it is safe.
   The reason is as follows; the original code is:
   ```java
 ExtendedBlock newBlock = new ExtendedBlock(bpid, block.getBlockId(),
 safeLength, recoveryId);
 DatanodeProtocolClientSideTranslatorPB nn = 
getActiveNamenodeForBP(bpid);
   
 nn.commitBlockSynchronization(block, newBlock.getGenerationStamp(),
 newBlock.getNumBytes(), true, false, newLocs, newStorages);
   ```
   We initialize an ExtendedBlock object named newBlock with safeLength equal 
to 0, then pass `newBlock.getNumBytes()` to the commitBlockSynchronization 
method as the new length.







[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811832#comment-17811832
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

haiyang1987 commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1914507289

   I have a little question.
   
   
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java#L441
  
   ```
   final long safeLength = getSafeLength(syncBlocks);
   LOG.debug("Recovering block {}, length={}, safeLength={}, syncList={}", 
block,block.getNumBytes(), safeLength, syncBlocks);
   ```
   If info.getNumBytes() == 0, then safeLength may be 0?
   Can we determine that, if safeLength is 0, this block can be removed?
   
   







[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811823#comment-17811823
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hfutatzhanghb commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1914451743

   Thanks a lot for responding, sir. Will add a UT soon.
   
   
   



[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811798#comment-17811798
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hfutatzhanghb commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1914341592

   @Hexiaoqiao @zhangshuyan0 @tomscut  Sir, could you please take a look at 
this PR when you have free time? Thanks a lot.







[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811683#comment-17811683
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hadoop-yetus commented on PR #6509:
URL: https://github.com/apache/hadoop/pull/6509#issuecomment-1913701342

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   7m 55s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 23s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 49s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 45s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 39s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 50s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 53s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 22s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 45s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 42s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 34s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 54s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 13s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 195m 42s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 27s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 293m 39s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestLeaseRecovery |
   |   | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
   |   | hadoop.hdfs.TestLeaseRecoveryStriped |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6509 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux d1e5494161f3 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 30e3bdaa8262a0ee2ed2c00773d3420a0b096ac6 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6509/1/testReport/ |
   | Max. 

[jira] [Commented] (HDFS-17358) EC: infinite lease recovery caused by the length of RWR equals to zero.

2024-01-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811670#comment-17811670
 ] 

ASF GitHub Bot commented on HDFS-17358:
---

hfutatzhanghb opened a new pull request, #6509:
URL: https://github.com/apache/hadoop/pull/6509

   ### Description of PR
   Refer to HDFS-17358.
   
   Recently, a strange case occurred on our EC production cluster.
   
   The symptom: the NameNode performs infinite lease recovery on some EC files 
(~80K+), and those files can never be closed.
   
   After digging into the logs and the related code, we found the root cause in 
the following code in the method `BlockRecoveryWorker$RecoveryTaskStriped#recover`:
   
   ```java
   // we met info.getNumBytes() == 0 here!
   if (info != null &&
       info.getGenerationStamp() >= block.getGenerationStamp() &&
       info.getNumBytes() > 0) {
     final BlockRecord existing = syncBlocks.get(blockId);
     if (existing == null ||
         info.getNumBytes() > existing.rInfo.getNumBytes()) {
       // if we have >1 replicas for the same internal block, we
       // simply choose the one with larger length.
       // TODO: better usage of redundant replicas
       syncBlocks.put(blockId, new BlockRecord(id, proxyDN, info));
     }
   }

   // throw exception here!
   checkLocations(syncBlocks.size());
   ```
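
   To see the effect of the `info.getNumBytes() > 0` guard in isolation, here is a 
minimal standalone sketch (with simplified, hypothetical stand-ins for 
`ReplicaRecoveryInfo` and `syncBlocks`; not the actual Hadoop classes). It shows 
that a replica reporting zero length never enters `syncBlocks`, so when every 
reported replica has length 0 the location count is 0 and `checkLocations` would 
throw:

   ```java
   import java.util.HashMap;
   import java.util.Map;

   public class ZeroLengthRwrDemo {

     // Simplified stand-in for ReplicaRecoveryInfo.
     static final class Info {
       final long blockId;
       final long genStamp;
       final long numBytes;
       Info(long blockId, long genStamp, long numBytes) {
         this.blockId = blockId;
         this.genStamp = genStamp;
         this.numBytes = numBytes;
       }
     }

     // Mirrors the selection loop above: replicas with numBytes == 0 are skipped.
     static int countSyncBlocks(Info[] replicas, long blockGenStamp) {
       Map<Long, Info> syncBlocks = new HashMap<>();
       for (Info info : replicas) {
         if (info != null
             && info.genStamp >= blockGenStamp
             && info.numBytes > 0) {
           Info existing = syncBlocks.get(info.blockId);
           if (existing == null || info.numBytes > existing.numBytes) {
             syncBlocks.put(info.blockId, info);
           }
         }
       }
       return syncBlocks.size();
     }

     public static void main(String[] args) {
       // Every internal block is an RWR replica reporting numBytes == 0.
       Info[] allZero = {
           new Info(1, 100, 0), new Info(2, 100, 0), new Info(3, 100, 0)
       };
       // In the real code, checkLocations(0) would then throw
       // "has no enough internal blocks, unable to start recovery".
       System.out.println("qualifying locations = " + countSyncBlocks(allZero, 100));
     }
   }
   ```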
   
   The related logs are as below:
   
   >java.io.IOException: 
BP-1157541496-10.104.10.198-1702548776421:blk_-9223372036808032688_2938828 has 
no enough internal blocks, unable to start recovery. Locations=[...] 
   
   >2024-01-23 12:48:16,171 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: 
initReplicaRecovery: blk_-9223372036808032686_2938828, recoveryId=27615365, 
replica=ReplicaUnderRecovery, blk_-9223372036808032686_2938828, RUR 
getNumBytes() = 0 getBytesOnDisk() = 0 getVisibleLength()= -1 getVolume() = 
/data25/hadoop/hdfs/datanode getBlockURI() = 
file:/data25/hadoop/hdfs/datanode/current/BP-1157541496-x.x.x.x-1702548776421/current/rbw/blk_-9223372036808032686
 recoveryId=27529675 original=ReplicaWaitingToBeRecovered, 
blk_-9223372036808032686_2938828, RWR getNumBytes() = 0 getBytesOnDisk() = 0 
getVisibleLength()= -1 getVolume() = /data25/hadoop/hdfs/datanode getBlockURI() 
= 
file:/data25/hadoop/hdfs/datanode/current/BP-1157541496-10.104.10.198-1702548776421/current/rbw/blk_-9223372036808032686
   
   Because the length of the RWR replica is zero, the `ReplicaRecoveryInfo` 
returned by the call below also has length zero, so it is never put into 
`syncBlocks`, and the `checkLocations` method throws the exception above.
   
   ```java
   ReplicaRecoveryInfo info = callInitReplicaRecovery(proxyDN,
       new RecoveringBlock(internalBlk, null, recoveryId));
   ```
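
   Because a failed attempt leaves every replica's state unchanged, the NameNode 
simply schedules another lease recovery later and the cycle repeats forever. A toy 
loop illustrating the trap (a hypothetical sketch; the real retries are driven by 
the NameNode's lease management, and the attempt cap below exists only so the demo 
terminates):

   ```java
   public class InfiniteRecoveryDemo {

     // Stand-in for one recovery attempt: it can only succeed if at least
     // one replica reports a positive length (simplified checkLocations).
     static boolean tryRecover(long[] replicaLengths) {
       int usable = 0;
       for (long len : replicaLengths) {
         if (len > 0) {
           usable++;
         }
       }
       return usable > 0;
     }

     public static void main(String[] args) {
       long[] lengths = {0, 0, 0}; // every RWR replica reports numBytes == 0
       int attempts = 0;
       // The real NameNode has no such cap; each attempt fails without
       // changing any state, so recovery is re-scheduled indefinitely.
       while (!tryRecover(lengths) && attempts < 5) {
         attempts++;
       }
       System.out.println("attempts without progress = " + attempts);
     }
   }
   ```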



