[ 
https://issues.apache.org/jira/browse/HDFS-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129676#comment-17129676
 ] 

Ayush Saxena commented on HDFS-15292:
-------------------------------------

It is already mentioned in the code that such a situation can lead to an infinite 
loop in the lease manager.

{code:java}
      // Cannot close file right now, since some blocks 
      // are not yet minimally replicated.
      // This may potentially cause infinite loop in lease recovery
      // if there are no valid replicas on data-nodes.
      String message = "DIR* NameSystem.internalReleaseLease: " +
          "Failed to release lease for file " + src +
          ". Committed blocks are waiting to be mi
{code}

If this is a frequent occurrence, you shouldn't allow files to close with 
committed blocks in the first place: dfs.namenode.file.close.num-committed-allowed 
shouldn't be set.
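
As a rough illustration of the advice above, the property can either be left out of hdfs-site.xml or pinned explicitly. The fragment below is a sketch that assumes 0 is the shipped default in hdfs-default.xml, meaning no merely committed blocks are tolerated at close time. Check the defaults for your Hadoop version before relying on it.

```xml
<!-- hdfs-site.xml fragment (sketch). Assumption: 0 matches the shipped default. -->
<property>
  <name>dfs.namenode.file.close.num-committed-allowed</name>
  <!-- 0: do not allow a file to close while blocks are only committed -->
  <value>0</value>
</property>
```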

> Infinite loop in Lease Manager due to replica is missing in dn
> --------------------------------------------------------------
>
>                 Key: HDFS-15292
>                 URL: https://issues.apache.org/jira/browse/HDFS-15292
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.1.3
>            Reporter: Aaron Guo
>            Priority: Major
>
> In our production environment, we found that the number of files under 
> construction keeps growing, and the lease manager is trying to release the 
> lease in an infinite loop:
> {code:java}
> 2020-04-18 23:10:57,816 WARN  namenode.LeaseManager 
> (LeaseManager.java:checkLeases(589)) - Cannot release the path 
> /user/hadoop/myTestFile.txt in the lease [Lease.  Holder: 
> go-hdfs-7VVGF3sGvHZcsZZC, pending creates: 1]. It will be retried.
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* 
> NameSystem.internalReleaseLease: Failed to release lease for file 
> /user/hadoop/myTestFile.txt. Committed blocks are waiting to be minimally 
> replicated. Try again later.
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3391)
>         at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:586)
>         at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:524)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
>  This is because the last block of this file can NOT meet the minimum 
> required replica count of 1, so an AlreadyBeingCreatedException gets thrown, 
> and the release is retried forever.
> This infinite loop also causes another issue: the lease manager always 
> tries to release the first lease before going to the next one, so no lease 
> is ever released.
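
The starvation described in the report can be sketched with a small stand-alone model. This is not the actual LeaseManager code: the Lease class, the releasable flag, and both method names are hypothetical and only model "the first lease can never be released". The attempt limit exists so the demo terminates, whereas the reported loop has no such bound. The second method shows one possible alternative (moving a failed lease to the back of the queue) purely for contrast.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class LeaseStarvationSketch {

    /** Hypothetical stand-in for a lease; not the HDFS Lease class. */
    static final class Lease {
        final String holder;
        // false models a file whose last block never reaches minimal replication
        final boolean releasable;

        Lease(String holder, boolean releasable) {
            this.holder = holder;
            this.releasable = releasable;
        }
    }

    /** Always retries the head of the queue, so one stuck lease blocks all others. */
    static List<String> releaseHeadOnly(List<Lease> leases, int maxAttempts) {
        Deque<Lease> queue = new ArrayDeque<>(leases);
        List<String> released = new ArrayList<>();
        for (int i = 0; i < maxAttempts && !queue.isEmpty(); i++) {
            Lease head = queue.peekFirst();
            if (head.releasable) {
                queue.pollFirst();
                released.add(head.holder);
            }
            // Otherwise the head stays in place and is retried on the next pass.
        }
        return released;
    }

    /** Contrast case: a lease that fails to release is moved to the back of the queue. */
    static List<String> releaseWithRequeue(List<Lease> leases, int maxAttempts) {
        Deque<Lease> queue = new ArrayDeque<>(leases);
        List<String> released = new ArrayList<>();
        for (int i = 0; i < maxAttempts && !queue.isEmpty(); i++) {
            Lease head = queue.pollFirst();
            if (head.releasable) {
                released.add(head.holder);
            } else {
                queue.addLast(head);
            }
        }
        return released;
    }

    public static void main(String[] args) {
        List<Lease> leases = Arrays.asList(
            new Lease("stuck-client", false),
            new Lease("client-a", true),
            new Lease("client-b", true));

        System.out.println("head-only: " + releaseHeadOnly(leases, 10));
        System.out.println("requeue:   " + releaseWithRequeue(leases, 10));
    }
}
```

In this model the head-only strategy releases nothing within its ten attempts, while the requeue strategy releases the two healthy leases and keeps retrying the stuck one.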


