[jira] [Created] (HDFS-13904) ContentSummary does not always respect processing limit, resulting in long lock acquisitions

Erik Krogen (JIRA) Fri, 07 Sep 2018 11:12:10 -0700

Erik Krogen created HDFS-13904:
----------------------------------

             Summary: ContentSummary does not always respect processing limit, 
resulting in long lock acquisitions
                 Key: HDFS-13904
                 URL: https://issues.apache.org/jira/browse/HDFS-13904
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs, namenode
            Reporter: Erik Krogen
            Assignee: Erik Krogen



HDFS-4995 added a config {{dfs.content-summary.limit}} which allows for an 
administrator to set a limit on the number of entries processed during a single 
acquisition of the {{FSNamesystemLock}} during the creation of a content 
summary. This is useful to prevent very long (multiple seconds) pauses on the 
NameNode when {{getContentSummary}} is called on large directories.

However, even on versions with HDFS-4995, we have seen warnings like:
{code}
INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem read 
lock held for 9398 ms via
java.lang.Thread.getStackTrace(Thread.java:1552)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:950)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.readUnlock(FSNamesystemLock.java:188)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readUnlock(FSNamesystem.java:1486)
org.apache.hadoop.hdfs.server.namenode.ContentSummaryComputationContext.yield(ContentSummaryComputationContext.java:109)
org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeDirectoryContentSummary(INodeDirectory.java:679)
org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeContentSummary(INodeDirectory.java:642)
org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeDirectoryContentSummary(INodeDirectory.java:656)
{code}
happen quite consistently when {{getContentSummary}} was called on a large 
directory on a heavily-loaded NameNode. Such long pauses completely destroy the 
performance of the NameNode. We have the limit set to its default of 5000; if 
it was respected, clearly there would not be a 10-second pause.

The current {{yield()}} code within {{ContentSummaryComputationContext}} looks 
like:
{code}
  public boolean yield() {
    // Are we set up to do this?
    if (limitPerRun <= 0 || dir == null || fsn == null) {
      return false;
    }

    // Have we reached the limit?
    long currentCount = counts.getFileCount() +
        counts.getSymlinkCount() +
        counts.getDirectoryCount() +
        counts.getSnapshotableDirectoryCount();
    if (currentCount <= nextCountLimit) {
      return false;
    }

    // Update the next limit
    nextCountLimit = currentCount + limitPerRun;

    boolean hadDirReadLock = dir.hasReadLock();
    boolean hadDirWriteLock = dir.hasWriteLock();
    boolean hadFsnReadLock = fsn.hasReadLock();
    boolean hadFsnWriteLock = fsn.hasWriteLock();

    // sanity check.
    if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
        hadFsnWriteLock || dir.getReadHoldCount() != 1 ||
        fsn.getReadHoldCount() != 1) {
      // cannot relinquish
      return false;
    }

    // unlock
    dir.readUnlock();
    fsn.readUnlock("contentSummary");

    try {
      Thread.sleep(sleepMilliSec, sleepNanoSec);
    } catch (InterruptedException ie) {
    } finally {
      // reacquire
      fsn.readLock();
      dir.readLock();
    }
    yieldCount++;
    return true;
  }
{code}
We believe that this check in particular is the culprit:
{code}
    if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
        hadFsnWriteLock || dir.getReadHoldCount() != 1 ||
        fsn.getReadHoldCount() != 1) {
      // cannot relinquish
      return false;
    }
{code}
The content summary computation will only relinquish the lock if it is 
currently the _only_ holder of the lock. Given the high volume of read requests 
on a heavily loaded NameNode, especially when unfair locking is enabled, it is 
likely there may be another holder of the read lock performing some short-lived 
operation. By refusing to give up the lock in this case, the content summary 
computation ends up never relinquishing the lock.

We propose to simply remove the readHoldCount checks from this {{yield()}}. 
This should alleviate the case described above by giving up the read lock and 
allowing other short-lived operations to complete (while the content summary 
thread sleeps) so that the lock can finally be given up completely. This has 
the drawback that sometimes, the content summary may give up the lock 
unnecessarily, if the read lock is never actually released by the time the 
thread continues again. The only negative impact from this is to make some 
large content summary operations slightly slower, with the tradeoff of reducing 
NameNode-wide performance impact.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org

[jira] [Created] (HDFS-13904) ContentSummary does not always respect processing limit, resulting in long lock acquisitions

Reply via email to