[jira] [Commented] (HDFS-8966) Separate the lock used in namespace and block management layer
[ https://issues.apache.org/jira/browse/HDFS-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425238#comment-16425238 ]

maobaolong commented on HDFS-8966:
----------------------------------

[~hexiaoqiao] [~wheat9] [~jingzhao] Hi, maybe we can annotate the relevant classes, variables, and methods with thread-safety annotations such as {{@ThreadSafe}} and {{@GuardedBy}}.

> Separate the lock used in namespace and block management layer
> --------------------------------------------------------------
>
>                 Key: HDFS-8966
>                 URL: https://issues.apache.org/jira/browse/HDFS-8966
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haohui Mai
>            Assignee: Haohui Mai
>            Priority: Major
>
> Currently the namespace and the block management layer share one giant lock.
> One consequence that we have seen more and more often is that the namespace
> hangs due to excessive activities from the block management layer. For
> example, the NN might take a couple hundred milliseconds to handle a large
> block report. Because the NN holds the write lock during processing the block
> report, all namespace requests are paused. In production we have seen these
> lock contentions cause long latencies and instabilities in the cluster.
> This umbrella jira proposes to separate the lock used by namespace and the
> block management layer.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
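A minimal sketch of the annotation idea in the comment above, not code from HDFS. The ThreadSafe and GuardedBy annotations are defined locally as hypothetical stand-ins (libraries such as JCIP and JSR-305 ship annotations with these names), and AnnotatedBlockMap, bmLock, and blockToFile are illustrative names assumed for this example. The annotations only document which lock protects which state; nothing enforces them at runtime.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-ins for library annotations; documentation only.
@Retention(RetentionPolicy.CLASS)
@Target(ElementType.TYPE)
@interface ThreadSafe {}

@Retention(RetentionPolicy.CLASS)
@Target({ElementType.FIELD, ElementType.METHOD})
@interface GuardedBy {
  String value(); // name of the lock that protects the annotated member
}

// Hypothetical, simplified block map; not the real HDFS class.
@ThreadSafe
class AnnotatedBlockMap {
  private final ReentrantReadWriteLock bmLock = new ReentrantReadWriteLock();

  @GuardedBy("bmLock")
  private final Map<Long, String> blockToFile = new HashMap<>();

  void put(long blockId, String file) {
    bmLock.writeLock().lock();
    try {
      blockToFile.put(blockId, file);
    } finally {
      bmLock.writeLock().unlock();
    }
  }

  String get(long blockId) {
    bmLock.readLock().lock();
    try {
      return blockToFile.get(blockId);
    } finally {
      bmLock.readLock().unlock();
    }
  }

  int size() {
    bmLock.readLock().lock();
    try {
      return sizeLocked();
    } finally {
      bmLock.readLock().unlock();
    }
  }

  // The annotation records that callers must already hold bmLock.
  @GuardedBy("bmLock")
  private int sizeLocked() {
    return blockToFile.size();
  }
}
```

A static checker that understands such annotations could then flag accesses to blockToFile made without bmLock; whether HDFS would adopt one is not discussed in the thread.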
[jira] [Commented] (HDFS-8966) Separate the lock used in namespace and block management layer
[ https://issues.apache.org/jira/browse/HDFS-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373861#comment-16373861 ]

He Xiaoqiao commented on HDFS-8966:
-----------------------------------

[~wheat9], [~jingzhao] Is this issue still ongoing?
[jira] [Commented] (HDFS-8966) Separate the lock used in namespace and block management layer
[ https://issues.apache.org/jira/browse/HDFS-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074718#comment-15074718 ]

Walter Su commented on HDFS-8966:
---------------------------------

I'm worried about deadlock. Shall we determine the lock ordering up front? My concern is mostly about {{fsnLock}}, {{fsdLock}}, and {{BlockManagerLock}}.

1. BlockManager -> fsn.getBlockCollection(id) -> fsd.getInode(id) will acquire {{fsdLock}}.
2. A lot of fsdir ops call bm's methods with {{fsdLock}} held. Too many locks cause confusion.
3. The comment on {{dirLock}} is not accurate:
{code}
// FSDirectory.java
// lock to protect the directory and BlockMap
private final ReentrantReadWriteLock dirLock;
{code}
In many places, fsn calls bm's methods without holding {{fsdLock}}. It is actually {{fsnLock}} that protects both the directory and the BlockMap.

Can we retire {{fsnLock}}, use {{fsdLock}} to protect the namespace, and use {{BlockManagerLock}} to protect the BlockMap?
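The deadlock risk in items 1 and 2 above comes from two code paths taking the same pair of locks in opposite orders. A minimal sketch of a fixed ordering with a fail-fast check follows. It is an illustration under assumptions, not the design proposed in this thread: the order "fsdLock before bmLock" is chosen arbitrarily, LockOrderSketch and its method names are hypothetical, and only write locks are used to keep the example short.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; lock names mirror the discussion, not the real HDFS code.
class LockOrderSketch {
  // Assumed global order for this example: fsdLock is always taken before bmLock.
  private final ReentrantReadWriteLock fsdLock = new ReentrantReadWriteLock();
  private final ReentrantReadWriteLock bmLock = new ReentrantReadWriteLock();

  void lockNamespace() {
    // Fail fast if this thread took bmLock first: that is the inverted order.
    if (bmLock.isWriteLockedByCurrentThread()
        && !fsdLock.isWriteLockedByCurrentThread()) {
      throw new IllegalStateException(
          "lock order violation: bmLock held before fsdLock");
    }
    fsdLock.writeLock().lock();
  }

  void unlockNamespace() {
    fsdLock.writeLock().unlock();
  }

  void lockBlocks() {
    bmLock.writeLock().lock();
  }

  void unlockBlocks() {
    bmLock.writeLock().unlock();
  }

  // Follows the assumed order: namespace lock, then block lock.
  String namespaceThenBlocks() {
    lockNamespace();
    try {
      lockBlocks();
      try {
        return "ok";
      } finally {
        unlockBlocks();
      }
    } finally {
      unlockNamespace();
    }
  }

  // Inverted order: the check in lockNamespace() rejects it instead of
  // leaving a latent deadlock.
  String blocksThenNamespace() {
    lockBlocks();
    try {
      lockNamespace();
      try {
        return "ok";
      } finally {
        unlockNamespace();
      }
    } finally {
      unlockBlocks();
    }
  }
}
```

A check like this turns an ordering mistake into an immediate exception in tests, rather than an intermittent hang under load.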
[jira] [Commented] (HDFS-8966) Separate the lock used in namespace and block management layer
[ https://issues.apache.org/jira/browse/HDFS-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032831#comment-15032831 ]

Ming Ma commented on HDFS-8966:
-------------------------------

Great work! Besides asking for a design doc, I am interested in the test plan and, specifically, in how to reason about its correctness.
[jira] [Commented] (HDFS-8966) Separate the lock used in namespace and block management layer
[ https://issues.apache.org/jira/browse/HDFS-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944224#comment-14944224 ]

Jing Zhao commented on HDFS-8966:
---------------------------------

Based on the discussion in HDFS-8967, it makes more sense to do this work in a feature branch. I've created branch HDFS-8966 and will update the target version of the jira to it.
[jira] [Commented] (HDFS-8966) Separate the lock used in namespace and block management layer
[ https://issues.apache.org/jira/browse/HDFS-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729826#comment-14729826 ]

Jing Zhao commented on HDFS-8966:
---------------------------------

Yeah, I agree more design details are necessary. In the meantime, I think a meeting to discuss the issue and possible solutions would be helpful. Maybe we can organize a meeting sometime next week?
[jira] [Commented] (HDFS-8966) Separate the lock used in namespace and block management layer
[ https://issues.apache.org/jira/browse/HDFS-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729637#comment-14729637 ]

Daryn Sharp commented on HDFS-8966:
-----------------------------------

I understand the goal, and fully support it, but it's not a trivial problem to solve. Would you please share specific design details on how you plan to safely decouple them?
[jira] [Commented] (HDFS-8966) Separate the lock used in namespace and block management layer
[ https://issues.apache.org/jira/browse/HDFS-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729491#comment-14729491 ]

Haohui Mai commented on HDFS-8966:
----------------------------------

The final goal is to allow the namespace and the blockmanager to run independently. They reside in the same process, but they should share no locks and their communications should be explicit.

The intermediate goal is to allow the NN to process block reports without holding the FSNamesystem write lock.
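One way to picture "share no locks and explicit communication" is two components that each own a private lock and exchange messages through a queue. The sketch below is only an illustration under assumptions; the thread does not specify the mechanism. DecoupledSketch, BlockLayer, Namespace, BlockEvent, and all field names are hypothetical and do not correspond to HDFS classes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; names are illustrative, not HDFS APIs.
class DecoupledSketch {
  // Explicit message sent from the block layer to the namespace.
  static final class BlockEvent {
    final long blockId;

    BlockEvent(long blockId) {
      this.blockId = blockId;
    }
  }

  static final class BlockLayer {
    private final ReentrantReadWriteLock bmLock = new ReentrantReadWriteLock();
    private final Set<Long> knownBlocks = new HashSet<>();
    private final BlockingQueue<BlockEvent> outbox;

    BlockLayer(BlockingQueue<BlockEvent> outbox) {
      this.outbox = outbox;
    }

    // Processes a report while holding only the block layer's own lock.
    void processBlockReport(long[] reported) {
      List<BlockEvent> events = new ArrayList<>();
      bmLock.writeLock().lock();
      try {
        for (long id : reported) {
          if (knownBlocks.add(id)) {
            events.add(new BlockEvent(id)); // only newly seen blocks
          }
        }
      } finally {
        bmLock.writeLock().unlock();
      }
      // Publish after releasing bmLock so no lock is held across the boundary.
      outbox.addAll(events);
    }
  }

  static final class Namespace {
    private final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock();
    private final Set<Long> blocksSeen = new HashSet<>();
    private final BlockingQueue<BlockEvent> inbox;

    Namespace(BlockingQueue<BlockEvent> inbox) {
      this.inbox = inbox;
    }

    // Applies queued events; the namespace lock is held only for this short
    // update, not for the whole block report. Returns the number of blocks
    // the namespace has seen so far.
    int applyPendingEvents() {
      List<BlockEvent> batch = new ArrayList<>();
      inbox.drainTo(batch);
      nsLock.writeLock().lock();
      try {
        for (BlockEvent e : batch) {
          blocksSeen.add(e.blockId);
        }
        return blocksSeen.size();
      } finally {
        nsLock.writeLock().unlock();
      }
    }
  }
}
```

In this arrangement a large report occupies bmLock only, so namespace requests that need nsLock are not paused while the report is processed. The trade-off is that the namespace sees block changes with a delay, which is part of what a design doc would need to address.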
[jira] [Commented] (HDFS-8966) Separate the lock used in namespace and block management layer
[ https://issues.apache.org/jira/browse/HDFS-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729433#comment-14729433 ]

Daryn Sharp commented on HDFS-8966:
-----------------------------------

Would you please elaborate on the design vision for doing the final separation? In my rare spare cycles I've been trying to do the same thing.