[jira] [Commented] (HDFS-8966) Separate the lock used in namespace and block management layer

2018-04-04 Thread maobaolong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425238#comment-16425238
 ] 

maobaolong commented on HDFS-8966:
--

[~hexiaoqiao] [~wheat9] [~jingzhao] 
Hi, maybe we can use the {{@ThreadSafe}} and {{@GuardedBy}} annotations to mark 
the relevant classes, fields, and methods.
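
To make the suggestion concrete, here is a minimal sketch. It assumes locally declared stand-ins for the JCIP-style {{@ThreadSafe}} and {{@GuardedBy}} annotations; the annotation declarations, the class {{BlockMapSketch}}, and the field {{bmLock}} are hypothetical and are not existing HDFS code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-ins for JCIP-style annotations; they only document intent.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ThreadSafe {}

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.METHOD})
@interface GuardedBy {
  String value(); // name of the lock that protects the annotated member
}

// Hypothetical class, not real HDFS code.
@ThreadSafe
class BlockMapSketch {
  private final ReentrantReadWriteLock bmLock = new ReentrantReadWriteLock();

  // The annotation records which lock must be held to touch this field.
  @GuardedBy("bmLock")
  private final Map<Long, String> blocks = new HashMap<>();

  void put(long blockId, String owner) {
    bmLock.writeLock().lock();
    try {
      blocks.put(blockId, owner);
    } finally {
      bmLock.writeLock().unlock();
    }
  }

  String get(long blockId) {
    bmLock.readLock().lock();
    try {
      return blocks.get(blockId);
    } finally {
      bmLock.readLock().unlock();
    }
  }
}
```

The annotations do not enforce anything by themselves; their value is that a reader or a static-analysis tool can see which lock a field depends on.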

> Separate the lock used in namespace and block management layer
> --
>
> Key: HDFS-8966
> URL: https://issues.apache.org/jira/browse/HDFS-8966
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Haohui Mai
> Assignee: Haohui Mai
> Priority: Major
>
> Currently the namespace and the block management layer share one giant lock. 
> One consequence that we have seen more and more often is that the namespace 
> hangs due to excessive activity in the block management layer. For 
> example, the NN might take a couple hundred milliseconds to handle a large 
> block report. Because the NN holds the write lock while processing the block 
> report, all namespace requests are paused. In production we have seen this 
> lock contention cause long latencies and instability in the cluster.
> This umbrella jira proposes to separate the locks used by the namespace and the 
> block management layer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8966) Separate the lock used in namespace and block management layer

2018-02-22 Thread He Xiaoqiao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373861#comment-16373861
 ] 

He Xiaoqiao commented on HDFS-8966:
---

[~wheat9], [~jingzhao], is work on this issue still ongoing?




[jira] [Commented] (HDFS-8966) Separate the lock used in namespace and block management layer

2015-12-29 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074718#comment-15074718
 ] 

Walter Su commented on HDFS-8966:
-

I'm worried about deadlock. Shall we pre-determine the lock ordering? The concern is 
mostly about {{fsnLock}}, {{fsdLock}}, and {{BlockManagerLock}}.
1. BlockManager -> fsn.getBlockCollection(id) -> fsd.getInode(id) will acquire 
{{fsdLock}}.
2. A lot of fsdir ops call bm's methods (with {{fsdLock}} held).
3. Too many locks cause confusion. For example:
{code}
// FSDirectory.java
  // lock to protect the directory and BlockMap
  private final ReentrantReadWriteLock dirLock;
{code}
That comment is not accurate. In many places, fsn calls bm's methods without 
holding {{fsdLock}}. It is actually {{fsnLock}} that protects both the directory 
and the BlockMap.

Can we retire {{fsnLock}}, and use {{fsdLock}} to protect the namespace and 
{{BlockManagerLock}} to protect the BlockMap?
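
A minimal sketch of what a pre-determined ordering could look like, assuming only two locks remain and the rule is "always take {{fsdLock}} before the block manager lock". The class {{OrderedLocks}}, the field {{bmLock}}, and the method names are hypothetical and are not taken from the HDFS code base.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not real HDFS code: two locks with one global order.
class OrderedLocks {
  // Assumed rule for this sketch: fsdLock is always acquired before bmLock.
  final ReentrantReadWriteLock fsdLock = new ReentrantReadWriteLock();
  final ReentrantReadWriteLock bmLock = new ReentrantReadWriteLock();

  // Namespace operation that also updates the block map.
  void namespaceOp(Runnable body) {
    fsdLock.writeLock().lock();
    try {
      bmLock.writeLock().lock();
      try {
        body.run();
      } finally {
        bmLock.writeLock().unlock();
      }
    } finally {
      fsdLock.writeLock().unlock();
    }
  }

  // Block-management operation that needs an inode lookup. It takes the
  // fsdLock read lock up front instead of acquiring it while holding bmLock.
  void blockOpNeedingInode(Runnable body) {
    fsdLock.readLock().lock();
    try {
      bmLock.writeLock().lock();
      try {
        body.run();
      } finally {
        bmLock.writeLock().unlock();
      }
    } finally {
      fsdLock.readLock().unlock();
    }
  }

  // True if the current thread holds fsdLock in either mode.
  boolean holdsFsdLock() {
    return fsdLock.isWriteLockedByCurrentThread()
        || fsdLock.getReadHoldCount() > 0;
  }
}
```

Because both paths acquire the locks in the same order, the call chain in item 1 can no longer wait on {{fsdLock}} while holding the block manager lock. The cost is that block-management work needing inode data holds {{fsdLock}} for longer.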




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8966) Separate the lock used in namespace and block management layer

2015-11-30 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032831#comment-15032831
 ] 

Ming Ma commented on HDFS-8966:
---

Great work!

Besides asking for a design doc, I am interested in the test plan and, 
specifically, how to reason about its correctness.



[jira] [Commented] (HDFS-8966) Separate the lock used in namespace and block management layer

2015-10-05 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944224#comment-14944224
 ] 

Jing Zhao commented on HDFS-8966:
-

Based on the discussion in HDFS-8967, it makes more sense to do this work in a 
feature branch. I've created branch HDFS-8966 and will update the target version 
of the jira to it.



[jira] [Commented] (HDFS-8966) Separate the lock used in namespace and block management layer

2015-09-03 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729826#comment-14729826
 ] 

Jing Zhao commented on HDFS-8966:
-

Yeah, I agree more design details are necessary. In the meantime, I think a 
meeting to discuss the issue and possible solutions would be helpful. 
Maybe we can organize a meeting sometime next week?



[jira] [Commented] (HDFS-8966) Separate the lock used in namespace and block management layer

2015-09-03 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729637#comment-14729637
 ] 

Daryn Sharp commented on HDFS-8966:
---

I understand the goal, and fully support it, but it's not a trivial problem to 
solve.  Would you please share specific design details on how you plan to 
safely decouple them?



[jira] [Commented] (HDFS-8966) Separate the lock used in namespace and block management layer

2015-09-03 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729491#comment-14729491
 ] 

Haohui Mai commented on HDFS-8966:
--

The final goal is to allow the namespace and the block manager to run 
independently. They reside in the same process but they should share no locks 
and their communications should be explicit.

The intermediate goal is to allow the NN to process block reports without 
holding the FSNamesystem write lock.
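
As a rough illustration of these two goals, a sketch under stated assumptions: the block report path takes only a block-management lock, and it tells the namespace about new blocks through an explicit queue. The class {{SplitLockNameNode}}, its fields, and the use of a queue as the communication channel are all hypothetical and do not describe the planned design.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not real HDFS code.
class SplitLockNameNode {
  final ReentrantReadWriteLock namespaceLock = new ReentrantReadWriteLock();
  final ReentrantReadWriteLock blockLock = new ReentrantReadWriteLock();

  // Guarded by blockLock.
  private final Set<Long> reportedBlocks = new HashSet<>();

  // Explicit channel from block management to the namespace (an assumption).
  final BlockingQueue<Long> blocksForNamespace = new LinkedBlockingQueue<>();

  // Processes a block report while holding only blockLock, so namespace
  // requests are not paused for the duration of the report.
  void processBlockReport(long[] blockIds) {
    blockLock.writeLock().lock();
    try {
      for (long id : blockIds) {
        if (reportedBlocks.add(id)) {
          blocksForNamespace.add(id); // hand newly seen blocks to the namespace
        }
      }
    } finally {
      blockLock.writeLock().unlock();
    }
  }

  // The namespace side drains pending notifications under its own lock and
  // returns how many it applied.
  int applyBlockUpdates() {
    namespaceLock.writeLock().lock();
    try {
      int applied = 0;
      while (blocksForNamespace.poll() != null) {
        applied++;
      }
      return applied;
    } finally {
      namespaceLock.writeLock().unlock();
    }
  }
}
```

In this sketch no code path holds both locks at once, which is one way to read "share no locks"; the queue stands in for whatever explicit interface the two layers would eventually use.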



[jira] [Commented] (HDFS-8966) Separate the lock used in namespace and block management layer

2015-09-03 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729433#comment-14729433
 ] 

Daryn Sharp commented on HDFS-8966:
---

Would you please elaborate on the design vision for doing the final separation? 
 In my rare spare cycles I've been trying to do the same thing.
