[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR

2022-12-01 Thread Yanlei Yu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642233#comment-17642233
 ] 

Yanlei Yu commented on HDFS-14657:
--

Hi [~zhangchen], could you upload the new patch?

> Refine NameSystem lock usage during processing FBR
> --
>
> Key: HDFS-14657
> URL: https://issues.apache.org/jira/browse/HDFS-14657
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-14657-001.patch, HDFS-14657.002.patch
>
>
> Disks with 12 TB capacity are very common today, which means FBRs are much 
> larger than before. The NameNode holds the NameSystem lock while processing 
> the block report for each storage, which can take quite a long time.
> In our production environment, processing large FBRs usually causes longer 
> RPC queue times, which impacts client latency, so we did some simple work on 
> refining the lock usage, which improved p99 latency significantly.
> In our solution, the BlockManager releases the NameSystem write lock and 
> re-acquires it every 5,000 blocks (by default) while processing an FBR; with 
> the fair lock, all queued RPC requests can be processed before the 
> BlockManager re-acquires the write lock.
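To make the description concrete, here is a minimal, self-contained sketch of the yield-every-N-blocks idea; the class and method names (FbrYieldSketch, processReportedBlock) are illustrative, not the actual patch:

{code:java}
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FbrYieldSketch {
    private static final int BLOCKS_PER_LOCK = 5000;  // default batch size from the proposal
    private final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock(true); // fair lock

    public void processReport(List<Long> reportedBlockIds) {
        nsLock.writeLock().lock();
        try {
            int processed = 0;
            for (long blockId : reportedBlockIds) {
                processReportedBlock(blockId);
                if (++processed % BLOCKS_PER_LOCK == 0) {
                    // Yield: with a fair lock, every RPC handler already queued on
                    // the lock runs before we get the write lock back.
                    nsLock.writeLock().unlock();
                    nsLock.writeLock().lock();
                }
            }
        } finally {
            nsLock.writeLock().unlock();
        }
    }

    private void processReportedBlock(long blockId) {
        // placeholder for the per-block reconciliation work
    }
}
{code}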






[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR

2019-07-31 Thread Chen Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897276#comment-16897276
 ] 

Chen Zhang commented on HDFS-14657:
---

Thanks [~sodonnell] for your analysis. I'm working on a complete solution on 
the trunk code this week.
{quote}However, that is probably solvable, either by making the iterator keyed 
and reopening it at the correct position after re-acquiring the lock (or if it 
throws ConcurrentModificationException),
{quote}
Yes, this is the same solution I used in the next patch. The new patch is 
almost complete and I'm working on performance testing now. FBR processing is 
much faster on trunk, so you are right; I'll update the default value to a 
larger one based on the test results.







[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR

2019-07-31 Thread Stephen O'Donnell (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897220#comment-16897220
 ] 

Stephen O'Donnell commented on HDFS-14657:
--

I did not look back at the 2.x branch FBR processing code, but focused on the 
trunk code.

On trunk, it seems block reports are processed by walking two sorted iterators:

1. The first comes from the block report itself

2. The second is a block iterator over the storage the FBR is for.

The code then makes use of the fact that both are sorted to do a sort-of merge 
of the two lists.

The problem with dropping the write lock is therefore that this second 
iterator can be invalidated by a concurrent modification. However, that is 
probably solvable, either by making the iterator keyed and reopening it at the 
correct position after re-acquiring the lock (or if it throws 
ConcurrentModificationException), or by fast-forwarding it to the correct 
position (we would need to check the overhead of this).
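A minimal sketch of the "keyed iterator" idea, using a NavigableSet of block IDs to stand in for the storage's sorted block structure; the names are illustrative and this is not the actual BlockManager code:

{code:java}
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NavigableSet;

class KeyedIteratorSketch {
    // Walk the storage's sorted block IDs, tolerating modification by other
    // threads whenever the namesystem lock is dropped: on
    // ConcurrentModificationException, reopen the iterator just after the last
    // block ID we processed instead of restarting from the beginning.
    static void walkStorage(NavigableSet<Long> storageBlockIds) {
        Long lastProcessed = null;
        Iterator<Long> it = storageBlockIds.iterator();
        while (true) {
            try {
                if (!it.hasNext()) {
                    return;                       // finished this storage
                }
                long blockId = it.next();
                // ... merge this block against the sorted report here,
                //     possibly dropping and re-acquiring the write lock ...
                lastProcessed = blockId;
            } catch (ConcurrentModificationException e) {
                // The set changed while the lock was released; resume strictly
                // after the last key we handled ("fast forward").
                it = (lastProcessed == null)
                    ? storageBlockIds.iterator()
                    : storageBlockIds.tailSet(lastProcessed, false).iterator();
            }
        }
    }
}
{code}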

It probably then makes sense to consider what, outside the current block 
report, can alter the blocks in a storage for a volume:

1. A newly added file gets closed - this would update the storage via IBR, but 
it will be blocked by the block report lock this patch introduces, so that is 
probably not an issue.

2. A new file gets created - this is basically the same as 1.

3. The balancer or mover changing the location of blocks. These would be 
updated in the NN via more FBRs or IBRs and should not be an issue due to the 
block report processing lock.

4. A file gets deleted - A delete will immediately remove the blocks from the 
storage, but I think the FBR processing code will handle this. It checks to see 
if the block is present in the NN, and if it is not, it adds it to the 
invalidate list. The block is likely already on the list from the delete, but 
that is unlikely to be an issue either.

5. The node going dead - unlikely, as it just sent an FBR.

6. Decommissioning / maintenance mode - these would impact the blocks on the 
storage only via IBRs or FBRs too.

I'm sure there are other scenarios I have not considered. Can anyone come up 
with any more?

Aside from the above, in the latest patch I see it will yield the write lock 
every 5,000 blocks. Have you been able to run any tests to see how long it 
takes to process 5,000, 50,000 and 100,000 blocks? I wonder if we would be 
better off setting the default limit for releasing the lock to something a lot 
higher than 5,000, like 50k or 100k, depending on the typical processing time 
for a batch of that size. That would reduce the overhead of having to reopen 
the storage iterator too many times, and also prevent the FBR processing from 
taking too long if the NN is under pressure with many other threads wanting 
the write lock.







[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR

2019-07-31 Thread Chen Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896917#comment-16896917
 ] 

Chen Zhang commented on HDFS-14657:
---

Thanks [~shv], but sorry, I can't see any problem with this change on the 2.6 
version.
{quote}I believe when you release the lock while iterating over the storage 
blocks, the iterator may find itself in an isolated chain of the list after 
reacquiring the lock
{quote}
It won't happen, because processReport doesn't iterate the storage blocks in 
2.6. The whole FBR procedure (for each storage) can be simplified like this:

 
1. Insert a delimiter at the head of this storage's block list (the triplets 
structure; it's actually a doubly-linked list, so I'll refer to it as the 
block list for simplicity).
2. Start a loop, iterating through the block report:
 2.1. Get a block from the report.
 2.2. Use the block to look up the stored BlockInfo object in the BlocksMap.
 2.3. Check the status of the block and add it to the corresponding set 
(toAdd, toUC, toInvalidate, toCorrupt).
 2.4. Move the block to the head of the block list (which places it before the 
delimiter).
3. Start a loop to iterate through the block list, find the blocks after the 
delimiter, and add them to the toRemove set.
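For illustration, a hedged sketch of this 2.6-era per-storage procedure, with the yield point this proposal would add marked between steps 2.3 and 2.4; a plain LinkedList stands in for the triplets structure and the names are illustrative only:

{code:java}
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class TripletsFbrSketch {
    private static final Object DELIMITER = new Object();

    static void processStorageReport(LinkedList<Object> storageBlockList,
                                     List<Long> blockReport,
                                     int yieldEvery) {
        // Step 1: insert a delimiter at the head of this storage's block list.
        storageBlockList.addFirst(DELIMITER);

        // Step 2: walk the block report.
        int processed = 0;
        for (Long reported : blockReport) {
            Object stored = lookupInBlocksMap(reported);   // step 2.2 (placeholder lookup)
            classify(stored);                              // step 2.3: toAdd / toUC / toInvalidate / toCorrupt
            if (++processed % yieldEvery == 0) {
                // Proposed yield point, between steps 2.3 and 2.4:
                // release and re-acquire the NN write lock here.
            }
            storageBlockList.remove(stored);               // step 2.4: move the block to the head,
            storageBlockList.addFirst(stored);             // i.e. in front of the delimiter.
        }

        // Step 3: anything still behind the delimiter was not reported -> toRemove.
        boolean afterDelimiter = false;
        for (Iterator<Object> it = storageBlockList.iterator(); it.hasNext(); ) {
            Object b = it.next();
            if (b == DELIMITER) {
                afterDelimiter = true;
                it.remove();                               // drop the delimiter itself
            } else if (afterDelimiter) {
                markToRemove(b);
            }
        }
    }

    private static Object lookupInBlocksMap(Long reported) { return reported; } // placeholder
    private static void classify(Object stored) { }                             // placeholder
    private static void markToRemove(Object block) { }                          // placeholder
}
{code}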

My proposal in this Jira is to release and re-acquire the NN lock between 
steps 2.3 and 2.4. This solution won't affect the correctness of the block 
report procedure, for the following reasons:
 1. All the reported blocks will end up stored before the delimiter.
 2. If any other thread acquires the NN lock before step 2.4 and adds some new 
blocks, they will be added at the head of the list.
 3. If any other thread acquires the NN lock before step 2.4 and removes some 
blocks, it won't affect the loop in step 2. (Please note that the delimiter 
can't be removed by other threads.)
 4. All the blocks after the delimiter should be removed.

According to the reasons described above, the following problem you mentioned 
also won't happen:
{quote}you may remove replicas that were not supposed to be removed
{quote}

I agree with you that things are tricky here, but this change is quite simple 
and I think we can still make the impact clear.







[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR

2019-07-30 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896420#comment-16896420
 ] 

Konstantin Shvachko commented on HDFS-14657:


With branch-2, including 2.6, this change seems even more tricky, because 
{{DatanodeStorageInfo}} there uses a linked list (aka "triplets") to track 
replicas belonging to the DataNode storage. This was changed by HDFS-9260 for 
3.0, but that change was reported to cause serious performance issues.
For the sake of this issue, I believe that when you release the lock while 
iterating over the storage blocks, the iterator may find itself in an isolated 
chain of the list after re-acquiring the lock. Also, I don't know what happens 
to the replicas that were not reported by the DN and are supposed to be 
deleted from the NameNode. With "triplets" they are collected at the end of 
the list once all reported replicas are processed. But if you re-acquire the 
lock, the integrity of the list may be broken, so you may remove replicas that 
were not supposed to be removed.
I am just saying that things are tricky here. I would be surprised if you 
could navigate around these obstacles, but it would be a big win if you did.







[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR

2019-07-26 Thread Chen Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893844#comment-16893844
 ] 

Chen Zhang commented on HDFS-14657:
---

Hi [~shv], you are right, releasing the NN lock in the middle of the loop will 
cause a ConcurrentModificationException.

This patch was ported from our internal 2.6 branch; the implementation has 
changed a lot on the trunk branch and I didn't check all the details. I just 
wanted to propose this demo solution and hear people's feedback.

If the community thinks this solution is feasible, I'll try to work out a 
complete patch on the trunk branch next week; I'll also test it on our cluster 
and post some numbers on the performance improvement.







[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR

2019-07-25 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893228#comment-16893228
 ] 

Konstantin Shvachko commented on HDFS-14657:


Hi [~zhangchen].
Looking at your patch v2, I am not sure I understand how your approach works. 
Suppose you release the lock ({{namesystem.writeUnlock()}}) in 
{{reportDiffSorted()}}; then somebody (unrelated to block report processing) 
can modify blocks belonging to the storage, which will invalidate 
{{storageBlocksIterator}}, meaning that calling {{next()}} will cause a 
{{ConcurrentModificationException}}.







[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR

2019-07-25 Thread Chen Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892925#comment-16892925
 ] 

Chen Zhang commented on HDFS-14657:
---

Thanks [~jojochuang] for mentioning HADOOP-16452, and thanks [~hexiaoqiao] for 
your explanation.

I guess Wei-Chiu mentioned that Jira to refer to his comments there:
{quote}I am not so worried about block reports blocking NN lock. The block 
report processing logic releases the NN lock every 4 milliseconds. 
(BlockManager.BlockReportProcessingThread#processQueue)
{quote}
IIUC, the logic of releasing the NN lock every 4 ms only applies to batch IBR 
processing; when processing an FBR, the thread can be stuck for a very long 
time if the FBR is very large.
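For readers unfamiliar with that code path, here is a hedged sketch of the batching pattern being referenced — drain queued IBR actions under one write-lock acquisition and yield the lock after roughly 4 ms — not the actual BlockReportProcessingThread#processQueue implementation:

{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class IbrBatchingSketch {
    private static final long MAX_LOCK_HOLD_MS = 4;   // the 4 ms mentioned above
    private final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock(true);
    private final Queue<Runnable> queuedActions = new ConcurrentLinkedQueue<>();

    void processQueue() {
        nsLock.writeLock().lock();
        try {
            long heldSinceNs = System.nanoTime();
            Runnable action;
            while ((action = queuedActions.poll()) != null) {
                action.run();                              // one queued IBR action
                if ((System.nanoTime() - heldSinceNs) / 1_000_000 >= MAX_LOCK_HOLD_MS) {
                    nsLock.writeLock().unlock();           // let other RPC handlers in
                    nsLock.writeLock().lock();
                    heldSinceNs = System.nanoTime();
                }
            }
        } finally {
            nsLock.writeLock().unlock();
        }
    }
}
{code}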







[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR

2019-07-25 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892889#comment-16892889
 ] 

He Xiaoqiao commented on HDFS-14657:


[~jojochuang] IIUC, they are different issues: this JIRA tries to split the 
long global lock hold in #processReport into several lock acquisitions, 
processing part of the blocks under each one.
I think it is a good step towards HDFS-11313, and it requires only 
NameNode-side changes.







[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR

2019-07-25 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892827#comment-16892827
 ] 

Wei-Chiu Chuang commented on HDFS-14657:


This is somewhat related to another Jira I filed the other day, HADOOP-16452.








[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR

2019-07-24 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891969#comment-16891969
 ] 

Erik Krogen commented on HDFS-14657:


Hey [~zhangchen], your idea as expressed in the v2 patch seems sound to me. 
But I must admit I am not deeply familiar with the block report process or 
what invariants must be upheld by the locking scheme. Hopefully folks like 
[~shv], [~jojochuang], [~kihwal] or [~daryn] can take a look as well.







[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR

2019-07-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889812#comment-16889812
 ] 

Hadoop QA commented on HDFS-14657:
--

| *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 40s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 18m 22s | trunk passed |
| +1 | compile | 0m 59s | trunk passed |
| +1 | checkstyle | 0m 50s | trunk passed |
| +1 | mvnsite | 1m 6s | trunk passed |
| +1 | shadedclient | 13m 9s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 1s | trunk passed |
| +1 | javadoc | 0m 50s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 59s | the patch passed |
| +1 | compile | 0m 56s | the patch passed |
| +1 | javac | 0m 56s | the patch passed |
| -0 | checkstyle | 0m 44s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 4 new + 555 unchanged - 1 fixed = 559 total (was 556) |
| +1 | mvnsite | 1m 1s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 21s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 9s | the patch passed |
| +1 | javadoc | 0m 48s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 97m 40s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 32s | The patch does not generate ASF License warnings. |
| | | 154m 55s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.tools.TestHdfsConfigFields |
| | hadoop.hdfs.server.datanode.TestDirectoryScanner |

|| Subsystem || Report/Notes ||
| Docker | Client=18.09.7 Server=18.09.7 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14657 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12975349/HDFS-14657.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f66deef7d864 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / acdb0a1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/27272/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | 

[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR

2019-07-21 Thread Chen Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889768#comment-16889768
 ] 

Chen Zhang commented on HDFS-14657:
---

Uploaded a quick demo patch for the BlockManager.blockReportLock idea.







[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR

2019-07-21 Thread Chen Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889763#comment-16889763
 ] 

Chen Zhang commented on HDFS-14657:
---

Thanks [~xkrogen] for reminding me of the batch IBR processing feature.

This patch was initially applied to our 2.6 cluster, which doesn't have the 
batch IBR processing feature, so I overlooked this.

Considering batch IBR processing, I suggest we change the 
DatanodeDescriptor.reportLock to a *BlockManager.blockReportLock*, and each 
time before processing an FBR or IBR, acquire the blockReportLock first; other 
behavior is unchanged.

I think this solution might be feasible for the following reasons:
 1. FBRs and IBRs can only be processed one by one, and adding a 
blockReportLock won't affect this behavior.
 2. With the blockReportLock, we can release the FSNamesystem write lock 
safely.
 3. The BlockReportProcessingThread must acquire the blockReportLock before 
starting batch IBR processing.

In addition, the original solution using DatanodeDescriptor.reportLock allows 
multiple FBRs to be processed at the same time, so that change may introduce 
some race conditions. With the blockReportLock, the behavior is almost the 
same as before.
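A minimal sketch of the proposed blockReportLock, assuming a single lock in the BlockManager that serializes FBR and batched-IBR processing so the namesystem write lock can be dropped and re-acquired mid-FBR; the names are illustrative, not the actual patch:

{code:java}
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class BlockReportLockSketch {
    private final ReentrantLock blockReportLock = new ReentrantLock();              // serializes FBR/IBR processing
    private final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock(true); // fair namesystem lock

    void processFullBlockReport(Iterable<Long> report, int yieldEvery) {
        blockReportLock.lock();            // no IBR batch or other FBR can interleave with this FBR
        try {
            nsLock.writeLock().lock();
            try {
                int n = 0;
                for (long blockId : report) {
                    reconcile(blockId);
                    if (++n % yieldEvery == 0) {
                        nsLock.writeLock().unlock();   // safe: blockReportLock is still held
                        nsLock.writeLock().lock();
                    }
                }
            } finally {
                nsLock.writeLock().unlock();
            }
        } finally {
            blockReportLock.unlock();
        }
    }

    void processIncrementalBlockReportBatch(Iterable<Long> batch) {
        blockReportLock.lock();            // waits for any in-flight FBR to finish first
        try {
            nsLock.writeLock().lock();
            try {
                for (long blockId : batch) {
                    reconcile(blockId);
                }
            } finally {
                nsLock.writeLock().unlock();
            }
        } finally {
            blockReportLock.unlock();
        }
    }

    private void reconcile(long blockId) { /* placeholder for per-block processing */ }
}
{code}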

 







[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR

2019-07-19 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888936#comment-16888936
 ] 

Erik Krogen commented on HDFS-14657:


This seems like a nice middle ground between the current behavior and 
HDFS-11313, which is a larger development effort. In one of our environments, 
we ended up having to make other unpleasant hacks to get around this issue in 
lieu of HDFS-11313 being completed.

I haven't yet thought deeply about the patch, but one thing stood out to me so 
far. Within {{BlockManager#processIncrementalBlockReport}}, previously we just 
confirmed that the lock was held; now we release the lock and re-acquire it 
(twice). IIRC, there is currently behavior within the IBR processing logic to 
batch many IBRs within the same write lock acquisition, to decrease the 
overhead of locking on each IBR. So before we had something like one lock 
acquisition per 1,000 IBRs, and now we have two lock acquisitions per IBR. I'm 
wondering if this will introduce undesirable overhead?







[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR

2019-07-19 Thread He Xiaoqiao (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888568#comment-16888568
 ] 

He Xiaoqiao commented on HDFS-14657:


Thanks [~zhangchen] for filing this JIRA, it is a very interesting improvement.
{quote}
1. Add a report lock to DatanodeDescriptor.
2. Before processing an FBR or IBR, the BlockManager should get the report 
lock for that node first.
3. An IBR must wait until FBR processing completes, even though the write lock 
may be released and re-acquired many times during FBR processing.
{quote}
Would you like to offer some more information about this improvement? It would 
be very helpful for reviewers, in my opinion.
IIUC, it changes only #blockReport processing on the NameNode side (rather 
than the DataNode), holds a lock per DataNode, and ensures no metadata changes 
for that node while #blockReport is being processed, so I think the 
inconsistency risk is under control.
[~shv], would you like to share some further suggestions?







[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR

2019-07-17 Thread Chen Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887596#comment-16887596
 ] 

Chen Zhang commented on HDFS-14657:
---

Thanks [~shv] for your comments. I did search for a few JIRAs related to block 
reports, but unfortunately I missed HDFS-11313, which you mentioned above.

I just went through all the discussion under HDFS-11313; it looks like the 
main concern is the race condition between SBR and IBR, and this problem is 
considered in my solution:
 1. Add a report lock to DatanodeDescriptor.
 2. Before processing an FBR or IBR, the BlockManager should get the report 
lock for that node first.
 3. An IBR must wait until FBR processing completes, even though the write 
lock may be released and re-acquired many times during FBR processing.

This solution is quite simple and has already been deployed to our largest 
production cluster for more than a year; it's very stable.
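A minimal sketch of the per-DataNode report lock described in the steps above; the names are illustrative, not the deployed patch:

{code:java}
import java.util.concurrent.locks.ReentrantLock;

class DatanodeDescriptorSketch {
    final ReentrantLock reportLock = new ReentrantLock();   // one lock per DataNode
}

class PerNodeReportLockSketch {
    // Both FBR and IBR processing for a given DataNode take that node's reportLock
    // first, so an IBR for the node waits until the in-flight FBR completes, even
    // though the namesystem write lock may be released and re-acquired many times.
    void processReport(DatanodeDescriptorSketch node, Runnable fbrOrIbrWork) {
        node.reportLock.lock();
        try {
            fbrOrIbrWork.run();   // may drop and re-take the NN write lock internally
        } finally {
            node.reportLock.unlock();
        }
    }
}
{code}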

I think SBR is a good idea and it looks more elegant; is anyone still pushing 
that forward?

+[~daryn], you have given a lot of valuable advice on HDFS-11313, what's your 
opinion?







[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR

2019-07-17 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887295#comment-16887295
 ] 

Konstantin Shvachko commented on HDFS-14657:


I was wondering if you have looked at other JIRAs filed previously on the 
general topic of "incremental block reports". One of them is HDFS-11313.
The main problem is to avoid inconsistencies when you release the lock in the 
middle of processing a BR.



