[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR
[ https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642233#comment-17642233 ]

Yanlei Yu commented on HDFS-14657:
----------------------------------

Hi [~zhangchen], can the new patch be uploaded?

> Refine NameSystem lock usage during processing FBR
> --------------------------------------------------
>
> Key: HDFS-14657
> URL: https://issues.apache.org/jira/browse/HDFS-14657
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Chen Zhang
> Assignee: Chen Zhang
> Priority: Major
> Attachments: HDFS-14657-001.patch, HDFS-14657.002.patch
>
> Disks with 12 TB capacity are very common today, which means FBRs are much larger than before. The NameNode holds the NameSystem lock while processing the block report for each storage, which can take quite a long time.
> In our production environment, processing a large FBR usually causes longer RPC queue times, which impacts client latency, so we did some simple work on refining the lock usage, which improved the p99 latency significantly.
> In our solution, the BlockManager releases the NameSystem write lock and requests it again every 5000 blocks (by default) while processing an FBR. With the fair lock, all pending RPC requests can be processed before the BlockManager re-acquires the write lock.
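For illustration, a minimal sketch of the lock-yielding pattern described in this issue, assuming a fair {{ReentrantReadWriteLock}} stands in for the FSNamesystem lock; the class, method, and constant names here are made up and are not taken from the actual patch:

{code:java}
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative only: a fair lock standing in for the FSNamesystem lock, and a
// plain list of block IDs standing in for one storage's full block report.
public class FbrLockYieldSketch {
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true); // fair
  private static final int BLOCKS_PER_LOCK_HOLD = 5000; // yield interval (proposal default)

  public void processReport(List<Long> reportedBlocks) {
    fsLock.writeLock().lock();
    try {
      int processedSinceAcquire = 0;
      for (long blockId : reportedBlocks) {
        processOneReportedBlock(blockId);
        if (++processedSinceAcquire >= BLOCKS_PER_LOCK_HOLD) {
          // Release and immediately re-request the write lock. With a fair lock,
          // RPC handlers already queued for the lock run before we get it back.
          fsLock.writeLock().unlock();
          fsLock.writeLock().lock();
          processedSinceAcquire = 0;
        }
      }
    } finally {
      fsLock.writeLock().unlock();
    }
  }

  private void processOneReportedBlock(long blockId) {
    // Placeholder for the per-block work (look up the block in the block map and
    // classify it into toAdd / toRemove / toInvalidate / toCorrupt / toUC).
  }
}
{code}

The fairness of the lock is what guarantees that handlers already waiting on it get served before the report-processing thread re-acquires it.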
[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR
[ https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897276#comment-16897276 ]

Chen Zhang commented on HDFS-14657:
-----------------------------------

Thanks [~sodonnell] for your analysis. I'm working on a complete solution against the trunk code this week.
{quote}However that is probably solvable, either by making the iterator keyed, and reopening it after acquiring the lock (or if it throws concurrentModificationException) at the correct position,{quote}
Yes, this is the same solution I used in the next patch. The new patch is almost complete and I'm working on performance testing now. FBR processing is much faster on trunk, so you are right; I'll update the default value to something larger according to the test results.
[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR
[ https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897220#comment-16897220 ]

Stephen O'Donnell commented on HDFS-14657:
-------------------------------------------

I did not look back at the 2.x branch FBR processing code, but focused on the trunk code. On trunk, it seems block reports are processed by walking two sorted iterators:

1. The first comes from the block report itself.
2. The second is a block iterator over the storage the FBR is for.

Then the code makes use of the fact that both are sorted to do a sort of merge of the two lists. The problem with dropping the write lock is therefore that this second iterator can be invalidated by a concurrent modification. However, that is probably solvable, either by making the iterator keyed and reopening it at the correct position after acquiring the lock (or if it throws ConcurrentModificationException), or by fast-forwarding it to the correct position (would need to check the overhead of this).

It probably then makes sense to consider what, outside the current block report, can alter the blocks in a storage for a volume:

1. A newly added file gets closed - this would update the storage via IBR, but it will be blocked by the block report lock this patch introduces, so that is probably not an issue.
2. A new file gets created - this is basically the same as 1.
3. The balancer or mover changing the location of blocks - these would be updated in the NN via more FBRs or IBRs and should not be an issue due to the block report processing lock.
4. A file gets deleted - a delete will immediately remove the blocks from the storage, but I think the FBR processing code will handle this. It checks to see if the block is present in the NN, and if it is not, it adds it to the invalidate list. The block is likely already on the list from the delete, but that is unlikely to be an issue either.
5. The node going dead - unlikely, as it just sent an FBR.
6. Decommissioning / maintenance mode - these would impact the blocks on the storage only via IBRs or FBRs too.

I'm sure there are other scenarios I have not considered. Can anyone come up with any more?

Aside from the above, in the latest patch I see it will yield the write lock every 5000 blocks. Have you been able to do any tests to see how long it takes to process 5,000, 50,000 and 100,000 blocks? I wonder if we would be better setting the default limit for releasing the lock a lot higher than 5000, like 50k or 100k, depending on the typical processing time of a batch that size. That would reduce the overhead of having to reopen the storage iterator too many times and also prevent the FBR processing from taking too long if the NN is under pressure with many other threads wanting the write lock.
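To make the merge step concrete, here is a stripped-down sketch of a sorted-merge diff over two sorted lists of block IDs, standing in for the report iterator and the storage iterator; the data and names are illustrative only, not the actual reportDiffSorted code:

{code:java}
import java.util.*;

// Toy model of the sorted-merge diff: both inputs must be sorted by block ID.
public class SortedMergeDiffSketch {
  public static void main(String[] args) {
    List<Long> reported = Arrays.asList(1L, 2L, 5L, 7L);       // from the block report
    List<Long> inStorage = Arrays.asList(1L, 3L, 5L, 7L, 9L);  // from the NN's storage iterator

    List<Long> toAdd = new ArrayList<>();     // reported but unknown to the NN
    List<Long> toRemove = new ArrayList<>();  // known to the NN but no longer reported

    int r = 0, s = 0;
    while (r < reported.size() && s < inStorage.size()) {
      long rep = reported.get(r), st = inStorage.get(s);
      if (rep == st) { r++; s++; }              // match: nothing to do
      else if (rep < st) { toAdd.add(rep); r++; }
      else { toRemove.add(st); s++; }
    }
    while (r < reported.size()) toAdd.add(reported.get(r++));
    while (s < inStorage.size()) toRemove.add(inStorage.get(s++));

    System.out.println("toAdd    = " + toAdd);     // [2]
    System.out.println("toRemove = " + toRemove);  // [3, 9]
  }
}
{code}

The merge only works while the storage-side iterator stays valid, which is exactly what dropping the write lock puts at risk.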
[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR
[ https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896917#comment-16896917 ]

Chen Zhang commented on HDFS-14657:
-----------------------------------

Thanks [~shv], but sorry, I can't see any problem with this change on the 2.6 version.
{quote}I believe when you release the lock while iterating over the storage blocks, the iterator may find itself in an isolated chain of the list after reacquiring the lock{quote}
It won't happen, because processReport doesn't iterate the storage blocks in 2.6. The whole FBR procedure (for each storage) can be simplified like this:
# Insert a delimiter at the head of the block list of this storage (the triplets structure; it's actually a doubly linked list, so I'll refer to it as the block list for simplification).
# Start a loop and iterate through the block report:
## Get a block from the report.
## Use the block to get the stored BlockInfo object from the BlockMap.
## Check the status of the block, and add the block to the corresponding set (toAdd, toUc, toInvalidate, toCorrupt).
## Move the block to the head of the block list (which places the block before the delimiter).
# Start a loop to iterate through the block list, find the blocks after the delimiter, and add them to the toRemove set.

My proposal in this Jira is to release and re-acquire the NN lock between steps 2.3 and 2.4. This solution won't affect the correctness of the block report procedure for the following reasons:
# All the reported blocks end up stored before the delimiter.
# If any other thread acquires the NN lock before step 2.4 and adds some new blocks, they will be added at the head of the list.
# If any other thread acquires the NN lock before step 2.4 and removes some blocks, it won't affect the loop in step 2. (Please note that the delimiter can't be removed by other threads.)
# All the blocks after the delimiter should be removed.

For the reasons described above, the following problem you mentioned also won't happen:
{quote}you may remove replicas that were not supposed to be removed{quote}
I agree with you that things are tricky here, but this change is quite simple and I think we can still make the impact clear.
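A minimal sketch of the delimiter technique described above, using a plain {{java.util.LinkedList}} in place of the triplets structure; the block IDs and names are made up, and the real code tracks BlockInfo objects rather than longs:

{code:java}
import java.util.*;

/** Simplified illustration (not the real BlockManager code): how a head delimiter
 *  lets FBR processing find stale replicas in a linked block list. */
public class DelimiterDiffSketch {
  private static final long DELIMITER = Long.MIN_VALUE;  // sentinel "block"

  public static void main(String[] args) {
    // Blocks the NameNode currently records for this storage.
    LinkedList<Long> storageBlocks = new LinkedList<>(Arrays.asList(101L, 102L, 103L, 104L));
    // Blocks the DataNode actually reported (103 and 104 are gone on disk).
    List<Long> reported = Arrays.asList(101L, 102L, 105L);

    // Step 1: insert the delimiter at the head of the storage block list.
    storageBlocks.addFirst(DELIMITER);

    List<Long> toAdd = new ArrayList<>();
    // Step 2: for each reported block, look it up and move it before the delimiter.
    for (long b : reported) {
      if (storageBlocks.remove(b)) {
        storageBlocks.addFirst(b);   // known replica: now sits before the delimiter
      } else {
        toAdd.add(b);                // not known yet: schedule for addition
      }
    }
    // (The NN write lock could be released and re-acquired between batches here;
    //  blocks added by other threads land at the head, i.e. before the delimiter,
    //  so they are not mistaken for stale replicas.)

    // Step 3: everything still after the delimiter was not reported, so remove it.
    List<Long> toRemove = storageBlocks.subList(
        storageBlocks.indexOf(DELIMITER) + 1, storageBlocks.size());

    System.out.println("toAdd    = " + toAdd);     // [105]
    System.out.println("toRemove = " + toRemove);  // [103, 104]
  }
}
{code}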
[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR
[ https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896420#comment-16896420 ]

Konstantin Shvachko commented on HDFS-14657:
---------------------------------------------

With branch-2, including 2.6, this change seems even more tricky, because {{DatanodeStorageInfo}} there uses a linked list (aka "triplets") to track replicas belonging to the DataNode storage. This was changed by HDFS-9260 for 3.0, but was reported to cause serious performance issues. For the sake of this issue, I believe when you release the lock while iterating over the storage blocks, the iterator may find itself in an isolated chain of the list after reacquiring the lock.
Also, I don't know what happens with the replicas that were not reported by the DN and are supposed to be deleted from the NameNode. With "triplets" they are collected at the end of the list when all reported replicas are processed. But if you re-acquire the lock, the integrity of the list may be broken, so you may remove replicas that were not supposed to be removed.
I am just saying that things are tricky here. I would be surprised if you could navigate around these obstacles, but it would be a big win if you did.
[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR
[ https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893844#comment-16893844 ]

Chen Zhang commented on HDFS-14657:
-----------------------------------

Hi [~shv], you are right, releasing the NN lock in the middle of the loop will cause a ConcurrentModificationException. This patch was ported from our internal 2.6 branch; the implementation changed a lot on the trunk branch and I didn't check all the details. I just want to propose this demo solution and hear people's feedback.
If the community thinks this solution is feasible, I'll try to work out a complete patch on the trunk branch next week, and will also test it on our cluster and post some numbers on the performance improvement.
[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR
[ https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893228#comment-16893228 ]

Konstantin Shvachko commented on HDFS-14657:
---------------------------------------------

Hi [~zhangchen]. Looking at your patch v2, I am not sure I understand how your approach works. Suppose you release the lock via {{namesystem.writeUnlock()}} in {{reportDiffSorted()}}; then somebody (unrelated to block report processing) can modify blocks belonging to the storage, which will invalidate {{storageBlocksIterator}}, meaning calling {{next()}} will cause a {{ConcurrentModificationException}}.
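The failure mode being described is the standard fail-fast iterator behavior; a tiny demonstration with a {{TreeSet}} standing in for the NameNode's sorted storage block structure (names and values here are made up):

{code:java}
import java.util.Iterator;
import java.util.TreeSet;

// Demonstrates fail-fast behaviour: modifying the underlying collection while an
// iterator is open makes the next call to next() throw
// ConcurrentModificationException, which is what would happen to the storage
// iterator if blocks changed while the write lock was dropped.
public class FailFastIteratorDemo {
  public static void main(String[] args) {
    TreeSet<Long> storageBlocks = new TreeSet<>();
    for (long b = 1; b <= 5; b++) storageBlocks.add(b);

    Iterator<Long> it = storageBlocks.iterator();
    System.out.println("first block: " + it.next());

    // Simulates another thread changing the storage while the lock is released.
    storageBlocks.remove(3L);

    it.next(); // throws java.util.ConcurrentModificationException
  }
}
{code}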
[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR
[ https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892925#comment-16892925 ]

Chen Zhang commented on HDFS-14657:
-----------------------------------

Thanks [~jojochuang] for mentioning HADOOP-16452, and thanks [~hexiaoqiao] for your explanation. I guess Wei-Chiu mentioned that Jira to refer to his comments there:
{quote}I am not so worried about block reports blocking NN lock. The block report processing logic releases the NN lock every 4 milliseconds. (BlockManager.BlockReportProcessingThread#processQueue){quote}
IIUC, the logic of releasing the NN lock every 4 ms only applies to batch processing of IBRs; when processing an FBR, the thread will be stuck for a very long time if the FBR is very large.
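For reference, a simplified sketch of the time-bounded lock hold being discussed: a queue-draining loop that gives up the write lock after roughly 4 ms. This is only an illustration of the pattern with made-up names, not the actual {{BlockReportProcessingThread#processQueue}} code, and it shows why a single huge queued item (such as one FBR) is not helped by the time budget:

{code:java}
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Drains queued report-processing actions, but never holds the write lock much
// longer than MAX_LOCK_HOLD_NANOS in one stretch. A single long-running action
// still blocks for its full duration, which is the limitation pointed out above.
public class TimedLockYieldSketch {
  private static final long MAX_LOCK_HOLD_NANOS = 4_000_000L; // ~4 ms
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
  private final Queue<Runnable> queue = new ArrayBlockingQueue<>(1024);

  public void processQueue() {
    while (!queue.isEmpty()) {
      fsLock.writeLock().lock();
      try {
        long start = System.nanoTime();
        Runnable action;
        // Keep draining until the time budget for this lock hold is spent.
        while ((action = queue.poll()) != null) {
          action.run();
          if (System.nanoTime() - start > MAX_LOCK_HOLD_NANOS) {
            break; // yield the lock so queued RPC handlers can run
          }
        }
      } finally {
        fsLock.writeLock().unlock();
      }
    }
  }
}
{code}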
[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR
[ https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892889#comment-16892889 ]

He Xiaoqiao commented on HDFS-14657:
------------------------------------

[~jojochuang] IIUC, they are different issues. This JIRA tries to split the single long lock hold in #processReport into multiple shorter lock acquisitions, processing part of the blocks under each hold. I think it is a good step towards HDFS-11313, and it needs only NameNode-side changes.
[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR
[ https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892827#comment-16892827 ]

Wei-Chiu Chuang commented on HDFS-14657:
-----------------------------------------

This is somewhat related to another Jira I filed the other day, HADOOP-16452.
[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR
[ https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891969#comment-16891969 ]

Erik Krogen commented on HDFS-14657:
------------------------------------

Hey [~zhangchen], your idea expressed in the v2 patch seems sound to me, but I must admit to not being deeply familiar with the block report process or what invariants must be upheld by the locking scheme. Hopefully folks like [~shv], [~jojochuang], [~kihwal] or [~daryn] can take a look as well.
[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR
[ https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889812#comment-16889812 ]

Hadoop QA commented on HDFS-14657:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 40s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 18m 22s | trunk passed |
| +1 | compile | 0m 59s | trunk passed |
| +1 | checkstyle | 0m 50s | trunk passed |
| +1 | mvnsite | 1m 6s | trunk passed |
| +1 | shadedclient | 13m 9s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 1s | trunk passed |
| +1 | javadoc | 0m 50s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 59s | the patch passed |
| +1 | compile | 0m 56s | the patch passed |
| +1 | javac | 0m 56s | the patch passed |
| -0 | checkstyle | 0m 44s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 4 new + 555 unchanged - 1 fixed = 559 total (was 556) |
| +1 | mvnsite | 1m 1s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 21s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 9s | the patch passed |
| +1 | javadoc | 0m 48s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 97m 40s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 32s | The patch does not generate ASF License warnings. |
| | | 154m 55s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.tools.TestHdfsConfigFields |
| | hadoop.hdfs.server.datanode.TestDirectoryScanner |

|| Subsystem || Report/Notes ||
| Docker | Client=18.09.7 Server=18.09.7 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | HDFS-14657 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12975349/HDFS-14657.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f66deef7d864 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / acdb0a1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/27272/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit |
[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR
[ https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889768#comment-16889768 ]

Chen Zhang commented on HDFS-14657:
-----------------------------------

Uploaded a quick demo patch for the BlockManager.blockReportLock idea.
[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR
[ https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889763#comment-16889763 ]

Chen Zhang commented on HDFS-14657:
-----------------------------------

Thanks [~xkrogen] for reminding me of the batch IBR processing feature. This patch was initially applied to our 2.6 cluster, which doesn't have the batch IBR processing feature, so I ignored this.
Considering batch IBR processing, I suggest we change DatanodeDescriptor.reportLock to *BlockManager.blockReportLock*, and each time before processing an FBR or IBR, acquire the blockReportLock first; other behavior is not changed. I think this solution might be feasible for the following reasons (see the sketch below):
# FBRs and IBRs can only be processed one by one, and adding a blockReportLock won't affect this behavior.
# With the blockReportLock, we can release the FSNamesystem write lock safely.
# The BlockReportProcessingThread must acquire the blockReportLock first before starting batch IBR processing.

In addition, the original solution using DatanodeDescriptor.reportLock allows multiple FBRs to be processed at the same time, and this change may introduce some race conditions. But with the blockReportLock, the behavior is almost the same as before.
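A rough sketch of the proposed lock ordering, assuming a single {{ReentrantLock}} at the BlockManager level and a fair read-write lock for the namesystem; all names here are illustrative and are not taken from the attached patch:

{code:java}
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustration of the proposed ordering: the BlockManager-level report lock
// serializes FBR and batched-IBR processing with each other, which is what makes
// it safe to drop and re-take the namesystem write lock in the middle of one FBR.
public class BlockReportLockSketch {
  private final ReentrantLock blockReportLock = new ReentrantLock();
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true); // fair

  public void processFullBlockReport(List<Long> reportedBlocks) {
    blockReportLock.lock();        // 1. no other FBR or IBR batch may start while we hold this
    try {
      fsLock.writeLock().lock();   // 2. then take the namesystem write lock
      try {
        int sinceAcquire = 0;
        for (long blockId : reportedBlocks) {
          // ... per-block processing against the block map would go here ...
          if (++sinceAcquire >= 5000) {
            fsLock.writeLock().unlock();  // 3. yield to queued RPC handlers;
            fsLock.writeLock().lock();    //    no other report can interleave (step 1)
            sinceAcquire = 0;
          }
        }
      } finally {
        fsLock.writeLock().unlock();
      }
    } finally {
      blockReportLock.unlock();
    }
  }
}
{code}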
[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR
[ https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888936#comment-16888936 ]

Erik Krogen commented on HDFS-14657:
------------------------------------

This seems like a nice middle ground between the current behavior and HDFS-11313, which is a larger development effort. In one of our environments, we ended up having to make other unpleasant hacks to get around this issue in lieu of HDFS-11313 being completed.
I haven't yet thought deeply about the patch, but one thing stood out to me so far. Within {{BlockManager#processIncrementalBlockReport}}, previously we just confirmed that the lock was held; now we release the lock and re-acquire it (twice). IIRC, currently there is behavior within the IBR processing logic to batch many IBRs within the same write lock acquisition, to decrease the overhead of locking on each IBR. So before, we had something like one lock acquisition per 1000 IBRs; now we have 2 lock acquisitions per IBR. I'm wondering if this will introduce undesirable overheads?
[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR
[ https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888568#comment-16888568 ]

He Xiaoqiao commented on HDFS-14657:
------------------------------------

Thanks [~zhangchen] for filing this JIRA, it is a very interesting improvement.
{quote}
1. Add a report lock to DatanodeDescriptor
2. Before processing the FBR and IBR, BlockManager should get the report lock for that node first
3. IBR must wait until FBR processing completes, even though the write lock may be released and re-acquired many times during FBR processing
{quote}
Would you like to offer some more information about this improvement? It would be very helpful for reviewers, in my opinion. IIUC, it changes only #blockReport processing in the NameNode (not the DataNode) for a single DataNode, holds a lock per DataNode, and ensures no metadata changes during #blockReport processing. So I think the inconsistency risk is under control. [~shv], would you like to share some further suggestions?
[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR
[ https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887596#comment-16887596 ]

Chen Zhang commented on HDFS-14657:
-----------------------------------

Thanks [~shv] for your comments. I did search for a few JIRAs related to block reports, but unfortunately I missed the HDFS-11313 you mentioned above.
I just went through all the discussion under HDFS-11313; it looks like the main concern is the race condition between SBR and IBR, and this problem is considered in my solution:
# Add a report lock to DatanodeDescriptor.
# Before processing an FBR or IBR, the BlockManager should get the report lock for that node first.
# An IBR must wait until FBR processing completes, even though the write lock may be released and re-acquired many times during FBR processing.

This solution is quite simple and has already been deployed to our largest production cluster for more than 1 year; it's very stable. I think SBR is a good idea and looks more elegant. Is there anyone still pushing that forward? [~daryn], you have given a lot of valuable advice on HDFS-11313, what's your opinion?
[jira] [Commented] (HDFS-14657) Refine NameSystem lock usage during processing FBR
[ https://issues.apache.org/jira/browse/HDFS-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887295#comment-16887295 ]

Konstantin Shvachko commented on HDFS-14657:
---------------------------------------------

I was wondering if you looked at other JIRAs filed previously on the general topic of "incremental block reports". One of them is HDFS-11313. The main problem is to avoid inconsistencies when you release the lock in the middle of processing a BR.