[jira] [Commented] (HDFS-15584) Improve HDFS large deletion cause namenode lockqueue boom and pending deletion boom.
[ https://issues.apache.org/jira/browse/HDFS-15584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198220#comment-17198220 ]

zhuqi commented on HDFS-15584:
------------------------------

Hi [~sodonnell] Yes, I agree that we should sleep. As for a good default for "dfs.namenode.block.deletion.lock.time.threshold", we should test to find a suitable value. If the sleep time is 1ms, the threshold should probably be tens of milliseconds; I set it to 100ms as an initial value. I also added the threshold to avoid sleeping when the deletion is not heavy, which reduces how long the lock is held. You are right: when too many blocks are pending deletion, the DataNodes struggle to process that many deletions, and the rate at which the NameNode enqueues them also slows down, which hurts performance. Thanks for your quick reply.

> Improve HDFS large deletion cause namenode lockqueue boom and pending deletion boom.
>
> Key: HDFS-15584
> URL: https://issues.apache.org/jira/browse/HDFS-15584
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.4.0
> Reporter: zhuqi
> Assignee: zhuqi
> Priority: Major
> Attachments: HDFS-15584.001.patch
>
> In our production cluster, large deletions flood the NameNode lock queue and also cause the pending deletions in the invalidated blocks to balloon.
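A minimal sketch of the batching idea discussed above, assuming a configurable hold-time threshold and sleep time; the class and method names are illustrative, not the actual HDFS-15584.001 patch:

{code:java}
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ThrottledBlockDeleter {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
  private final long lockTimeThresholdMs;  // e.g. 100ms as the initial default
  private final long sleepMs;              // e.g. 1ms

  ThrottledBlockDeleter(long lockTimeThresholdMs, long sleepMs) {
    this.lockTimeThresholdMs = lockTimeThresholdMs;
    this.sleepMs = sleepMs;
  }

  void deleteBlocks(List<Long> blockIds) throws InterruptedException {
    Iterator<Long> it = blockIds.iterator();
    while (it.hasNext()) {
      lock.writeLock().lock();
      long start = System.currentTimeMillis();
      try {
        // Delete until the lock has been held for the threshold, then release.
        while (it.hasNext()
            && System.currentTimeMillis() - start < lockTimeThresholdMs) {
          removeBlock(it.next());
        }
      } finally {
        lock.writeLock().unlock();
      }
      // Small deletions finish within one hold and never reach the sleep,
      // which is what the threshold is meant to achieve.
      if (it.hasNext()) {
        Thread.sleep(sleepMs);
      }
    }
  }

  private void removeBlock(long blockId) {
    // placeholder: remove from the blocks map and queue the invalidation
  }
}
{code}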
[jira] [Commented] (HDFS-15584) Improve HDFS large deletion cause namenode lockqueue boom and pending deletion boom.
[ https://issues.apache.org/jira/browse/HDFS-15584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198198#comment-17198198 ]

zhuqi commented on HDFS-15584:
------------------------------

Hi [~LiJinglun], in our very busy cluster with thousands of nodes, heavy deletions every day keep the lock queue full for a couple of minutes. Also, when millions of blocks are put into the pending deletion queue, the NameNode suffers a big performance drop. When that happens, the original incremental block deletion approach cannot solve the problem in our cluster, so I added this patch to try to address it. Thanks.
[jira] [Commented] (HDFS-15584) Improve HDFS large deletion cause namenode lockqueue boom and pending deletion boom.
[ https://issues.apache.org/jira/browse/HDFS-15584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198093#comment-17198093 ]

zhuqi commented on HDFS-15584:
------------------------------

cc [~sodonnell], [~hexiaoqiao] I have added a draft patch, without a unit test yet. It adds a lock wait (sleep) time, plus a threshold that controls when we wait: if the total lock hold time of block deletion exceeds the threshold, we release the lock and wait. Both values can be configured to suit the cluster. Any advice is welcome, thanks.
[jira] [Updated] (HDFS-15584) Improve HDFS large deletion cause namenode lockqueue boom and pending deletion boom.
[ https://issues.apache.org/jira/browse/HDFS-15584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuqi updated HDFS-15584:
-------------------------
Attachment: HDFS-15584.001.patch
Status: Patch Available  (was: Open)
[jira] [Created] (HDFS-15584) Improve HDFS large deletion cause namenode lockqueue boom and pending deletion boom.
zhuqi created HDFS-15584:
-------------------------

Summary: Improve HDFS large deletion cause namenode lockqueue boom and pending deletion boom.
Key: HDFS-15584
URL: https://issues.apache.org/jira/browse/HDFS-15584
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Affects Versions: 3.4.0
Reporter: zhuqi
Assignee: zhuqi

In our production cluster, large deletions flood the NameNode lock queue and also cause the pending deletions in the invalidated blocks to balloon.
[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102302#comment-17102302 ]

zhuqi commented on HDFS-15160:
------------------------------

Hi [~weichiu] The failing case does not seem related to this patch. I have applied the 005 patch in our cluster, and the cluster works well now. Thanks.

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.3.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, image-2020-04-10-17-18-08-128.png, image-2020-04-10-17-18-55-938.png
>
> Now that we have HDFS-15150, we can start to move some DN operations to use the read lock rather than the write lock to improve concurrency. The first step is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (eg getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result in a read-only fashion, so they can also be switched to using a readLock.
> Next are the directory scanner and disk balancer, which only require a read lock.
> Finally (for this Jira) are various "low hanging fruit" items in BlockSender and FsDatasetImpl where it is fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think it is better to do any larger refactoring or risky changes each in their own Jira.
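A minimal sketch of the pattern the quoted description refers to: read-only queries of the volume map take the dataset read lock so they can run concurrently, while mutations keep the write lock. The class and method names are illustrative, not the actual ReplicaMap code:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class VolumeMapSketch {
  private final ReentrantReadWriteLock datasetLock = new ReentrantReadWriteLock(true);
  private final Map<Long, String> replicas = new HashMap<>();

  // Read path: multiple readers may hold the lock at once.
  String get(long blockId) {
    datasetLock.readLock().lock();
    try {
      return replicas.get(blockId);
    } finally {
      datasetLock.readLock().unlock();
    }
  }

  // Write path: still exclusive.
  void add(long blockId, String replica) {
    datasetLock.writeLock().lock();
    try {
      replicas.put(blockId, replica);
    } finally {
      datasetLock.writeLock().unlock();
    }
  }
}
{code}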
[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17081128#comment-17081128 ]

zhuqi commented on HDFS-15160:
------------------------------

cc [~weichiu] Thanks for your reply. I had applied the 003 patch; I will switch to 005 to see whether the problem is resolved.
[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17080371#comment-17080371 ]

zhuqi commented on HDFS-15160:
------------------------------

cc [~sodonnell] The race condition can leave an RBW replica with an inconsistent generation stamp. There are some such cases in our production cluster:

!image-2020-04-10-17-18-08-128.png|width=860,height=226!

!image-2020-04-10-17-18-55-938.png|width=1144,height=157!
[jira] [Updated] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuqi updated HDFS-15160:
-------------------------
Attachment: image-2020-04-10-17-18-55-938.png
[jira] [Updated] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuqi updated HDFS-15160:
-------------------------
Attachment: image-2020-04-10-17-18-08-128.png
[jira] [Commented] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
[ https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078878#comment-17078878 ]

zhuqi commented on HDFS-15180:
------------------------------

cc [~Aiphag0] Access to the generation stamp and byte count of Block in org.apache.hadoop.hdfs.protocol should be made synchronized. There are cases where one thread holds the read lock while another updates the Block's generation stamp at the same time.

!image-2020-04-09-11-20-36-459.png!

> DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
>
> Key: HDFS-15180
> URL: https://issues.apache.org/jira/browse/HDFS-15180
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: zhuqi
> Assignee: Aiphago
> Priority: Major
> Attachments: HDFS-15180.001.patch, HDFS-15180.002.patch, HDFS-15180.003.patch, HDFS-15180.004.patch, image-2020-03-10-17-22-57-391.png, image-2020-03-10-17-31-58-830.png, image-2020-03-10-17-34-26-368.png, image-2020-04-09-11-20-36-459.png
>
> The FsDatasetImpl datasetLock is heavy when there are many namespaces in a big cluster. We can split the FsDatasetImpl datasetLock per block pool.
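A hedged sketch of the synchronization being suggested: if the generation stamp and length fields are only read and written through synchronized accessors, a reader holding only the dataset read lock can no longer observe a half-applied update. This mirrors the shape of org.apache.hadoop.hdfs.protocol.Block but is not the actual patch:

{code:java}
public class Block {
  private long blockId;
  private long numBytes;
  private long generationStamp;

  public synchronized long getNumBytes() {
    return numBytes;
  }

  public synchronized void setNumBytes(long len) {
    this.numBytes = len;
  }

  public synchronized long getGenerationStamp() {
    return generationStamp;
  }

  public synchronized void setGenerationStamp(long stamp) {
    this.generationStamp = stamp;
  }
}
{code}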
[jira] [Updated] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
[ https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuqi updated HDFS-15180:
-------------------------
Attachment: image-2020-04-09-11-20-36-459.png
[jira] [Assigned] (HDFS-14524) NNTop total counts does not add up as expected
[ https://issues.apache.org/jira/browse/HDFS-14524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuqi reassigned HDFS-14524:
----------------------------
Assignee: (was: zhuqi)

> NNTop total counts does not add up as expected
>
> Key: HDFS-14524
> URL: https://issues.apache.org/jira/browse/HDFS-14524
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Ahmed Hussein
> Priority: Minor
> Attachments: HDFS-14524.001.patch
>
> {{opType='*'}} is sometimes smaller than the sum of the individual operation types.
> {code:java}
> {
>   "windows": [
>     {
>       "windowLenMs": 30,
>       "ops": [
>         { "totalCount": 24158, "opType": "rpc.complete",
>           "topUsers": [{ "count": 2944, "user": "user1" }] },
>         { "totalCount": 15921, "opType": "rpc.rename",
>           "topUsers": [{ "count": 2891, "user": "user1" }] },
>         { "totalCount": 3015834, "opType": "*",
>           "topUsers": [{ "count": 66652, "user": "user1" }] },
>         { "totalCount": 2086, "opType": "rpc.abandonBlock",
>           "topUsers": [{ "count": 603, "user": "user1" }] },
>         { "totalCount": 30258, "opType": "rpc.addBlock",
>           "topUsers": [{ "count": 3182, "user": "user1" }] },
>         { "totalCount": 101440, "opType": "rpc.getServerDefaults",
>           "topUsers": [{ "count": 3521, "user": "user1" }] },
>         { "totalCount": 25258, "opType": "rpc.create",
>           "topUsers": [{ "count": 1864, "user": "user1" }] },
>         { "totalCount": 1377563, "opType": "rpc.getFileInfo",
>           "topUsers": [{ "count": 56541, "user": "user1" }] },
>         { "totalCount": 60836, "opType": "rpc.renewLease",
>           "topUsers": [{ "count": 3783, "user": "user1" }] },
>         { "totalCount": 182212, "opType": "rpc.getListing",
>           "topUsers": [{ "count": 1848, "user": "user1" }] },
>         { "totalCount": 380, "opType": "rpc.updateBlockForPipeline",
>           "topUsers": [{ "count": 58, "user": "user1" }] },
>         { "totalCount": 215, "opType": "rpc.updatePipeline",
>           "topUsers": [{ "count": 18, "user": "user1" }] }
>       ]
>     }
>   ],
>   "timestamp": "2019-01-12"
> }
> {code}
> {{opType='*'}} from user {{user1}} is {{66652}}, but the sum of counts for the other {{opType}} values by {{user1}} is actually larger: {{77253}}
[jira] [Assigned] (HDFS-14524) NNTop total counts does not add up as expected
[ https://issues.apache.org/jira/browse/HDFS-14524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuqi reassigned HDFS-14524:
----------------------------
Assignee: zhuqi  (was: Ahmed Hussein)
[jira] [Commented] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
[ https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060700#comment-17060700 ]

zhuqi commented on HDFS-15180:
------------------------------

Thanks [~Aiphag0] for your work and the POC patch. CC [~hexiaoqiao] The POC patch LGTM. Two suggestions: First, we had better build AutoCloseableLock on top of a Lock, so that it stays consistent with the new read/write lock in the DataNode and can be used concisely with try-with-resources, without an explicit finally block. Second, fetching the replica information in DataNode#transferReplicaForPipelineRecovery should be changed to use the read lock. I am also looking forward to the volume-level lock and to removing the remaining IO done under the lock.
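A minimal sketch of the first suggestion, assuming an AutoCloseableLock wrapper built over java.util.concurrent.locks.Lock so callers can use try-with-resources instead of lock()/finally-unlock; the usage class and method are illustrative:

{code:java}
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class AutoCloseableLock implements AutoCloseable {
  private final Lock lock;

  AutoCloseableLock(Lock lock) {
    this.lock = lock;
  }

  AutoCloseableLock acquire() {
    lock.lock();
    return this;
  }

  @Override
  public void close() {
    lock.unlock();
  }
}

class Usage {
  private final ReentrantReadWriteLock datasetLock = new ReentrantReadWriteLock(true);
  private final AutoCloseableLock readLock =
      new AutoCloseableLock(datasetLock.readLock());

  void transferReplicaForPipelineRecovery() {
    // The lock is released automatically when the try block exits.
    try (AutoCloseableLock l = readLock.acquire()) {
      // read-only access to the replica information goes here
    }
  }
}
{code}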
[jira] [Comment Edited] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056054#comment-17056054 ]

zhuqi edited comment on HDFS-15160 at 3/10/20, 3:23 PM:
--------------------------------------------------------

Thanks [~sodonnell] for your great work. LGTM. I agree with [~hexiaoqiao] that DataNode#transferReplicaForPipelineRecovery should change data.acquireDatasetLock() to data.acquireDatasetReadLock() for fetching the replica information.
[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056054#comment-17056054 ]

zhuqi commented on HDFS-15160:
------------------------------

Thanks [~sodonnell] for your great work. LGTM. I agree with [~hexiaoqiao] that DataNode#transferReplicaForPipelineRecovery should change data.acquireDatasetLock() to data.acquireDatasetReadLock() for fetching the replica information.
[jira] [Commented] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
[ https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056011#comment-17056011 ]

zhuqi commented on HDFS-15180:
------------------------------

Hi [~sodonnell] Yes, your comment is exactly what I meant. We may want to add the top lock hold times to the DataNode metrics, which could help inform future performance-improvement decisions.
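A hedged sketch of what such a metric could look like using Hadoop's metrics2 library; the class and metric names here are hypothetical, not part of any patch:

{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(about = "Dataset lock hold times", context = "dfs")
class DatasetLockMetrics {
  // Tracks the count and average of recorded lock hold times.
  @Metric("Time the dataset lock is held, in ms")
  MutableRate lockHeldTimeMs;

  void recordHold(long heldMs) {
    lockHeldTimeMs.add(heldMs);
  }
}
{code}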
[jira] [Comment Edited] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
[ https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055739#comment-17055739 ]

zhuqi edited comment on HDFS-15180 at 3/10/20, 9:44 AM:
--------------------------------------------------------

!image-2020-03-10-17-22-57-391.png|width=604,height=137!

Hi [~sodonnell] This is the monitor of the average number of blocked threads per day. I canary-deployed 3 nodes in a busy cluster at different times; the green one has been running the longest, and it now has almost no blocked threads. The other two also show a good improvement since being changed to the read/write lock. From analyzing the logs, the long lock hold times happen during the DirectoryScanner scan operation:

!image-2020-03-10-17-31-58-830.png|width=611,height=148!

Other operations do not hold the lock for long even when the node is very busy, such as the deep copy used for the dfsUsed calculation:

!image-2020-03-10-17-34-26-368.png|width=812,height=136!
[jira] [Commented] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
[ https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055747#comment-17055747 ]

zhuqi commented on HDFS-15180:
------------------------------

CC [~sodonnell] My version is based on CDH 5.16.1, and my lock is a fair lock. Which key performance factors of the read/write lock in the DataNode would you like to see? I can try to confirm the improvement the next time I canary more nodes in our busy cluster. Maybe we need more metrics to help confirm the performance improvement; what do you think? Thanks a lot.
[jira] [Comment Edited] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
[ https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055739#comment-17055739 ]

zhuqi edited comment on HDFS-15180 at 3/10/20, 9:35 AM:
--------------------------------------------------------

!image-2020-03-10-17-22-57-391.png|width=604,height=137!

Hi [~sodonnell] I canary-deployed 3 nodes in a busy cluster at different times; the green one has been running the longest, and it now has almost no blocked threads. The other two also show a good improvement since being changed to the read/write lock. From analyzing the logs, the long lock hold times happen during the DirectoryScanner scan operation:

!image-2020-03-10-17-31-58-830.png|width=611,height=148!

Other operations do not hold the lock for long even when the node is very busy, such as the deep copy used for the dfsUsed calculation:

!image-2020-03-10-17-34-26-368.png|width=812,height=136!
[jira] [Commented] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
[ https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055739#comment-17055739 ]

zhuqi commented on HDFS-15180:
------------------------------

!image-2020-03-10-17-22-57-391.png!
[jira] [Updated] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
[ https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuqi updated HDFS-15180:
-------------------------
Attachment: image-2020-03-10-17-22-57-391.png
[jira] [Commented] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
[ https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055671#comment-17055671 ]

zhuqi commented on HDFS-15180:
------------------------------

[~sodonnell], [~hexiaoqiao] Thanks for your patient replies. [~sodonnell] has done some work here already: HDFS-15150 introduced the read/write lock, and HDFS-15160 is currently in progress. I have canaried HDFS-15160 in our production cluster, and the number of blocked threads in the DataNode has dropped a lot. [~hexiaoqiao] I am looking forward to the {{BlockPoolLockManager}} that splits the {{dataLock}} more fine-grained; I can assign this to [~Aiphag0] anytime if he wants to take it.
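A hedged sketch of the per-block-pool locking idea: one read/write lock per block pool instead of a single dataset-wide lock. {{BlockPoolLockManager}} is the name mentioned in the discussion, but the structure and methods below are illustrative assumptions, not the actual patch:

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class BlockPoolLockManager {
  private final ConcurrentHashMap<String, ReentrantReadWriteLock> locks =
      new ConcurrentHashMap<>();

  private ReentrantReadWriteLock lockFor(String bpid) {
    // One fair read/write lock per block pool id; operations on different
    // block pools no longer contend with each other.
    return locks.computeIfAbsent(bpid, k -> new ReentrantReadWriteLock(true));
  }

  void readLock(String bpid)    { lockFor(bpid).readLock().lock(); }
  void readUnlock(String bpid)  { lockFor(bpid).readLock().unlock(); }
  void writeLock(String bpid)   { lockFor(bpid).writeLock().lock(); }
  void writeUnlock(String bpid) { lockFor(bpid).writeLock().unlock(); }
}
{code}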
[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042798#comment-17042798 ]

zhuqi commented on HDFS-15041:
------------------------------

cc [~ayushtkn] Thanks for your review. I have fixed it.

> Make MAX_LOCK_HOLD_MS and full queue size configurable
>
> Key: HDFS-15041
> URL: https://issues.apache.org/jira/browse/HDFS-15041
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: 3.2.0
> Reporter: zhuqi
> Assignee: zhuqi
> Priority: Major
> Attachments: HDFS-15041.001.patch, HDFS-15041.002.patch, HDFS-15041.003.patch, HDFS-15041.004.patch
>
> Currently MAX_LOCK_HOLD_MS and the full-queue size are fixed, but different clusters have different latency needs and queue-health standards. We had better make these two parameters configurable.
[jira] [Updated] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuqi updated HDFS-15041:
-------------------------
Attachment: HDFS-15041.004.patch
Status: Patch Available  (was: In Progress)
[jira] [Updated] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuqi updated HDFS-15041:
-------------------------
Status: In Progress  (was: Patch Available)
[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042783#comment-17042783 ]

zhuqi commented on HDFS-15041:
------------------------------

cc [~ayushtkn] [~weichiu] I have changed the configuration to support time units. Please let me know if any other change is needed before merging. Thanks.
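A minimal sketch of reading such a setting with time-unit support via Configuration.getTimeDuration; the key and default below are placeholders for illustration, not the patch's actual names:

{code:java}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

public class LockHoldConfExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // getTimeDuration accepts suffixed values such as "25ms" or "1s";
    // a bare number is interpreted in the default unit given here.
    long maxLockHoldMs = conf.getTimeDuration(
        "dfs.example.max.lock.hold",  // placeholder key for illustration
        25, TimeUnit.MILLISECONDS);
    System.out.println("max lock hold ms = " + maxLockHoldMs);
  }
}
{code}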
[jira] [Updated] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuqi updated HDFS-15041:
-------------------------
Attachment: HDFS-15041.003.patch
Status: Patch Available  (was: In Progress)
[jira] [Updated] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuqi updated HDFS-15041:
-------------------------
Status: In Progress  (was: Patch Available)
[jira] [Commented] (HDFS-15171) Add a thread to call saveDfsUsed periodically, to prevent datanode too long restart time.
[ https://issues.apache.org/jira/browse/HDFS-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041574#comment-17041574 ]

zhuqi commented on HDFS-15171:
------------------------------

Hi [~weichiu] There is no cache file if the DataNode shuts down ungracefully, so changing dfs.datanode.cached-dfsused.check.interval.ms will not help my case. HDFS-14313 should reduce the refresh time; I will try it. Thanks.

> Add a thread to call saveDfsUsed periodically, to prevent datanode too long restart time.
>
> Key: HDFS-15171
> URL: https://issues.apache.org/jira/browse/HDFS-15171
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.2.0
> Reporter: zhuqi
> Assignee: zhuqi
> Priority: Major
>
> There are 30 storage dirs per DataNode in our production cluster, and restarts take too long because sometimes the DataNode did not shut down gracefully. Currently only the DataNode's graceful-shutdown hook and the BlockPoolSlice shutdown call saveDfsUsed, so after an unclean shutdown the restarted DataNode sometimes cannot reuse the dfsUsed cache. We could add a thread that calls saveDfsUsed periodically.
[jira] [Comment Edited] (HDFS-15171) Add a thread to call saveDfsUsed periodically, to prevent datanode too long restart time.
[ https://issues.apache.org/jira/browse/HDFS-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041574#comment-17041574 ]

zhuqi edited comment on HDFS-15171 at 2/21/20 6:08 AM:
-------------------------------------------------------

Hi [~weichiu] There is no cache file if the DataNode shuts down ungracefully, so changing dfs.datanode.cached-dfsused.check.interval.ms will not help my case. HDFS-14313 should reduce the refresh time; I will try it. Thanks.
[jira] [Commented] (HDFS-15171) Add a thread to call saveDfsUsed periodically, to prevent datanode too long restart time.
[ https://issues.apache.org/jira/browse/HDFS-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041564#comment-17041564 ]

zhuqi commented on HDFS-15171:
------------------------------

Hi [~sodonnell] Thanks for your patient reply. First, CachingGetSpaceUsed already has a thread that refreshes every 10 minutes, with random jitter to spread the refreshes out; if we persist the value to the cache file whenever it refreshes, the cache stays as close to real time as possible. Second, when the value refreshes we can compare it with the previous one, and skip the persist if they are the same, to reduce disk operations. To reduce disk operations further, we can add a configurable minimum interval, and only persist the value to disk once the time since the last persist exceeds it. Then we can remove the shutdown-hook persist and no longer need to work out a suitable dfs.datanode.cached-dfsused.check.interval.ms. It also resolves my problem, which is caused by the DataNode shutting down ungracefully. What do you think of this suggestion?
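A hedged sketch of this proposal: persist the cached dfsUsed value on each refresh, but only when it changed and a minimum interval has elapsed. The refreshDfsUsed()/persistDfsUsed() helpers and field names are illustrative, not the real CachingGetSpaceUsed internals:

{code:java}
class PeriodicDfsUsedSaver implements Runnable {
  private final long minPersistIntervalMs;  // configurable floor between disk writes
  private volatile long lastPersistedValue = -1;
  private volatile long lastPersistTimeMs = 0;

  PeriodicDfsUsedSaver(long minPersistIntervalMs) {
    this.minPersistIntervalMs = minPersistIntervalMs;
  }

  @Override
  public void run() {
    long used = refreshDfsUsed();  // recompute the current dfsUsed value
    long now = System.currentTimeMillis();
    // Skip the write if nothing changed or we persisted too recently.
    if (used != lastPersistedValue
        && now - lastPersistTimeMs >= minPersistIntervalMs) {
      persistDfsUsed(used);        // write the cache file
      lastPersistedValue = used;
      lastPersistTimeMs = now;
    }
  }

  private long refreshDfsUsed() { return 0L; }  // placeholder
  private void persistDfsUsed(long v) { }       // placeholder
}
{code}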
[jira] [Commented] (HDFS-15177) Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too much time.
[ https://issues.apache.org/jira/browse/HDFS-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041510#comment-17041510 ]

zhuqi commented on HDFS-15177:
------------------------------

cc [~sodonnell] Thanks for your patient reply. I will change it to a fair lock.

> Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too much time.
>
> Key: HDFS-15177
> URL: https://issues.apache.org/jira/browse/HDFS-15177
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: zhuqi
> Assignee: zhuqi
> Priority: Major
> Attachments: image-2020-02-18-22-39-00-642.png, image-2020-02-18-22-51-28-624.png, image-2020-02-18-22-52-59-202.png, image-2020-02-18-22-55-38-661.png
>
> In our cluster, when many block pools share the same DataNode and the DataNode has about 30 storage dirs, the DataNode receives delete commands with very many blocks to delete, which causes the FsDatasetImpl lock to be held for too long.
[jira] [Commented] (HDFS-15177) Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too much time.
[ https://issues.apache.org/jira/browse/HDFS-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040816#comment-17040816 ]

zhuqi commented on HDFS-15177:
------------------------------

Hi [~sodonnell] Thanks for your reply. I will watch for the FoldedTreeSet problem, such as HDFS-15131. You said that on the 3.x branch the locking in the DN has been a fair lock for some time now, but I see that AutoCloseableLock uses ReentrantLock, which defaults to the non-fair NonfairSync; so where does the DN use the fair lock?
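A minimal sketch of the fairness point being raised: ReentrantLock's no-arg constructor is non-fair, so fairness has to be requested explicitly when the lock is created:

{code:java}
import java.util.concurrent.locks.ReentrantLock;

public class FairLockDemo {
  public static void main(String[] args) {
    ReentrantLock nonFair = new ReentrantLock();   // default: non-fair (NonfairSync)
    ReentrantLock fair = new ReentrantLock(true);  // fair: longest-waiting thread first

    System.out.println(nonFair.isFair());  // false
    System.out.println(fair.isFair());     // true
  }
}
{code}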
[jira] [Updated] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
[ https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuqi updated HDFS-15180:
-------------------------
Component/s: datanode
[jira] [Commented] (HDFS-15177) Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too much time.
[ https://issues.apache.org/jira/browse/HDFS-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039723#comment-17039723 ]

zhuqi commented on HDFS-15177:
------------------------------

Hi [~sodonnell] Thanks for your reply. Next, I will switch our cluster to the fair lock used on the 3.x branch and see whether the blocked-thread problem improves. I agree that FoldedTreeSet should be improved for better performance, and I will capture the NameNode stack the next time the DataNodes become slow, to see whether the FoldedTreeSet problem occurs. I am excited to see [HDFS-15150|https://issues.apache.org/jira/browse/HDFS-15150] and [HDFS-15160|https://issues.apache.org/jira/browse/HDFS-15160]; they are good news for the concurrency and throughput of the lock, and a start toward the lock-per-block-pool proposal.
[jira] [Commented] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
[ https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17039167#comment-17039167 ] zhuqi commented on HDFS-15180: -- cc [~sodonnell] , [~linyiqun], [~weichiu] , [~hexiaoqiao] What do you think about it? Could you give some advice? Thanks. > DataNode FsDatasetImpl Fine-Grained Locking via BlockPool. > --- > > Key: HDFS-15180 > URL: https://issues.apache.org/jira/browse/HDFS-15180 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > > Now the FsDatasetImpl datasetLock is heavy, when their are many namespaces in > big cluster. If we can split the FsDatasetImpl datasetLock via blockpool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
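To make the lock-per-block-pool proposal concrete, here is a hypothetical sketch of how a dataset-wide lock could be split per pool. The class and method names below are illustrative only and do not exist in FsDatasetImpl.
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative only: one lock per block pool instead of one dataset-wide lock. */
public class BlockPoolLockManager {
  private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

  /** Lazily create a fair lock for the given block pool id. */
  ReentrantLock lockFor(String bpid) {
    return locks.computeIfAbsent(bpid, id -> new ReentrantLock(true));
  }

  /** Run an operation while holding only that pool's lock, so heavy
   *  deletion in one namespace no longer blocks every other namespace. */
  void runLocked(String bpid, Runnable op) {
    ReentrantLock lock = lockFor(bpid);
    lock.lock();
    try {
      op.run();
    } finally {
      lock.unlock();
    }
  }
}
{code}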
[jira] [Updated] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
[ https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated HDFS-15180: - Description: Now the FsDatasetImpl datasetLock is heavy, when their are many namespaces in big cluster. If we can split the FsDatasetImpl datasetLock via blockpool. (was: Now the FsDatasetImpl datasetLock is heavy, when their are many namespaces in big cluster, we can split the FsDatasetImpl datasetLock via blockpool. ) > DataNode FsDatasetImpl Fine-Grained Locking via BlockPool. > --- > > Key: HDFS-15180 > URL: https://issues.apache.org/jira/browse/HDFS-15180 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.2.0 >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > > Now the FsDatasetImpl datasetLock is heavy, when their are many namespaces in > big cluster. If we can split the FsDatasetImpl datasetLock via blockpool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15177) Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too much time.
[ https://issues.apache.org/jira/browse/HDFS-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17039165#comment-17039165 ] zhuqi commented on HDFS-15177: -- cc [~sodonnell] , Not the trunk version; my version is hadoop2.6.0-cdh5.16.1. And when the pending deletions shown in our monitoring boom, the heavy synchronized FsDatasetImpl problem becomes more obvious. > Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too > much time. > -- > > Key: HDFS-15177 > URL: https://issues.apache.org/jira/browse/HDFS-15177 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: image-2020-02-18-22-39-00-642.png, > image-2020-02-18-22-51-28-624.png, image-2020-02-18-22-52-59-202.png, > image-2020-02-18-22-55-38-661.png > > > In our cluster, the datanode receive the delete command with too many blocks > deletion when we have many blockpools sharing the same datanode and the > datanode with about 30 storage dirs, it will cause the FsDatasetImpl lock too > much time. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
zhuqi created HDFS-15180: Summary: DataNode FsDatasetImpl Fine-Grained Locking via BlockPool. Key: HDFS-15180 URL: https://issues.apache.org/jira/browse/HDFS-15180 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.2.0 Reporter: zhuqi Assignee: zhuqi Now the FsDatasetImpl datasetLock is heavy when there are many namespaces in a big cluster; we can split the FsDatasetImpl datasetLock by blockpool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15177) Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too much time.
[ https://issues.apache.org/jira/browse/HDFS-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17039146#comment-17039146 ] zhuqi edited comment on HDFS-15177 at 2/18/20 2:56 PM: --- cc [~sodonnell] , Thanks for your reply. Our cluster's deletion load is very heavy, with many namespaces and about 30 storage dirs on one datanode. All namespaces call the same FsDatasetImpl functions. I am on the 2.x branch, cdh5.16.1 version. I mean that the BPOfferService receives too many blocks waiting to be deleted, which causes too many items in the Block invalidBlks[] array, so the code synchronized on FsDatasetImpl (including the volumeMap removal) iterates many times and even blocks the heartbeat, which also synchronizes on FsDatasetImpl. When we removed the synchronized FsDatasetImpl from the heartbeat, the heartbeat recovered, in line with HDFS-7060; but heavy deletion still blocks the synchronized FsDatasetImpl at times, which affects other actions that synchronize on FsDatasetImpl. A blocked stack example is: !image-2020-02-18-22-39-00-642.png|width=843,height=115! It also affects reads and writes. !image-2020-02-18-22-55-38-661.png|width=960,height=122! !image-2020-02-18-22-51-28-624.png|width=891,height=120! !image-2020-02-18-22-52-59-202.png|width=996,height=120! was (Author: zhuqi): cc [~sodonnell] , Thanks for your reply. Our cluster' deletion is so heavy with many namespaces and with about 30 storage dirs on one datanode. All namespace will call the same FsDatasetImpl function. I am on 2.x branch, cdh5.16.1 version, I mean BPOfferService receive too many deletion blocks waiting to be delete. And cause too many items in Block invalidBlks[], so that the synchronized FsDatasetImpl including the volumeMap removed will iterator so many times, even blocked the heartbeat which uses the synchronized FsDatasetImpl and when we change to remove the heartbeat synchronized FsDatasetImpl the heartbeat recover normal according to [HDFS-7060|https://issues.apache.org/jira/browse/HDFS-7060] , but heavy deletion will still block the synchronized FsDatasetImpl some times which will affect other action about synchronized FsDatasetImpl. The blocked stack example is : !image-2020-02-18-22-39-00-642.png|width=843,height=115! Also affect the read and write. !image-2020-02-18-22-51-28-624.png|width=891,height=120! !image-2020-02-18-22-52-59-202.png|width=996,height=120! > Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too > much time. > -- > > Key: HDFS-15177 > URL: https://issues.apache.org/jira/browse/HDFS-15177 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: image-2020-02-18-22-39-00-642.png, > image-2020-02-18-22-51-28-624.png, image-2020-02-18-22-52-59-202.png, > image-2020-02-18-22-55-38-661.png > > > In our cluster, the datanode receive the delete command with too many blocks > deletion when we have many blockpools sharing the same datanode and the > datanode with about 30 storage dirs, it will cause the FsDatasetImpl lock too > much time. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15177) Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too much time.
[ https://issues.apache.org/jira/browse/HDFS-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17039146#comment-17039146 ] zhuqi commented on HDFS-15177: -- cc [~sodonnell] , Thanks for your reply. Our cluster's deletion load is very heavy, with many namespaces and about 30 storage dirs on one datanode. All namespaces call the same FsDatasetImpl functions. I am on the 2.x branch, cdh5.16.1 version. I mean that the BPOfferService receives too many blocks waiting to be deleted, which causes too many items in the Block invalidBlks[] array, so the code synchronized on FsDatasetImpl (including the volumeMap removal) iterates many times and even blocks the heartbeat, which also synchronizes on FsDatasetImpl. When we removed the synchronized FsDatasetImpl from the heartbeat, the heartbeat recovered, in line with [HDFS-7060|https://issues.apache.org/jira/browse/HDFS-7060]; but heavy deletion still blocks the synchronized FsDatasetImpl at times, which affects other actions that synchronize on FsDatasetImpl. A blocked stack example is: !image-2020-02-18-22-39-00-642.png|width=843,height=115! It also affects reads and writes. !image-2020-02-18-22-51-28-624.png|width=891,height=120! !image-2020-02-18-22-52-59-202.png|width=996,height=120! > Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too > much time. > -- > > Key: HDFS-15177 > URL: https://issues.apache.org/jira/browse/HDFS-15177 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: image-2020-02-18-22-39-00-642.png, > image-2020-02-18-22-51-28-624.png, image-2020-02-18-22-52-59-202.png > > > In our cluster, the datanode receive the delete command with too many blocks > deletion when we have many blockpools sharing the same datanode and the > datanode with about 30 storage dirs, it will cause the FsDatasetImpl lock too > much time. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
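As a rough illustration of the splitting idea behind HDFS-15177, deletion could be broken into small batches so the dataset lock is released between batches and the heartbeat and read/write paths can interleave. This is a hypothetical sketch assuming the lock may safely be dropped between batches; the names are placeholders, not the actual FsDatasetImpl code.
{code:java}
/** Illustrative sketch only; not the actual FsDatasetImpl implementation. */
public class BatchedInvalidateSketch {
  static class Block { long id; } // stand-in for org.apache.hadoop.hdfs.protocol.Block

  private final Object datasetLock = new Object();

  void invalidateInBatches(Block[] invalidBlks, int batchSize) {
    for (int start = 0; start < invalidBlks.length; start += batchSize) {
      int end = Math.min(start + batchSize, invalidBlks.length);
      synchronized (datasetLock) { // short critical section per batch
        for (int i = start; i < end; i++) {
          removeFromVolumeMap(invalidBlks[i]);
        }
      }
      // Lock released here: heartbeat, read and write threads can acquire it
      // before the next batch, instead of waiting out the whole invalidBlks[].
    }
  }

  private void removeFromVolumeMap(Block b) { /* hypothetical per-block work */ }
}
{code}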
[jira] [Updated] (HDFS-15177) Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too much time.
[ https://issues.apache.org/jira/browse/HDFS-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated HDFS-15177: - Attachment: image-2020-02-18-22-52-59-202.png > Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too > much time. > -- > > Key: HDFS-15177 > URL: https://issues.apache.org/jira/browse/HDFS-15177 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: image-2020-02-18-22-39-00-642.png, > image-2020-02-18-22-51-28-624.png, image-2020-02-18-22-52-59-202.png > > > In our cluster, the datanode receive the delete command with too many blocks > deletion when we have many blockpools sharing the same datanode and the > datanode with about 30 storage dirs, it will cause the FsDatasetImpl lock too > much time. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15177) Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too much time.
[ https://issues.apache.org/jira/browse/HDFS-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated HDFS-15177: - Attachment: image-2020-02-18-22-51-28-624.png > Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too > much time. > -- > > Key: HDFS-15177 > URL: https://issues.apache.org/jira/browse/HDFS-15177 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: image-2020-02-18-22-39-00-642.png, > image-2020-02-18-22-51-28-624.png > > > In our cluster, the datanode receive the delete command with too many blocks > deletion when we have many blockpools sharing the same datanode and the > datanode with about 30 storage dirs, it will cause the FsDatasetImpl lock too > much time. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15177) Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too much time.
[ https://issues.apache.org/jira/browse/HDFS-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated HDFS-15177: - Attachment: image-2020-02-18-22-39-00-642.png > Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too > much time. > -- > > Key: HDFS-15177 > URL: https://issues.apache.org/jira/browse/HDFS-15177 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: image-2020-02-18-22-39-00-642.png > > > In our cluster, the datanode receive the delete command with too many blocks > deletion when we have many blockpools sharing the same datanode and the > datanode with about 30 storage dirs, it will cause the FsDatasetImpl lock too > much time. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15177) Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too much time.
[ https://issues.apache.org/jira/browse/HDFS-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated HDFS-15177: - Component/s: datanode > Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too > much time. > -- > > Key: HDFS-15177 > URL: https://issues.apache.org/jira/browse/HDFS-15177 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > > In our cluster, the datanode receive the delete command with too many blocks > deletion when we have many blockpools sharing the same datanode and the > datanode with about 30 storage dirs, it will cause the FsDatasetImpl lock too > much time. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15177) Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too much time.
[ https://issues.apache.org/jira/browse/HDFS-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated HDFS-15177: - Summary: Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too much time. (was: Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too many time.) > Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too > much time. > -- > > Key: HDFS-15177 > URL: https://issues.apache.org/jira/browse/HDFS-15177 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > > In our cluster, the datanode receive the delete command with too many blocks > deletion when we have many blockpools sharing the same datanode and the > datanode with about 30 storage dirs, it will cause the FsDatasetImpl lock too > much time. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15177) Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too many time.
[ https://issues.apache.org/jira/browse/HDFS-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated HDFS-15177: - Description: In our cluster, the datanode receive the delete command with too many blocks deletion when we have many blockpools sharing the same datanode and the datanode with about 30 storage dirs, it will cause the FsDatasetImpl lock too much time. was: In our cluster, the datanode receive the delete command with too many blocks deletion when we have many blockpools sharing the same datanode, it will cause the FsDatasetImpl lock too much time. > Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too > many time. > -- > > Key: HDFS-15177 > URL: https://issues.apache.org/jira/browse/HDFS-15177 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > > In our cluster, the datanode receive the delete command with too many blocks > deletion when we have many blockpools sharing the same datanode and the > datanode with about 30 storage dirs, it will cause the FsDatasetImpl lock too > much time. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15177) Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too many time.
[ https://issues.apache.org/jira/browse/HDFS-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated HDFS-15177: - Description: In our cluster, the datanode receive the delete command with too many blocks deletion when we have many blockpools sharing the same datanode, it will cause the FsDatasetImpl lock too much time. was: In our cluster , the datanode receive the delete command with too many blocks deletion when we have many blockpools sharing the same datanode, it will cause the FsDatasetImpl lock too much time. > Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too > many time. > -- > > Key: HDFS-15177 > URL: https://issues.apache.org/jira/browse/HDFS-15177 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > > In our cluster, the datanode receive the delete command with too many blocks > deletion when we have many blockpools sharing the same datanode, it will > cause the FsDatasetImpl lock too much time. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15177) Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too many time.
zhuqi created HDFS-15177: Summary: Split datanode invalide block deletion, to avoid the FsDatasetImpl lock too many time. Key: HDFS-15177 URL: https://issues.apache.org/jira/browse/HDFS-15177 Project: Hadoop HDFS Issue Type: Improvement Reporter: zhuqi Assignee: zhuqi In our cluster, the datanode receives delete commands with too many blocks to delete when many blockpools share the same datanode; this causes the FsDatasetImpl lock to be held for too much time. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15171) Add a thread to call saveDfsUsed periodically, to prevent datanode too long restart time.
[ https://issues.apache.org/jira/browse/HDFS-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17038090#comment-17038090 ] zhuqi commented on HDFS-15171: -- cc [~linyiqun], [~weichiu] , [~hexiaoqiao] What do you think about this problem? Could you give me some advice? Thanks. > Add a thread to call saveDfsUsed periodically, to prevent datanode too long > restart time. > --- > > Key: HDFS-15171 > URL: https://issues.apache.org/jira/browse/HDFS-15171 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.2.0 >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > > There are 30 storage dirs per datanode in our production cluster , it will > take too many time to restart, because sometimes the datanode didn't shutdown > gracefully. Now only the datanode graceful shut down hook and the > blockpoolslice shutdown will cause the saveDfsUsed function, that cause the > restart of datanode can't reuse the dfsuse cache sometimes. I think if we can > add a thread to periodically call the saveDfsUsed function. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
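A minimal sketch of the periodic-persist idea described above, assuming a callback that writes out each block pool slice's dfsUsed cache. The scheduler wiring and the ten-minute interval are placeholders here, not the actual datanode code.
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Illustrative only: periodically persist the per-dir dfsUsed value so a
 *  datanode that did not shut down gracefully can still reuse a recent
 *  cache on restart instead of rescanning all storage dirs. */
public class DfsUsedSaverSketch {
  private final ScheduledExecutorService saver =
      Executors.newSingleThreadScheduledExecutor();

  void start(Runnable saveDfsUsedForAllSlices, long intervalMinutes) {
    saver.scheduleWithFixedDelay(() -> {
      try {
        saveDfsUsedForAllSlices.run(); // e.g. iterate slices and write dfsUsed files
      } catch (RuntimeException e) {
        // Keep the saver alive; a failed write only means a slightly staler cache.
      }
    }, intervalMinutes, intervalMinutes, TimeUnit.MINUTES);
  }

  void stop() {
    saver.shutdownNow();
  }
}
{code}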
[jira] [Updated] (HDFS-15171) Add a thread to call saveDfsUsed periodically, to prevent datanode too long restart time.
[ https://issues.apache.org/jira/browse/HDFS-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated HDFS-15171: - Description: There are 30 storage dirs per datanode in our production cluster , it will take too many time to restart, because sometimes the datanode didn't shutdown gracefully. Now only the datanode graceful shut down hook and the blockpoolslice shutdown will cause the saveDfsUsed function, that cause the restart of datanode can't reuse the dfsuse cache sometimes. I think if we can add a thread to periodically call the saveDfsUsed function. was: There are 30 storage dirs in our production cluster , it will take too many time to restart, because sometimes the datanode didn't shutdown gracefully. Now only the datanode graceful shut down hook and the blockpoolslice shutdown will cause the saveDfsUsed function, that cause the restart of datanode can't reuse the dfsuse cache sometimes. I think if we can add a thread to periodically call the saveDfsUsed function. > Add a thread to call saveDfsUsed periodically, to prevent datanode too long > restart time. > --- > > Key: HDFS-15171 > URL: https://issues.apache.org/jira/browse/HDFS-15171 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.2.0 >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > > There are 30 storage dirs per datanode in our production cluster , it will > take too many time to restart, because sometimes the datanode didn't shutdown > gracefully. Now only the datanode graceful shut down hook and the > blockpoolslice shutdown will cause the saveDfsUsed function, that cause the > restart of datanode can't reuse the dfsuse cache sometimes. I think if we can > add a thread to periodically call the saveDfsUsed function. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15171) Add a thread to call saveDfsUsed periodically, to prevent datanode too long restart time.
zhuqi created HDFS-15171: Summary: Add a thread to call saveDfsUsed periodically, to prevent datanode too long restart time. Key: HDFS-15171 URL: https://issues.apache.org/jira/browse/HDFS-15171 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 3.2.0 Reporter: zhuqi Assignee: zhuqi There are 30 storage dirs per datanode in our production cluster, so a restart takes too much time, because sometimes the datanode didn't shut down gracefully. Currently only the datanode graceful-shutdown hook and the blockpoolslice shutdown trigger the saveDfsUsed function, so a restarted datanode sometimes can't reuse the dfsUsed cache. I think we can add a thread to periodically call the saveDfsUsed function. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15083) Add new trash rpc which move the trash (mkdir and the rename) operation to the server side.
[ https://issues.apache.org/jira/browse/HDFS-15083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012433#comment-17012433 ] zhuqi commented on HDFS-15083: -- cc [~weichiu] Thanks for your comment, and sorry for the rough draft patch; I think cloud storage should be supported too. I only changed TrashPolicyDefault in order to quickly support server-side trash for DistributedFileSystem for our cluster's needs. Compared with the Router trash approach in HDFS-14117, I think server-side trash is more graceful, and it can also cut trash RPCs by 50%, which matters because our HDFS lifetime system's trash actions put a heavy load on the namenode. Do you have any advice on pushing forward the graceful trash and reducing the trash RPCs? > Add new trash rpc which move the trash (mkdir and the rename) operation to > the server side. > --- > > Key: HDFS-15083 > URL: https://issues.apache.org/jira/browse/HDFS-15083 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient, namenode, rbf >Affects Versions: 2.10.0, 3.2.0 >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: HDFS-15083.001.patch > > > Now the rbf trash with multi cluster mounted in > [HDFS-14117|https://issues.apache.org/jira/browse/HDFS-14117] , the solution > is not graceful。 > If we can move the client side trash (mkdir and rename) to the server side, > we can not only solve the problem gracefully, but also reduce the trash rpc > load in server side to about %50 compare to the origin trash which call two > times rpc(mkdir and rename). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
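To make the 50% figure concrete: client-side trash today issues two namenode RPCs per trashed path, roughly as sketched below. This is a simplification of what TrashPolicyDefault does; the single server-side call mentioned at the end is the proposed, hypothetical API, not an existing one.
{code:java}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TrashRpcSketch {
  /** Today: two namenode RPCs per trashed path. */
  static boolean clientSideTrash(FileSystem fs, Path path, Path trashCurrent)
      throws java.io.IOException {
    fs.mkdirs(trashCurrent);                                        // RPC 1: ensure Current dir
    return fs.rename(path, new Path(trashCurrent, path.getName())); // RPC 2: move into trash
  }

  // Proposed (hypothetical): one RPC that performs mkdir + rename on the
  // server side, e.g. dfsClient.moveToTrash(path), halving trash RPC load.
}
{code}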
[jira] [Created] (HDFS-15109) RBF: Plugin interface to enable delegation of Router
zhuqi created HDFS-15109: Summary: RBF: Plugin interface to enable delegation of Router Key: HDFS-15109 URL: https://issues.apache.org/jira/browse/HDFS-15109 Project: Hadoop HDFS Issue Type: Sub-task Reporter: zhuqi If we can support a plugin interface on the router side, we may be able to implement permission control and other important needs on the router side, with that control independent of the namenode side's default control. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
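One hypothetical shape for such a plugin hook is sketched below; the interface name and methods are illustrative only, as no such extension point exists in the Router today.
{code:java}
import org.apache.hadoop.conf.Configuration;

/** Hypothetical Router-side plugin point, independent of NameNode-side checks. */
public interface RouterPermissionPlugin {

  /** Called before the Router forwards an operation to a downstream namespace.
   *  @return true to allow the call, false to reject it at the Router. */
  boolean checkPermission(String user, String path, String operation);

  /** Lifecycle hook so an implementation can load its own policy store. */
  default void initialize(Configuration conf) { }
}
{code}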
[jira] [Commented] (HDFS-15083) Add new trash rpc which move the trash (mkdir and the rename) operation to the server side.
[ https://issues.apache.org/jira/browse/HDFS-15083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008334#comment-17008334 ] zhuqi commented on HDFS-15083: -- Thanks [~hexiaoqiao] for your reply, cc [~ayushtkn], [~ramkumar]: I only wired up the DistributedFileSystem and the RawFileSystem for the draft function that our cluster needs, and there are some backward-compatibility issues we should solve. We can discuss it and push it forward, and then I can separate out the RBF trash part when I am free. > Add new trash rpc which move the trash (mkdir and the rename) operation to > the server side. > --- > > Key: HDFS-15083 > URL: https://issues.apache.org/jira/browse/HDFS-15083 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient, namenode, rbf >Affects Versions: 2.10.0, 3.2.0 >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: HDFS-15083.001.patch > > > Now the rbf trash with multi cluster mounted in > [HDFS-14117|https://issues.apache.org/jira/browse/HDFS-14117] , the solution > is not graceful。 > If we can move the client side trash (mkdir and rename) to the server side, > we can not only solve the problem gracefully, but also reduce the trash rpc > load in server side to about %50 compare to the origin trash which call two > times rpc(mkdir and rename). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15083) Add new trash rpc which move the trash (mkdir and the rename) operation to the server side.
[ https://issues.apache.org/jira/browse/HDFS-15083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008072#comment-17008072 ] zhuqi commented on HDFS-15083: -- cc [~elgoiri] , [~weichiu] , [~hexiaoqiao] : I have now pushed a draft patch, without checkstyle fixes, which we have used in our cluster. It can serve as the graceful trash function and reduce the trash namenode RPC load by 50%. I also include the router-side draft code. Sorry for the rough draft code; I can improve it when I am free. > Add new trash rpc which move the trash (mkdir and the rename) operation to > the server side. > --- > > Key: HDFS-15083 > URL: https://issues.apache.org/jira/browse/HDFS-15083 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient, namenode, rbf >Affects Versions: 2.10.0, 3.2.0 >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: HDFS-15083.001.patch > > > Now the rbf trash with multi cluster mounted in > [HDFS-14117|https://issues.apache.org/jira/browse/HDFS-14117] , the solution > is not graceful。 > If we can move the client side trash (mkdir and rename) to the server side, > we can not only solve the problem gracefully, but also reduce the trash rpc > load in server side to about %50 compare to the origin trash which call two > times rpc(mkdir and rename). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15083) Add new trash rpc which move the trash (mkdir and the rename) operation to the server side.
[ https://issues.apache.org/jira/browse/HDFS-15083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated HDFS-15083: - Attachment: HDFS-15083.001.patch Status: Patch Available (was: Open) > Add new trash rpc which move the trash (mkdir and the rename) operation to > the server side. > --- > > Key: HDFS-15083 > URL: https://issues.apache.org/jira/browse/HDFS-15083 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient, namenode, rbf >Affects Versions: 3.2.0, 2.10.0 >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: HDFS-15083.001.patch > > > Now the rbf trash with multi cluster mounted in > [HDFS-14117|https://issues.apache.org/jira/browse/HDFS-14117] , the solution > is not graceful。 > If we can move the client side trash (mkdir and rename) to the server side, > we can not only solve the problem gracefully, but also reduce the trash rpc > load in server side to about %50 compare to the origin trash which call two > times rpc(mkdir and rename). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15083) Add new trash rpc which move the trash (mkdir and the rename) operation to the server side.
zhuqi created HDFS-15083: Summary: Add new trash rpc which move the trash (mkdir and the rename) operation to the server side. Key: HDFS-15083 URL: https://issues.apache.org/jira/browse/HDFS-15083 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient, namenode, rbf Affects Versions: 3.2.0, 2.10.0 Reporter: zhuqi Assignee: zhuqi The current RBF trash with multiple clusters mounted, from [HDFS-14117|https://issues.apache.org/jira/browse/HDFS-14117], is not a graceful solution. If we can move the client-side trash (mkdir and rename) to the server side, we can not only solve the problem gracefully, but also reduce the server-side trash RPC load by about 50% compared to the original trash, which makes two RPC calls (mkdir and rename). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated HDFS-15041: - Comment: was deleted (was: Thanks for [~hexiaoqiao] to help to cc [~weichiu]. Now i am the Hadoop YARN Contributor, could you help me to add to Hadoop HDFS Contributor. It's my honor to contribute to Hadoop HDFS.) > Make MAX_LOCK_HOLD_MS and full queue size configurable > -- > > Key: HDFS-15041 > URL: https://issues.apache.org/jira/browse/HDFS-15041 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.2.0 >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: HDFS-15041.001.patch, HDFS-15041.002.patch > > > Now the MAX_LOCK_HOLD_MS and the full queue size are fixed. But different > cluster have different need for the latency and the queue health standard. > We'd better to make the two parameter configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated HDFS-15041: - Attachment: HDFS-15041.002.patch Release Note: fix checkstyle Status: Patch Available (was: Open) > Make MAX_LOCK_HOLD_MS and full queue size configurable > -- > > Key: HDFS-15041 > URL: https://issues.apache.org/jira/browse/HDFS-15041 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.2.0 >Reporter: zhuqi >Priority: Major > Attachments: HDFS-15041.001.patch, HDFS-15041.002.patch > > > Now the MAX_LOCK_HOLD_MS and the full queue size are fixed. But different > cluster have different need for the latency and the queue health standard. > We'd better to make the two parameter configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated HDFS-15041: - Status: Open (was: Patch Available) > Make MAX_LOCK_HOLD_MS and full queue size configurable > -- > > Key: HDFS-15041 > URL: https://issues.apache.org/jira/browse/HDFS-15041 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.2.0 >Reporter: zhuqi >Priority: Major > Attachments: HDFS-15041.001.patch > > > Now the MAX_LOCK_HOLD_MS and the full queue size are fixed. But different > cluster have different need for the latency and the queue health standard. > We'd better to make the two parameter configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992261#comment-16992261 ] zhuqi commented on HDFS-15041: -- Thanks to [~hexiaoqiao] for helping to cc [~weichiu]. I am currently a Hadoop YARN contributor; could you help add me as a Hadoop HDFS contributor? It's my honor to contribute to Hadoop HDFS. > Make MAX_LOCK_HOLD_MS and full queue size configurable > -- > > Key: HDFS-15041 > URL: https://issues.apache.org/jira/browse/HDFS-15041 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.2.0 >Reporter: zhuqi >Priority: Major > Attachments: HDFS-15041.001.patch > > > Now the MAX_LOCK_HOLD_MS and the full queue size are fixed. But different > cluster have different need for the latency and the queue health standard. > We'd better to make the two parameter configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992240#comment-16992240 ] zhuqi commented on HDFS-15041: -- Hi [~hexiaoqiao] One of our clusters has very heavy delete and write traffic because of our new Hive lifetime system with too many partitions. In some cases the 4ms limit is too short to cope, so I want to make it configurable. Also, some Presto-based real-time workloads that do not use read-from-standby may want a shorter max lock hold time, in order to improve read performance. Thanks. > Make MAX_LOCK_HOLD_MS and full queue size configurable > -- > > Key: HDFS-15041 > URL: https://issues.apache.org/jira/browse/HDFS-15041 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.2.0 >Reporter: zhuqi >Priority: Major > Attachments: HDFS-15041.001.patch > > > Now the MAX_LOCK_HOLD_MS and the full queue size are fixed. But different > cluster have different need for the latency and the queue health standard. > We'd better to make the two parameter configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
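A hedged sketch of what the configurability could look like. The config key names and the queue-size default below are illustrative placeholders, not the names in the actual patch; only the 4ms default reflects the fixed value discussed above.
{code:java}
import org.apache.hadoop.conf.Configuration;

public class LockHoldConfigSketch {
  // Illustrative key names only; not the keys used by the actual patch.
  static final String MAX_LOCK_HOLD_MS_KEY = "dfs.namenode.write-lock.max.hold.ms";
  static final String FULL_QUEUE_SIZE_KEY  = "dfs.namenode.lock.release.queue.size";

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    long maxLockHoldMs = conf.getLong(MAX_LOCK_HOLD_MS_KEY, 4L); // old fixed value
    int fullQueueSize  = conf.getInt(FULL_QUEUE_SIZE_KEY, 1000); // assumed default
    System.out.println("max lock hold " + maxLockHoldMs + " ms, queue " + fullQueueSize);
  }
}
{code}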
[jira] [Comment Edited] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992124#comment-16992124 ] zhuqi edited comment on HDFS-15041 at 12/10/19 5:48 AM: Hi [~weichiu] Yes, I mean to make MAX_LOCK_HOLD_MS configurable after my change in HDFS-14553, because clusters have different needs for client latency versus relieving pressure on the RPC queue. Sorry for the confusion: I did not mean the HDFS balancer; the balancer load I have already moved to the standby. was (Author: zhuqi): Hi [~weichiu] Yeah, i mean to make MAX_LOCK_HOLD_MS configurable after my change in HDFS-14553, because of the different need for client latency and balance the pressure for rpc queue. Sorry for the mistake , not mean the hdfs balancer, the balancer pressure i have changed to standby. Also we can add this queue size to metrics if needed? > Make MAX_LOCK_HOLD_MS and full queue size configurable > -- > > Key: HDFS-15041 > URL: https://issues.apache.org/jira/browse/HDFS-15041 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.2.0 >Reporter: zhuqi >Priority: Major > Attachments: HDFS-15041.001.patch > > > Now the MAX_LOCK_HOLD_MS and the full queue size are fixed. But different > cluster have different need for the latency and the queue health standard. > We'd better to make the two parameter configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated HDFS-15041: - Attachment: HDFS-15041.001.patch Status: Patch Available (was: Open) > Make MAX_LOCK_HOLD_MS and full queue size configurable > -- > > Key: HDFS-15041 > URL: https://issues.apache.org/jira/browse/HDFS-15041 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.2.0 >Reporter: zhuqi >Priority: Major > Attachments: HDFS-15041.001.patch > > > Now the MAX_LOCK_HOLD_MS and the full queue size are fixed. But different > cluster have different need for the latency and the queue health standard. > We'd better to make the two parameter configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992124#comment-16992124 ] zhuqi edited comment on HDFS-15041 at 12/10/19 2:45 AM: Hi [~weichiu] Yes, I mean to make MAX_LOCK_HOLD_MS configurable after my change in HDFS-14553, because clusters have different needs for client latency versus relieving pressure on the RPC queue. Sorry for the confusion: I did not mean the HDFS balancer; the balancer load I have already moved to the standby. Also, could we add this queue size to metrics if needed? was (Author: zhuqi): Hi [~weichiu] Yeah, i mean to make MAX_LOCK_HOLD_MS configurable after my change in HDFS-14553, because of the different need for client latency and balance the pressure for rpc queue. The balancer pressure i have changed to standby. Also we can add this queue size to metrics if needed. > Make MAX_LOCK_HOLD_MS and full queue size configurable > -- > > Key: HDFS-15041 > URL: https://issues.apache.org/jira/browse/HDFS-15041 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.2.0 >Reporter: zhuqi >Priority: Major > > Now the MAX_LOCK_HOLD_MS and the full queue size are fixed. But different > cluster have different need for the latency and the queue health standard. > We'd better to make the two parameter configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992124#comment-16992124 ] zhuqi edited comment on HDFS-15041 at 12/10/19 2:26 AM: Hi [~weichiu] Yes, I mean to make MAX_LOCK_HOLD_MS configurable after my change in HDFS-14553, because clusters have different needs for client latency versus relieving pressure on the RPC queue. The balancer load I have already moved to the standby. Also, could we add this queue size to metrics if needed? was (Author: zhuqi): Hi [~weichiu] Yeah, i mean to make MAX_LOCK_HOLD_MS configurable after your change in HDFS-14553, because of the different need for client latency and balance the pressure for rpc queue. The balancer pressure i have changed to standby. > Make MAX_LOCK_HOLD_MS and full queue size configurable > -- > > Key: HDFS-15041 > URL: https://issues.apache.org/jira/browse/HDFS-15041 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.2.0 >Reporter: zhuqi >Priority: Major > > Now the MAX_LOCK_HOLD_MS and the full queue size are fixed. But different > cluster have different need for the latency and the queue health standard. > We'd better to make the two parameter configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992124#comment-16992124 ] zhuqi commented on HDFS-15041: -- Hi [~weichiu] Yes, I mean to make MAX_LOCK_HOLD_MS configurable after your change in HDFS-14553, because clusters have different needs for client latency versus relieving pressure on the RPC queue. The balancer load I have already moved to the standby. > Make MAX_LOCK_HOLD_MS and full queue size configurable > -- > > Key: HDFS-15041 > URL: https://issues.apache.org/jira/browse/HDFS-15041 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.2.0 >Reporter: zhuqi >Priority: Major > > Now the MAX_LOCK_HOLD_MS and the full queue size are fixed. But different > cluster have different need for the latency and the queue health standard. > We'd better to make the two parameter configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991729#comment-16991729 ] zhuqi commented on HDFS-15041: -- cc [~daryn] , [~weichiu] Our cluster wants to change it in order to strike a better balance between latency and RPC queue growth. What do you think about it? May I have access to assign this to myself? Thanks. > Make MAX_LOCK_HOLD_MS and full queue size configurable > -- > > Key: HDFS-15041 > URL: https://issues.apache.org/jira/browse/HDFS-15041 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.2.0 >Reporter: zhuqi >Priority: Major > > Now the MAX_LOCK_HOLD_MS and the full queue size are fixed. But different > cluster have different need for the latency and the queue health standard. > We'd better to make the two parameter configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
zhuqi created HDFS-15041: Summary: Make MAX_LOCK_HOLD_MS and full queue size configurable Key: HDFS-15041 URL: https://issues.apache.org/jira/browse/HDFS-15041 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.2.0 Reporter: zhuqi Now the MAX_LOCK_HOLD_MS and the full queue size are fixed values. But different clusters have different needs for latency and for what counts as a healthy queue. We had better make the two parameters configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14944) ec admin such as : -enablePolicy should support multi federation namespace not only the default namespace in core-site.xml
[ https://issues.apache.org/jira/browse/HDFS-14944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963747#comment-16963747 ] zhuqi commented on HDFS-14944: -- Hi [~ayushtkn] We can add -fs to support specifying the namespace dynamically. > ec admin such as : -enablePolicy should support multi federation namespace > not only the default namespace in core-site.xml > -- > > Key: HDFS-14944 > URL: https://issues.apache.org/jira/browse/HDFS-14944 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0, 3.1.0, 3.2.0 >Reporter: zhuqi >Priority: Major > > when we use the ec -enablePolicy, we only can enable the defaultFs namespace, > we should improve to support more namespace in our federation environment. We > can move the ecadmin to support multi namespace. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
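For illustration, enabling a policy against an explicit namespace rather than fs.defaultFS is already possible programmatically, as sketched below; the "hdfs://ns2" URI is a placeholder for one of the federation nameservices, and the proposed -fs option would expose the same choice on the command line.
{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class EnablePolicyOnNamespace {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Bind to an explicit nameservice instead of the default in core-site.xml.
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(URI.create("hdfs://ns2"), conf);
    dfs.enableErasureCodingPolicy("RS-6-3-1024k"); // built-in policy name
  }
}
{code}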
[jira] [Updated] (HDFS-14944) ec admin such as : -enablePolicy should support multi federation namespace not only the default namespace in core-site.xml
[ https://issues.apache.org/jira/browse/HDFS-14944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated HDFS-14944: - Affects Version/s: 3.1.0 > ec admin such as : -enablePolicy should support multi federation namespace > not only the default namespace in core-site.xml > -- > > Key: HDFS-14944 > URL: https://issues.apache.org/jira/browse/HDFS-14944 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0, 3.1.0, 3.2.0 >Reporter: zhuqi >Priority: Major > > when we use the ec -enablePolicy, we only can enable the defaultFs namespace, > we should improve to support more namespace in our federation environment. We > can move the ecadmin to support multi namespace. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14944) ec admin such as : -enablePolicy should support multi federation namespace not only the default namespace in core-site.xml
[ https://issues.apache.org/jira/browse/HDFS-14944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated HDFS-14944: - Description: when we use the ec -enablePolicy, we only can enable the defaultFs namespace, we should improve to support more namespace in our federation environment. We can move the ecadmin to support multi namespace. (was: when we use the ec -enablePolicy, we only can enable the defaultFs namespace, we should improve to support more namespace in our federation environment.) > ec admin such as : -enablePolicy should support multi federation namespace > not only the default namespace in core-site.xml > -- > > Key: HDFS-14944 > URL: https://issues.apache.org/jira/browse/HDFS-14944 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0, 3.2.0 >Reporter: zhuqi >Priority: Major > > when we use the ec -enablePolicy, we only can enable the defaultFs namespace, > we should improve to support more namespace in our federation environment. We > can move the ecadmin to support multi namespace. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14944) ec admin such as : -enablePolicy should support multi federation namespace not only the default namespace in core-site.xml
[ https://issues.apache.org/jira/browse/HDFS-14944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated HDFS-14944: - Description: when we use the ec -enablePolicy, we only can enable the defaultFs namespace, we should improve to support more namespace in our federation environment. > ec admin such as : -enablePolicy should support multi federation namespace > not only the default namespace in core-site.xml > -- > > Key: HDFS-14944 > URL: https://issues.apache.org/jira/browse/HDFS-14944 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0, 3.2.0 >Reporter: zhuqi >Priority: Major > > when we use the ec -enablePolicy, we only can enable the defaultFs namespace, > we should improve to support more namespace in our federation environment. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14944) ec admin such as : -enablePolicy should support multi federation namespace not only the default namespace in core-site.xml
[ https://issues.apache.org/jira/browse/HDFS-14944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuqi updated HDFS-14944: - Summary: ec admin such as : -enablePolicy should support multi federation namespace not only the default namespace in core-site.xml (was: ec -enablePolicy should support multi federation namespace not only the default namespace in core-site.xml) > ec admin such as : -enablePolicy should support multi federation namespace > not only the default namespace in core-site.xml > -- > > Key: HDFS-14944 > URL: https://issues.apache.org/jira/browse/HDFS-14944 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0, 3.2.0 >Reporter: zhuqi >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14944) ec -enablePolicy should support multi federation namespace not only the default namespace in core-site.xml
zhuqi created HDFS-14944: Summary: ec -enablePolicy should support multi federation namespace not only the default namespace in core-site.xml Key: HDFS-14944 URL: https://issues.apache.org/jira/browse/HDFS-14944 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.2.0, 3.0.0 Reporter: zhuqi -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org