[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2024-01-04 Thread Shilun Fan (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802619#comment-17802619
 ] 

Shilun Fan commented on HDFS-16000:
---

Bulk update: all non-blocker issues targeted at 3.4.0 have been moved. Please 
move this issue back if it is a blocker. Retargeted to 3.5.0.

> HDFS : Rename performance optimization
> --
>
> Key: HDFS-16000
> URL: https://issues.apache.org/jira/browse/HDFS-16000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Affects Versions: 3.1.4, 3.3.1
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: 20210428-143238.svg, 20210428-171635-lambda.svg, 
> HDFS-16000.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Moving a large directory with rename takes a long time. For example, moving 
> a 1000W (10 million) directory takes about 40 seconds. When a large amount 
> of data is deleted to the trash, such a large move happens when the trash 
> makes a checkpoint. Users may also trigger a large directory move 
> themselves. In both cases the NameNode can hold its lock for so long that it 
> is killed by ZKFC. The flame graph shows that most of the time is spent 
> creating EnumCounters objects.
>  
> h3. Rename logic optimization:
>  * Regardless of the source and target directories involved in the rename, 
> the quota count is calculated three times. The first calculation checks 
> whether the moved directory exceeds the target directory quota, the second 
> computes the moved directory's quota usage to update the source directory, 
> and the third computes the moved directory's quota usage to update the 
> target directory.
>  * I think some of these three quota calculations are unnecessary. For 
> example, if none of the parent directories of the source and target 
> directories has a quota configured, there is no need to calculate 
> quotaCount at all. Even if both the source directory and the target 
> directory use quotas, there is no need to calculate the quota three times: 
> the first and third calculations use the same logic, so the result only 
> needs to be calculated once.
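The proposal in the description can be sketched with a toy model. This is only an illustration under stated assumptions: the class `RenameQuotaPlanSketch`, the method names, and the list-of-flags representation of a path are all hypothetical and do not come from Hadoop. It counts how many subtree walks a rename would need if the rules above were applied, compared with the three walks described for the current behaviour.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these names come from Hadoop.
class RenameQuotaPlanSketch {

  // True if any ancestor directory on the path has a quota configured.
  static boolean anyQuota(List<Boolean> ancestorHasQuota) {
    return ancestorHasQuota.contains(Boolean.TRUE);
  }

  // Subtree walks needed for one rename under the proposal in the description.
  static int walksNeeded(List<Boolean> srcAncestors, List<Boolean> dstAncestors) {
    if (!anyQuota(srcAncestors) && !anyQuota(dstAncestors)) {
      // No quota above either path, so there is nothing to check or update.
      return 0;
    }
    // One walk for the source-side update, plus one walk shared by the
    // target quota check and the target update (described as two walks today).
    return 2;
  }

  public static void main(String[] args) {
    // Neither path has a quota on any ancestor.
    System.out.println(walksNeeded(Arrays.asList(false, false), Arrays.asList(false)));
    // One ancestor of the source has a quota.
    System.out.println(walksNeeded(Arrays.asList(false, true), Arrays.asList(false)));
  }
}
```

The sketch only mirrors the wording of the description. The patch under review goes further in one case, reusing a single result for both sides when the source and target storage policies match.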



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2023-12-27 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800745#comment-17800745
 ] 

ASF GitHub Bot commented on HDFS-16000:
---

lfxy commented on code in PR #2964:
URL: https://github.com/apache/hadoop/pull/2964#discussion_r1436951488


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirRenameOp.java:
##
@@ -470,17 +475,53 @@ static RenameResult unprotectedRenameTo(FSDirectory fsd,
   }
 } finally {
   if (undoRemoveSrc) {
-tx.restoreSource();
+tx.restoreSource(srcStoragePolicyCounts);
   }
   if (undoRemoveDst) { // Rename failed - restore dst
-tx.restoreDst(bsps);
+tx.restoreDst(bsps, dstStoragePolicyCounts);
   }
 }
 NameNode.stateChangeLog.warn("DIR* FSDirectory.unprotectedRenameTo: " +
 "failed to rename " + src + " to " + dst);
 throw new IOException("rename from " + src + " to " + dst + " failed.");
   }
 
+  /*
+   * Calculate QuotaCounts based on parent directory and storage policy
+   * 1. If the storage policy of src and dst are different,
+   *  calculate the QuotaCounts of src and dst respectively.
+   * 2. If all parent nodes of src and dst are not set with Quota,
+   *  there is no need to calculate QuotaCount.
+   * 3. if parent nodes of src and dst have Quota configured,
+   *  the QuotaCount is calculated once using the storage policy of src.
+   * */
+  private static void computeQuotaCounts(
+  QuotaCounts srcStoragePolicyCounts,
+  QuotaCounts dstStoragePolicyCounts,
+  INodesInPath srcIIP,
+  INodesInPath dstIIP,
+  BlockStoragePolicySuite bsps,
+  RenameOperation tx) {
+INode dstParent = dstIIP.getINode(-2);
+INode srcParentNode = FSDirectory.
+getFirstSetQuotaParentNode(srcIIP);
+INode srcInode = srcIIP.getLastINode();
+INode dstParentNode = FSDirectory.
+getFirstSetQuotaParentNode(dstIIP);
+byte srcStoragePolicyID = FSDirectory.getStoragePolicyId(srcInode);
+byte dstStoragePolicyID = FSDirectory.getStoragePolicyId(dstParent);
+if (srcStoragePolicyID != dstStoragePolicyID) {
+  srcStoragePolicyCounts.add(srcIIP.getLastINode().
+  computeQuotaUsage(bsps));
+  dstStoragePolicyCounts.add(srcIIP.getLastINode()
+  .computeQuotaUsage(bsps, dstParent.getStoragePolicyID(), false,
+  Snapshot.CURRENT_STATE_ID));
+} else if (srcParentNode != dstParentNode || tx.withCount != null) {
+  srcStoragePolicyCounts.add(srcIIP.getLastINode().computeQuotaUsage(bsps));
+  dstStoragePolicyCounts.add(srcStoragePolicyCounts);
+}

Review Comment:
   @Hexiaoqiao @zhuxiangyi Our production cluster has run into performance 
issues with rename, so we need this optimization.
   In the (srcStoragePolicyID == dstStoragePolicyID) case, do we need to 
compute the counts only when both srcParentNode != dstParentNode and 
tx.isSrcInSnapshot == false hold? If so, could we use the logic below?
   if (srcStoragePolicyID != dstStoragePolicyID) {
     compute;
   } else if (srcParentNode != dstParentNode && !tx.isSrcInSnapshot) {
     compute;
   }
   Looking forward to your reply, thanks.
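To make the question concrete, the two conditions can be compared over all four input combinations. This is a minimal sketch, not code from the PR: plain booleans stand in for the values in the diff, and `tx.withCount != null` is read as "src is in a snapshot", following the author's explanation elsewhere in this thread. The class and method names are hypothetical.

```java
// Illustration only: plain booleans stand in for the values in the diff.
class QuotaConditionSketch {

  // Condition in the patch, with `tx.withCount != null` read as srcInSnapshot.
  static boolean patchCondition(boolean parentsDiffer, boolean srcInSnapshot) {
    return parentsDiffer || srcInSnapshot;
  }

  // Condition proposed in the review comment.
  static boolean proposedCondition(boolean parentsDiffer, boolean srcInSnapshot) {
    return parentsDiffer && !srcInSnapshot;
  }

  public static void main(String[] args) {
    boolean[] values = {false, true};
    for (boolean parentsDiffer : values) {
      for (boolean srcInSnapshot : values) {
        System.out.printf(
            "parentsDiffer=%b srcInSnapshot=%b patch=%b proposed=%b%n",
            parentsDiffer, srcInSnapshot,
            patchCondition(parentsDiffer, srcInSnapshot),
            proposedCondition(parentsDiffer, srcInSnapshot));
      }
    }
  }
}
```

Under that reading, the two conditions agree whenever src is not in a snapshot and disagree whenever it is: the patch computes the counts for a snapshotted source, while the proposed form skips them.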






[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2023-10-09 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773296#comment-17773296
 ] 

ASF GitHub Bot commented on HDFS-16000:
---

Hexiaoqiao commented on PR #2964:
URL: https://github.com/apache/hadoop/pull/2964#issuecomment-1752982944

   Great! will wait for this feature to be ready!





[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2023-10-08 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773105#comment-17773105
 ] 

ASF GitHub Bot commented on HDFS-16000:
---

zhuxiangyi commented on PR #2964:
URL: https://github.com/apache/hadoop/pull/2964#issuecomment-1752373446

   > @zhuxiangyi Hi, do you still work on this? Thanks.
   
   Hi,
   I was on a long vacation and apologize for not replying sooner.
   I am still working on this.





[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2023-10-08 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773091#comment-17773091
 ] 

ASF GitHub Bot commented on HDFS-16000:
---

Hexiaoqiao commented on PR #2964:
URL: https://github.com/apache/hadoop/pull/2964#issuecomment-1752298163

   @zhuxiangyi Hi, do you still work on this? Thanks.





[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2023-09-18 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766304#comment-17766304
 ] 

ASF GitHub Bot commented on HDFS-16000:
---

zhuxiangyi commented on PR #2964:
URL: https://github.com/apache/hadoop/pull/2964#issuecomment-1723082950

   
   > Thanks @zhuxiangyi for your work. It is a great idea and improvement. 
   > Almost LGTM. I left some comments inline and will give my +1 once they 
   > are addressed. Thanks.
   
   @Hexiaoqiao 
   Thank you very much for your review. I have fixed the problem and 
resubmitted the code.





[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2023-09-18 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766299#comment-17766299
 ] 

ASF GitHub Bot commented on HDFS-16000:
---

zhuxiangyi commented on code in PR #2964:
URL: https://github.com/apache/hadoop/pull/2964#discussion_r1328469427


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirRenameOp.java:
##
@@ -470,17 +475,53 @@ static RenameResult unprotectedRenameTo(FSDirectory fsd,
   }
 } finally {
   if (undoRemoveSrc) {
-tx.restoreSource();
+tx.restoreSource(srcStoragePolicyCounts);
   }
   if (undoRemoveDst) { // Rename failed - restore dst
-tx.restoreDst(bsps);
+tx.restoreDst(bsps, dstStoragePolicyCounts);
   }
 }
 NameNode.stateChangeLog.warn("DIR* FSDirectory.unprotectedRenameTo: " +
 "failed to rename " + src + " to " + dst);
 throw new IOException("rename from " + src + " to " + dst + " failed.");
   }
 
+  /*
+   * Calculate QuotaCounts based on parent directory and storage policy
+   * 1. If the storage policy of src and dst are different,
+   *  calculate the QuotaCounts of src and dst respectively.
+   * 2. If all parent nodes of src and dst are not set with Quota,
+   *  there is no need to calculate QuotaCount.
+   * 3. if parent nodes of src and dst have Quota configured,
+   *  the QuotaCount is calculated once using the storage policy of src.
+   * */
+  private static void computeQuotaCounts(
+  QuotaCounts srcStoragePolicyCounts,
+  QuotaCounts dstStoragePolicyCounts,
+  INodesInPath srcIIP,
+  INodesInPath dstIIP,
+  BlockStoragePolicySuite bsps,
+  RenameOperation tx) {
+INode dstParent = dstIIP.getINode(-2);
+INode srcParentNode = FSDirectory.
+getFirstSetQuotaParentNode(srcIIP);
+INode srcInode = srcIIP.getLastINode();
+INode dstParentNode = FSDirectory.
+getFirstSetQuotaParentNode(dstIIP);
+byte srcStoragePolicyID = FSDirectory.getStoragePolicyId(srcInode);
+byte dstStoragePolicyID = FSDirectory.getStoragePolicyId(dstParent);
+if (srcStoragePolicyID != dstStoragePolicyID) {
+  srcStoragePolicyCounts.add(srcIIP.getLastINode().
+  computeQuotaUsage(bsps));
+  dstStoragePolicyCounts.add(srcIIP.getLastINode()
+  .computeQuotaUsage(bsps, dstParent.getStoragePolicyID(), false,
+  Snapshot.CURRENT_STATE_ID));
+} else if (srcParentNode != dstParentNode || tx.withCount != null) {
+  srcStoragePolicyCounts.add(srcIIP.getLastINode().computeQuotaUsage(bsps));
+  dstStoragePolicyCounts.add(srcStoragePolicyCounts);
+}

Review Comment:
   In this case, it can be understood that src and dst have a quota 
configured, and that src has the isSrcInSnapshot attribute set.
   






[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2023-09-18 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766298#comment-17766298
 ] 

ASF GitHub Bot commented on HDFS-16000:
---

zhuxiangyi commented on code in PR #2964:
URL: https://github.com/apache/hadoop/pull/2964#discussion_r1328465474


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java:
##
@@ -1468,6 +1475,30 @@ static Collection<String> normalizePaths(Collection<String> paths,
 return normalized;
   }
 
+  /**
+   * Get the first Node that sets Quota.
+   */
+  static INode getFirstSetQuotaParentNode(INodesInPath iip) {
+for (int i = iip.length() - 1; i > 0; i--) {
+  INode currNode = iip.getINode(i);
+  if (currNode == null) {

Review Comment:
   There should be no expected






[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2023-09-18 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766297#comment-17766297
 ] 

ASF GitHub Bot commented on HDFS-16000:
---

zhuxiangyi commented on code in PR #2964:
URL: https://github.com/apache/hadoop/pull/2964#discussion_r1328464878


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java:
##
@@ -1468,6 +1475,30 @@ static Collection<String> normalizePaths(Collection<String> paths,
 return normalized;
   }
 
+  /**
+   * Get the first Node that sets Quota.
+   */
+  static INode getFirstSetQuotaParentNode(INodesInPath iip) {
+for (int i = iip.length() - 1; i > 0; i--) {
+  INode currNode = iip.getINode(i);
+  if (currNode == null) {

Review Comment:
   Here we traverse from the last INode back towards the root and stop before 
the root node itself (index 0), so the root is excluded.
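The traversal described in this comment can be sketched on a toy path. This is a sketch under assumptions, not the method from the PR: the quoted diff is cut off at the null check, so the body below (skipping null components and returning the first node with a quota) is a guess at the intent, and `Node`, `quotaSet`, and `firstQuotaNode` are hypothetical names.

```java
// Toy stand-in for a resolved path; these are not the Hadoop classes.
class FirstQuotaAncestorSketch {

  static final class Node {
    final String name;
    final boolean quotaSet;

    Node(String name, boolean quotaSet) {
      this.name = name;
      this.quotaSet = quotaSet;
    }
  }

  // Walks from the last path element back to index 1 and never visits
  // index 0 (the root). Returns the first node with a quota set, or null.
  static Node firstQuotaNode(Node[] path) {
    for (int i = path.length - 1; i > 0; i--) {
      Node curr = path[i];
      if (curr == null) {
        // Assumption: a missing component (for example a rename target
        // that does not exist yet) is simply skipped.
        continue;
      }
      if (curr.quotaSet) {
        return curr;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    Node root = new Node("/", true);
    Node a = new Node("a", true);
    Node b = new Node("b", false);
    Node found = firstQuotaNode(new Node[] {root, a, b, null});
    System.out.println(found == null ? "none" : found.name);
  }
}
```

Because the loop condition is `i > 0`, a quota set only on the root is never returned, which matches the comment that the root node is excluded.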






[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2023-09-18 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766295#comment-17766295
 ] 

ASF GitHub Bot commented on HDFS-16000:
---

zhuxiangyi commented on code in PR #2964:
URL: https://github.com/apache/hadoop/pull/2964#discussion_r1328454449


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirRenameOp.java:
##
@@ -470,17 +475,53 @@ static RenameResult unprotectedRenameTo(FSDirectory fsd,
   }
 } finally {
   if (undoRemoveSrc) {
-tx.restoreSource();
+tx.restoreSource(srcStoragePolicyCounts);
   }
   if (undoRemoveDst) { // Rename failed - restore dst
-tx.restoreDst(bsps);
+tx.restoreDst(bsps, dstStoragePolicyCounts);
   }
 }
 NameNode.stateChangeLog.warn("DIR* FSDirectory.unprotectedRenameTo: " +
 "failed to rename " + src + " to " + dst);
 throw new IOException("rename from " + src + " to " + dst + " failed.");
   }
 
+  /*
+   * Calculate QuotaCounts based on parent directory and storage policy
+   * 1. If the storage policy of src and dst are different,
+   *  calculate the QuotaCounts of src and dst respectively.
+   * 2. If all parent nodes of src and dst are not set with Quota,
+   *  there is no need to calculate QuotaCount.
+   * 3. if parent nodes of src and dst have Quota configured,
+   *  the QuotaCount is calculated once using the storage policy of src.
+   * */
+  private static void computeQuotaCounts(
+  QuotaCounts srcStoragePolicyCounts,
+  QuotaCounts dstStoragePolicyCounts,
+  INodesInPath srcIIP,
+  INodesInPath dstIIP,
+  BlockStoragePolicySuite bsps,
+  RenameOperation tx) {
+INode dstParent = dstIIP.getINode(-2);
+INode srcParentNode = FSDirectory.
+getFirstSetQuotaParentNode(srcIIP);
+INode srcInode = srcIIP.getLastINode();
+INode dstParentNode = FSDirectory.
+getFirstSetQuotaParentNode(dstIIP);
+byte srcStoragePolicyID = FSDirectory.getStoragePolicyId(srcInode);
+byte dstStoragePolicyID = FSDirectory.getStoragePolicyId(dstParent);
+if (srcStoragePolicyID != dstStoragePolicyID) {
+  srcStoragePolicyCounts.add(srcIIP.getLastINode().
+  computeQuotaUsage(bsps));
+  dstStoragePolicyCounts.add(srcIIP.getLastINode()
+  .computeQuotaUsage(bsps, dstParent.getStoragePolicyID(), false,
+  Snapshot.CURRENT_STATE_ID));
+} else if (srcParentNode != dstParentNode || tx.withCount != null) {

Review Comment:
   This check determines whether the source inode is in a snapshot 
(isSrcInSnapshot). If it is, we calculate the quotaCount. I will change the 
condition to use isSrcInSnapshot directly.





> HDFS : Rename performance optimization
> --
>
> Key: HDFS-16000
> URL: https://issues.apache.org/jira/browse/HDFS-16000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Affects Versions: 3.1.4, 3.3.1
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: 20210428-143238.svg, 20210428-171635-lambda.svg, 
> HDFS-16000.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Moving a large directory with rename takes a long time. For example, it 
> takes about 40 seconds to move a directory with 10 million (1000W) entries. 
> When a large amount of data is deleted to the trash, such a large move 
> occurs when the trash makes a checkpoint. Users may also trigger a large 
> directory move themselves, which can cause the NameNode to hold its lock for 
> too long and be killed by ZKFC. The flame graph shows that most of the time 
> is spent creating EnumCounters objects.
>  
> h3. Rename logic optimization:
>  * Regardless of which source and target directories are involved, a rename 
> currently calculates the quota count three times: first to check whether 
> the moved directory would exceed the target directory's quota, second to 
> compute the moved directory's usage for updating the source directory's 
> quota, and third to compute the moved directory's usage for updating the 
> target directory's quota.
>  * I think some of these three quota calculations are unnecessary. For 
> example, if no parent directory of the source or target directory has a 
> quota configured, there is no need to calculate quotaCount at all. Even if 
> both the source and target directories use quotas, three calculations are 
> not needed: the first and third use the same logic, so that result only 
> needs to be computed once.
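
For illustration only, here is a minimal Java sketch of the decision the description above argues for. It is not code from the patch: `Node`, `firstQuotaAncestor` and `traversalsNeeded` are hypothetical names, and the real NameNode classes (INode, QuotaCounts) are deliberately not used. The sketch assumes that no subtree traversal is needed when neither path has a quota-bearing ancestor, one traversal when a single result can be reused, and two when source and target must be counted under different storage policies.

```java
/** Illustrative only: hypothetical stand-ins, not the HDFS INode classes. */
final class QuotaPlanSketch {

  /** Minimal directory node: a parent link and a flag for "quota is set". */
  static final class Node {
    final Node parent;
    final boolean quotaSet;

    Node(Node parent, boolean quotaSet) {
      this.parent = parent;
      this.quotaSet = quotaSet;
    }
  }

  /** Returns the nearest strict ancestor with a quota, or null if none. */
  static Node firstQuotaAncestor(Node node) {
    for (Node cur = node.parent; cur != null; cur = cur.parent) {
      if (cur.quotaSet) {
        return cur;
      }
    }
    return null;
  }

  /**
   * Number of subtree traversals assumed necessary when src is renamed
   * into dstParent: 0 if no quota applies on either side, 1 if one result
   * can serve both sides, 2 if the storage policies differ.
   */
  static int traversalsNeeded(Node src, Node dstParent, boolean samePolicy) {
    Node srcQuota = firstQuotaAncestor(src);
    Node dstQuota = dstParent.quotaSet ? dstParent : firstQuotaAncestor(dstParent);
    if (srcQuota == null && dstQuota == null) {
      return 0; // nothing to verify or update
    }
    return samePolicy ? 1 : 2;
  }
}
```

With no quota configured anywhere on either path the sketch returns 0, which is the case the description expects to benefit most.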



--
This message was sent by Atlassian

[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2023-09-18 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766292#comment-17766292
 ] 

ASF GitHub Bot commented on HDFS-16000:
---

zhuxiangyi commented on code in PR #2964:
URL: https://github.com/apache/hadoop/pull/2964#discussion_r1328445332


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirRenameOp.java:
##
@@ -470,17 +475,53 @@ static RenameResult unprotectedRenameTo(FSDirectory fsd,
       }
     } finally {
       if (undoRemoveSrc) {
-        tx.restoreSource();
+        tx.restoreSource(srcStoragePolicyCounts);
       }
       if (undoRemoveDst) { // Rename failed - restore dst
-        tx.restoreDst(bsps);
+        tx.restoreDst(bsps, dstStoragePolicyCounts);
       }
     }
     NameNode.stateChangeLog.warn("DIR* FSDirectory.unprotectedRenameTo: " +
         "failed to rename " + src + " to " + dst);
     throw new IOException("rename from " + src + " to " + dst + " failed.");
   }
 
+  /*
+   * Calculate QuotaCounts based on parent directory and storage policy
+   * 1. If the storage policy of src and dst are different,
+   *  calculate the QuotaCounts of src and dst respectively.
+   * 2. If all parent nodes of src and dst are not set with Quota,
+   *  there is no need to calculate QuotaCount.
+   * 3. if parent nodes of src and dst have Quota configured,
+   *  the QuotaCount is calculated once using the storage policy of src.
+   * */
+  private static void computeQuotaCounts(
+      QuotaCounts srcStoragePolicyCounts,
+      QuotaCounts dstStoragePolicyCounts,
+      INodesInPath srcIIP,
+      INodesInPath dstIIP,
+      BlockStoragePolicySuite bsps,
+      RenameOperation tx) {
+    INode dstParent = dstIIP.getINode(-2);
+    INode srcParentNode = FSDirectory.
+        getFirstSetQuotaParentNode(srcIIP);
+    INode srcInode = srcIIP.getLastINode();
+    INode dstParentNode = FSDirectory.
+        getFirstSetQuotaParentNode(dstIIP);
+    byte srcStoragePolicyID = FSDirectory.getStoragePolicyId(srcInode);
+    byte dstStoragePolicyID = FSDirectory.getStoragePolicyId(dstParent);
+    if (srcStoragePolicyID != dstStoragePolicyID) {
+      srcStoragePolicyCounts.add(srcIIP.getLastINode().
+          computeQuotaUsage(bsps));
+      dstStoragePolicyCounts.add(srcIIP.getLastINode()

Review Comment:
   Thanks for finding this problem. If the inode sets its own StoragePolicy, 
we should use that policy for the calculation. I will fix it.








--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2023-09-17 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766194#comment-17766194
 ] 

ASF GitHub Bot commented on HDFS-16000:
---

Hexiaoqiao commented on code in PR #2964:
URL: https://github.com/apache/hadoop/pull/2964#discussion_r1328244599


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java:
##
@@ -1468,6 +1475,30 @@ static Collection<String> normalizePaths(Collection<String> paths,
     return normalized;
   }
 
+  /**
+   * Get the first Node that sets Quota.
+   */
+  static INode getFirstSetQuotaParentNode(INodesInPath iip) {
+    for (int i = iip.length() - 1; i > 0; i--) {
+      INode currNode = iip.getINode(i);
+      if (currNode == null) {

Review Comment:
   Can currNode be null here? If that is not expected, we should throw an 
exception, IMO.
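
To make the suggestion concrete, here is a minimal Java sketch of a backwards walk that fails fast on an unexpected null instead of silently skipping it. It is an assumption-laden illustration, not the patch: `PathNode` and `firstQuotaNode` are hypothetical names, the path is modelled as a plain list, and whether a null component is actually unexpected in the real INodesInPath is exactly the open question in this comment.

```java
import java.util.List;

/** Illustrative only: a plain list stands in for INodesInPath. */
final class FirstQuotaNodeSketch {

  /** Hypothetical path component; only the quota flag matters here. */
  static final class PathNode {
    final String name;
    final boolean quotaSet;

    PathNode(String name, boolean quotaSet) {
      this.name = name;
      this.quotaSet = quotaSet;
    }
  }

  /**
   * Walks from the last component towards index 1 (index 0 is skipped, as
   * in the loop under review) and returns the first component with a quota,
   * or null if there is none. An unresolved (null) component is treated as
   * a programming error and reported with its index.
   */
  static PathNode firstQuotaNode(List<PathNode> path) {
    for (int i = path.size() - 1; i > 0; i--) {
      PathNode cur = path.get(i);
      if (cur == null) {
        throw new IllegalStateException(
            "unresolved path component at index " + i);
      }
      if (cur.quotaSet) {
        return cur;
      }
    }
    return null;
  }
}
```

If a null component turns out to be legitimate in some call paths, the throw would have to become an explicit, documented skip instead.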



##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirRenameOp.java:
##
@@ -470,17 +475,53 @@ static RenameResult unprotectedRenameTo(FSDirectory fsd,
       }
     } finally {
       if (undoRemoveSrc) {
-        tx.restoreSource();
+        tx.restoreSource(srcStoragePolicyCounts);
       }
       if (undoRemoveDst) { // Rename failed - restore dst
-        tx.restoreDst(bsps);
+        tx.restoreDst(bsps, dstStoragePolicyCounts);
       }
     }
     NameNode.stateChangeLog.warn("DIR* FSDirectory.unprotectedRenameTo: " +
         "failed to rename " + src + " to " + dst);
     throw new IOException("rename from " + src + " to " + dst + " failed.");
   }
 
+  /*
+   * Calculate QuotaCounts based on parent directory and storage policy
+   * 1. If the storage policy of src and dst are different,
+   *  calculate the QuotaCounts of src and dst respectively.
+   * 2. If all parent nodes of src and dst are not set with Quota,
+   *  there is no need to calculate QuotaCount.
+   * 3. if parent nodes of src and dst have Quota configured,
+   *  the QuotaCount is calculated once using the storage policy of src.
+   * */
+  private static void computeQuotaCounts(
+      QuotaCounts srcStoragePolicyCounts,
+      QuotaCounts dstStoragePolicyCounts,
+      INodesInPath srcIIP,
+      INodesInPath dstIIP,
+      BlockStoragePolicySuite bsps,
+      RenameOperation tx) {
+    INode dstParent = dstIIP.getINode(-2);
+    INode srcParentNode = FSDirectory.
+        getFirstSetQuotaParentNode(srcIIP);
+    INode srcInode = srcIIP.getLastINode();
+    INode dstParentNode = FSDirectory.
+        getFirstSetQuotaParentNode(dstIIP);
+    byte srcStoragePolicyID = FSDirectory.getStoragePolicyId(srcInode);
+    byte dstStoragePolicyID = FSDirectory.getStoragePolicyId(dstParent);
+    if (srcStoragePolicyID != dstStoragePolicyID) {
+      srcStoragePolicyCounts.add(srcIIP.getLastINode().
+          computeQuotaUsage(bsps));
+      dstStoragePolicyCounts.add(srcIIP.getLastINode()

Review Comment:
   IIUC, this result is used for the subsequent quota verification and for 
adding/subtracting storage usage on the src and dst inodes, right? I am 
concerned that this may cause a problem. Given directory /a/b (storage policy 
HDD) and /c/d (storage policy SSD), renaming /a/b/r1 (assume 1GB used) to 
/c/d/r2 will decrease total HDD usage by 1GB and increase SSD usage by 1GB. 
Won't this differ from the actual state?
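
The arithmetic in this example can be spelled out with a small Java sketch. It is illustrative only and uses hypothetical names (`StorageType`, `emptyUsage`, `applyRename`), not the HDFS quota classes. It models just the accounted usage per storage type; the blocks themselves are not touched by a rename, which is why the accounted figures can drift from where the data physically sits until it is migrated.

```java
import java.util.EnumMap;
import java.util.Map;

/** Illustrative only: models accounted usage, not physical placement. */
final class RenameAccountingSketch {

  /** Hypothetical storage types, named after the example in the comment. */
  enum StorageType { HDD, SSD }

  /** Creates a usage map with every storage type initialised to zero. */
  static Map<StorageType, Long> emptyUsage() {
    Map<StorageType, Long> usage = new EnumMap<>(StorageType.class);
    for (StorageType t : StorageType.values()) {
      usage.put(t, 0L);
    }
    return usage;
  }

  /**
   * Applies the accounting effect of renaming a subtree of the given size
   * from a directory counted under srcType to one counted under dstType.
   */
  static void applyRename(Map<StorageType, Long> accounted,
      StorageType srcType, StorageType dstType, long bytes) {
    accounted.merge(srcType, -bytes, Long::sum);
    accounted.merge(dstType, bytes, Long::sum);
  }
}
```

Starting from 5GB accounted on HDD and 0 on SSD, renaming a 1GB subtree leaves 4GB on HDD and 1GB on SSD in the books, even though no block has moved.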



##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirRenameOp.java:
##
@@ -470,17 +475,53 @@ static RenameResult unprotectedRenameTo(FSDirectory fsd,
       }
     } finally {
       if (undoRemoveSrc) {
-        tx.restoreSource();
+        tx.restoreSource(srcStoragePolicyCounts);
       }
       if (undoRemoveDst) { // Rename failed - restore dst
-        tx.restoreDst(bsps);
+        tx.restoreDst(bsps, dstStoragePolicyCounts);
       }
     }
     NameNode.stateChangeLog.warn("DIR* FSDirectory.unprotectedRenameTo: " +
         "failed to rename " + src + " to " + dst);
     throw new IOException("rename from " + src + " to " + dst + " failed.");
   }
 
+  /*
+   * Calculate QuotaCounts based on parent directory and storage policy
+   * 1. If the storage policy of src and dst are different,
+   *  calculate the QuotaCounts of src and dst respectively.
+   * 2. If all parent nodes of src and dst are not set with Quota,
+   *  there is no need to calculate QuotaCount.
+   * 3. if parent nodes of src and dst have Quota configured,
+   *  the QuotaCount is calculated once using the storage policy of src.
+   * */
+  private static void computeQuotaCounts(
+      QuotaCounts srcStoragePolicyCounts,
+      QuotaCounts dstStoragePolicyCounts,
+      INodesInPath srcIIP,
+      INodesInPath dstIIP,
+      BlockStoragePolicySuite bsps,
+      RenameOperation tx) {
+    INode dstParent = dstIIP.getINode(-2);
+    INode

[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2023-09-12 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764470#comment-17764470
 ] 

ASF GitHub Bot commented on HDFS-16000:
---

Hexiaoqiao commented on PR #2964:
URL: https://github.com/apache/hadoop/pull/2964#issuecomment-1716891085

   @zhuxiangyi Thanks for your contributions. Will review this week.





[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2023-09-12 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764469#comment-17764469
 ] 

ASF GitHub Bot commented on HDFS-16000:
---

zhuxiangyi commented on PR #2964:
URL: https://github.com/apache/hadoop/pull/2964#issuecomment-1716889059

   @Hexiaoqiao @jojochuang 
   Could you please review this?





[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2023-09-12 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764242#comment-17764242
 ] 

ASF GitHub Bot commented on HDFS-16000:
---

hadoop-yetus commented on PR #2964:
URL: https://github.com/apache/hadoop/pull/2964#issuecomment-1715903240

   :confetti_ball: **+1 overall**
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 32s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 19s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 57s |  |  trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 50s |  |  trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   0m 45s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 56s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 48s |  |  trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 10s |  |  trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 59s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m  3s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 46s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 46s |  |  the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 46s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 43s |  |  the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   0m 43s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 46s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 38s |  |  the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  7s |  |  the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 50s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 42s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 194m 52s |  |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 39s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 288m  5s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2964/6/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2964 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 0f8c921cbb81 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / b4ac1fa18a1bac8fc5b31bd7eb46093b1315c8ff |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2964/6/testReport/ |
   | Max. process+thread count | 3576 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2964/6/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> HDFS : Rename performance optimization
>

[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2023-09-11 Thread ASF GitHub Bot (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763839#comment-17763839
 ] 

ASF GitHub Bot commented on HDFS-16000:
---

hadoop-yetus commented on PR #2964:
URL: https://github.com/apache/hadoop/pull/2964#issuecomment-1714295665

   :broken_heart: **-1 overall**
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 28s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 28s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 55s |  |  trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 49s |  |  trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   0m 44s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 56s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 50s |  |  trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 18s |  |  trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   2m  2s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 28s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 46s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 46s |  |  the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 46s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   0m 42s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 49s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 38s |  |  the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 10s |  |  the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 52s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m  8s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  | 195m 49s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2964/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 39s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 290m 23s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestSnapshotRename |
   |   | hadoop.hdfs.server.namenode.TestFSImageWithSnapshot |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2964/5/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2964 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux bfbb4eff844e 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / b3ebd6c491dac676e6d68f4f72737557e492a49a |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2964/5/testReport/ |
   | Max. process+thread count | 3402 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |

[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2021-09-23 Thread JiangHua Zhu (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17419086#comment-17419086
 ] 

JiangHua Zhu commented on HDFS-16000:
-

Hello [~zhuxiangyi], I am also following this improvement.
Could you share any relevant test reports?
Thank you very much.


> HDFS : Rename performance optimization
> --
>
> Key: HDFS-16000
> URL: https://issues.apache.org/jira/browse/HDFS-16000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Affects Versions: 3.1.4, 3.3.1
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: 20210428-143238.svg, 20210428-171635-lambda.svg, 
> HDFS-16000.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Moving a large directory with rename takes a long time. For example, it 
> takes about 40 seconds to move a directory with 10 million (1000W) entries. 
> When a large amount of data is deleted to the trash, such a large move 
> occurs when the trash makes a checkpoint. Users may also trigger a large 
> directory move themselves, which can cause the NameNode to hold its lock for 
> too long and be killed by ZKFC. The flame graph shows that most of the time 
> is spent creating EnumCounters objects.
> h3. I think the following two points can optimize the efficiency of rename 
> execution
> h3. QuotaCount calculation time-consuming optimization:
>  * Create a single QuotaCounts object when calculating a directory's 
> quotaCount and pass it as a parameter to each subsequent calculation 
> function, which avoids creating an EnumCounters object for every calculation.
>  * In addition, the flame graph shows that modifying QuotaCounts through a 
> lambda takes longer than an ordinary method call, so ordinary methods are 
> used to modify the QuotaCounts.
> h3. Rename logic optimization:
>  * Regardless of which source and target directories are involved, a rename 
> currently calculates the quota count three times: first to check whether 
> the moved directory would exceed the target directory's quota, second to 
> compute the moved directory's usage for updating the source directory's 
> quota, and third to compute the moved directory's usage for updating the 
> target directory's quota.
>  * I think some of these three quota calculations are unnecessary. For 
> example, if no parent directory of the source or target directory has a 
> quota configured, there is no need to calculate quotaCount at all. Even if 
> both the source and target directories use quotas, three calculations are 
> not needed: the first and third use the same logic, so that result only 
> needs to be computed once.
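
As a rough illustration of the first QuotaCount point in the quoted description (passing one accumulator down the recursion instead of allocating a counter object per call), here is a minimal Java sketch. It is not the patch: `Counts`, `Dir`, `computeAllocating` and `computeInto` are hypothetical names, and the real QuotaCounts/EnumCounters classes are not used.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: contrasts per-call allocation with one accumulator. */
final class QuotaAccumulatorSketch {

  /** Hypothetical counter pair standing in for a quota count. */
  static final class Counts {
    long namespace;
    long storagespace;

    void add(Counts other) {
      namespace += other.namespace;
      storagespace += other.storagespace;
    }
  }

  /** Hypothetical tree node with its own size and child nodes. */
  static final class Dir {
    final long ownBytes;
    final List<Dir> children = new ArrayList<>();

    Dir(long ownBytes, Dir... kids) {
      this.ownBytes = ownBytes;
      for (Dir kid : kids) {
        children.add(kid);
      }
    }
  }

  /** Allocating variant: creates one Counts object per visited node. */
  static Counts computeAllocating(Dir dir) {
    Counts counts = new Counts();
    counts.namespace = 1;
    counts.storagespace = dir.ownBytes;
    for (Dir child : dir.children) {
      counts.add(computeAllocating(child));
    }
    return counts;
  }

  /** Accumulating variant: the caller supplies the only Counts object. */
  static void computeInto(Dir dir, Counts acc) {
    acc.namespace += 1;
    acc.storagespace += dir.ownBytes;
    for (Dir child : dir.children) {
      computeInto(child, acc);
    }
  }
}
```

Both variants produce the same totals; the second simply avoids one allocation per node, which is the cost the flame graph in the description attributes to EnumCounters creation.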



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2021-05-05 Thread zhu (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17339562#comment-17339562
 ] 

zhu commented on HDFS-16000:


[~daryn] Thanks for your comments. Please review [rename performance 
optimization|https://github.com/apache/hadoop/pull/2964].

> HDFS : Rename performance optimization
> --
>
> Key: HDFS-16000
> URL: https://issues.apache.org/jira/browse/HDFS-16000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Affects Versions: 3.1.4, 3.3.1
>Reporter: zhu
>Assignee: zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: 20210428-143238.svg, 20210428-171635-lambda.svg, 
> HDFS-16000.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It takes a long time to move a large directory with rename. For example, it 
> takes about 40 seconds to move a 1000W directory. When a large amount of data 
> is deleted to the trash, the move large directory will occur when the recycle 
> bin makes checkpoint. In addition, the user may also actively trigger the 
> move large directory operation, which will cause the NameNode to lock too 
> long and be killed by Zkfc. Through the flame graph, it is found that the 
> main time consuming is to create the EnumCounters object.
> h3. I think the following two points can optimize the efficiency of rename 
> execution
> h3. QuotaCount calculation time optimization:
>  * Create one QuotaCounts object when calculating a directory's quotaCount, 
> and pass it as a parameter to each subsequent calculation function, so as to 
> avoid creating an EnumCounters object for each calculation.
>  * In addition, the flame graph shows that using a lambda to modify 
> QuotaCounts takes longer than an ordinary method, so the ordinary method is 
> used to modify the QuotaCounts count.
> h3. Rename logic optimization:
>  * Regardless of the source and target directories of a rename, the quota 
> count is currently calculated three times. The first calculation checks 
> whether the moved directory exceeds the target directory quota, the second 
> computes the moved directory's quota count to update the source directory 
> quota, and the third computes the moved directory's quota count to update 
> the target directory.
>  * I think some of these three quota calculations are unnecessary. For 
> example, if none of the parent directories of the source and target 
> directories has a quota configured, there is no need to calculate quotaCount 
> at all. Even if both the source and target directories use quotas, there is 
> no need to calculate the quota three times. The calculation logic for the 
> first and third times is the same, so it only needs to be done once.
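
The two ideas in the description can be illustrated with a minimal Java sketch. It is not the attached patch and does not use the real HDFS classes: `QuotaSketch`, `Counts`, `Node`, `addUsage`, `quotaOnPath` and `usageForRename` are hypothetical names invented for illustration. The sketch assumes a plain in-memory tree, passes one mutable accumulator through the traversal instead of allocating a counter per call, and skips the traversal entirely when no directory on either path has a quota.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins; these are NOT the HDFS INode or QuotaCounts classes. */
final class QuotaSketch {

  /** Mutable accumulator, allocated once per traversal. */
  static final class Counts {
    long nameSpace;
    long storageSpace;
  }

  /** Minimal tree node: files carry a size, directories may carry a quota. */
  static final class Node {
    final long size;
    final boolean quotaSet;
    Node parent;
    final List<Node> children = new ArrayList<>();

    Node(long size, boolean quotaSet) {
      this.size = size;
      this.quotaSet = quotaSet;
    }

    Node add(Node child) {
      child.parent = this;
      children.add(child);
      return this;
    }
  }

  /** Adds the subtree's usage into the caller's accumulator; allocates nothing. */
  static void addUsage(Node node, Counts out) {
    out.nameSpace += 1;
    out.storageSpace += node.size;
    for (Node child : node.children) {
      addUsage(child, out);
    }
  }

  /** True if the directory or any of its ancestors has a quota configured. */
  static boolean quotaOnPath(Node dir) {
    for (Node n = dir; n != null; n = n.parent) {
      if (n.quotaSet) {
        return true;
      }
    }
    return false;
  }

  /** Returns the moved subtree's usage, or null when no quota needs updating. */
  static Counts usageForRename(Node src, Node dstParent) {
    if (!quotaOnPath(src.parent) && !quotaOnPath(dstParent)) {
      return null; // no quota on either path: skip the traversal
    }
    Counts counts = new Counts();
    addUsage(src, counts);
    return counts; // one result serves both the quota check and the update
  }
}
```

In this sketch the caller would reuse the returned `Counts` for the target-quota check and for the later source and target updates, which is the "calculate once" idea from the description.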



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2021-04-30 Thread Daryn Sharp (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337540#comment-17337540
 ] 

Daryn Sharp commented on HDFS-16000:


Nice.  Quota optimization has been on my to-do list for years esp. since 
storage types made it much more expensive.  Quota calculations have 
historically been buggy so this needs to be carefully reviewed.  I'm currently 
consumed with finishing a 50-100X block placement performance optimization so 
please wait to commit until mid next week so I can hopefully carve out some 
time.


[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2021-04-29 Thread zhu (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337014#comment-17337014
 ] 

zhu commented on HDFS-16000:


[~hexiaoqiao] Thank you for your comments and suggestions. This week I will 
fix these warnings and add tests.


[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2021-04-29 Thread Xiaoqiao He (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335450#comment-17335450
 ] 

Xiaoqiao He commented on HDFS-16000:


[~zhuxiangyi] Thanks for your report and contribution. It is a good idea and 
a good improvement.
BTW, I just noticed that several unit tests failed and there are some 
checkstyle/javadoc warnings. Would you mind taking another look?
Also, it is enough to submit the patch either here or on GitHub. There is no 
need to submit it in both places. 
Thanks again.


[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2021-04-28 Thread Hadoop QA (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334703#comment-17334703
 ] 

Hadoop QA commented on HDFS-16000:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 21m 49s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 33m 6s | | trunk passed |
| +1 | compile | 1m 20s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | compile | 1m 14s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 | checkstyle | 1m 5s | | trunk passed |
| +1 | mvnsite | 1m 22s | | trunk passed |
| +1 | shadedclient | 18m 0s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 51s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | javadoc | 1m 25s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| 0 | spotbugs | 23m 30s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 3m 16s | | trunk passed |
|| || || || Patch Compile Tests || ||
| -1 | mvninstall | 0m 41s | https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/592/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt | hadoop-hdfs in the patch failed. |
| -1 | compile | 0m 46s | https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/592/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt | hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04. |
| -1 | javac | 0m 46s | https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/592/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt | hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04. |
| -1 | compile | 0m 44s | https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/592/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt | hadoop-hdfs in the patch failed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10. |
| -1 | javac | 0m 44s | https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/592/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt | hadoop-hdfs in the patch failed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10. |
| -0 | checkstyle | 1m 1s |
