[jira] [Updated] (HDFS-16000) HDFS : Rename performance optimization

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16000:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> HDFS : Rename performance optimization
> --
>
> Key: HDFS-16000
> URL: https://issues.apache.org/jira/browse/HDFS-16000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Affects Versions: 3.1.4, 3.3.1
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: 20210428-143238.svg, 20210428-171635-lambda.svg, 
> HDFS-16000.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Renaming (moving) a large directory takes a long time. For example, moving a
> directory with about 10 million (1000W) entries takes roughly 40 seconds.
> When a large amount of data is deleted to the trash, such a large-directory
> move happens when the trash creates a checkpoint. Users may also trigger a
> large-directory move directly. In both cases the NameNode holds its lock for
> too long and can be killed by ZKFC. The flame graph shows that most of the
> time is spent creating EnumCounters objects.
>
> h3. Rename logic optimization:
>  * Regardless of the source and target directories involved, a rename
> calculates the quota count three times: first to check whether the moved
> directory would exceed the target directory's quota, second to compute the
> moved directory's quota usage and update the source directory's quota, and
> third to compute the moved directory's quota usage again and update the
> target directory's quota.
>  * I think some of these three quota calculations are unnecessary. For
> example, if no parent directory of either the source or the target directory
> has a quota configured, quotaCount does not need to be calculated at all.
> Even when both the source and target directories use quotas, three
> calculations are not needed: the first and third use the same logic, so the
> result only needs to be calculated once.
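
The rename logic described above can be sketched roughly as follows. This is a minimal illustration and not the HDFS-16000 patch: Dir, computeUsage, quotaOnPrivatePath, and rename are hypothetical names, only a namespace quota is modeled, and error cases such as renaming the root or moving a directory into itself are ignored. The sketch skips the subtree walk when no directory between the source or target and their common ancestor has a quota, and otherwise walks the subtree once and reuses the result for the check and for both updates.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: Dir and every method here are hypothetical, not HDFS classes. */
public class RenameQuotaSketch {
    static final class Dir {
        final String name;
        Dir parent;
        final List<Dir> children = new ArrayList<>();
        long nsQuota = -1; // -1: no namespace quota configured
        long nsUsed = 0;   // inodes below this directory; kept only where a quota is set

        Dir(String name) { this.name = name; }

        Dir add(Dir child) {
            child.parent = this;
            children.add(child);
            return child;
        }
    }

    /** Counts how many full subtree walks were performed, for demonstration. */
    static int subtreeWalks = 0;

    static long computeUsage(Dir d) {
        subtreeWalks++;
        return countInodes(d);
    }

    private static long countInodes(Dir d) {
        long n = 1; // the directory itself
        for (Dir c : d.children) n += countInodes(c);
        return n;
    }

    static boolean isAncestorOrSelf(Dir a, Dir d) {
        for (Dir p = d; p != null; p = p.parent) {
            if (p == a) return true;
        }
        return false;
    }

    /** True if a quota is set between 'from' and the first ancestor it shares with 'other'. */
    static boolean quotaOnPrivatePath(Dir from, Dir other) {
        for (Dir p = from; p != null && !isAncestorOrSelf(p, other); p = p.parent) {
            if (p.nsQuota >= 0) return true;
        }
        return false;
    }

    static void rename(Dir src, Dir dstParent) {
        Dir srcParent = src.parent;
        boolean quotaInvolved = quotaOnPrivatePath(srcParent, dstParent)
                || quotaOnPrivatePath(dstParent, srcParent);
        long delta = 0;
        if (quotaInvolved) {
            delta = computeUsage(src); // the only subtree walk
            // Check the target side before changing anything.
            for (Dir p = dstParent; p != null && !isAncestorOrSelf(p, srcParent); p = p.parent) {
                if (p.nsQuota >= 0 && p.nsUsed + delta > p.nsQuota) {
                    throw new IllegalStateException("quota exceeded at " + p.name);
                }
            }
        }
        srcParent.children.remove(src);
        dstParent.add(src);
        if (quotaInvolved) {
            // Reuse delta for both sides instead of walking the subtree again.
            for (Dir p = srcParent; p != null && !isAncestorOrSelf(p, dstParent); p = p.parent) {
                if (p.nsQuota >= 0) p.nsUsed -= delta;
            }
            for (Dir p = dstParent; p != null && !isAncestorOrSelf(p, srcParent); p = p.parent) {
                if (p.nsQuota >= 0) p.nsUsed += delta;
            }
        }
    }

    public static void main(String[] args) {
        Dir root = new Dir("/");
        Dir a = root.add(new Dir("a"));
        Dir b = root.add(new Dir("b"));
        Dir big = a.add(new Dir("big"));
        for (int i = 0; i < 3; i++) big.add(new Dir("f" + i));

        rename(big, b); // no quota anywhere, so the subtree is never walked
        System.out.println("walks without quota: " + subtreeWalks);

        b.nsQuota = 10;
        b.nsUsed = 4;   // big plus its three children
        rename(big, a); // quota on the source side, so the subtree is walked once
        System.out.println("walks with quota: " + subtreeWalks + ", b.nsUsed=" + b.nsUsed);
    }
}
```

Ancestors shared by the source and the target are skipped in this sketch because the move leaves their usage unchanged.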



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16000) HDFS : Rename performance optimization

2024-01-03 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16000:
--
Target Version/s: 3.4.1  (was: 3.4.0)







[jira] [Updated] (HDFS-16000) HDFS : Rename performance optimization

2023-09-12 Thread Xiangyi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangyi Zhu updated HDFS-16000:
---
Description: 
Renaming (moving) a large directory takes a long time. For example, moving a
directory with about 10 million (1000W) entries takes roughly 40 seconds. When
a large amount of data is deleted to the trash, such a large-directory move
happens when the trash creates a checkpoint. Users may also trigger a
large-directory move directly. In both cases the NameNode holds its lock for
too long and can be killed by ZKFC. The flame graph shows that most of the
time is spent creating EnumCounters objects.

h3. Rename logic optimization:
 * Regardless of the source and target directories involved, a rename
calculates the quota count three times: first to check whether the moved
directory would exceed the target directory's quota, second to compute the
moved directory's quota usage and update the source directory's quota, and
third to compute the moved directory's quota usage again and update the
target directory's quota.
 * I think some of these three quota calculations are unnecessary. For
example, if no parent directory of either the source or the target directory
has a quota configured, quotaCount does not need to be calculated at all.
Even when both the source and target directories use quotas, three
calculations are not needed: the first and third use the same logic, so the
result only needs to be calculated once.

  was:
Renaming (moving) a large directory takes a long time. For example, moving a
directory with about 10 million (1000W) entries takes roughly 40 seconds. When
a large amount of data is deleted to the trash, such a large-directory move
happens when the trash creates a checkpoint. Users may also trigger a
large-directory move directly. In both cases the NameNode holds its lock for
too long and can be killed by ZKFC. The flame graph shows that most of the
time is spent creating EnumCounters objects.
h3. I think the following two points can improve the efficiency of rename
h3. Reducing the cost of the QuotaCount calculation:
 * Create a single QuotaCounts object when calculating a directory's
quotaCount and pass it as a parameter to each subsequent calculation
function, which avoids creating an EnumCounters object for every calculation.
 * In addition, the flame graph shows that updating QuotaCounts through a
lambda takes longer than an ordinary method call, so an ordinary method is
used to update the QuotaCounts counts.
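
A minimal sketch of the first point, assuming hypothetical Counts and Node classes (stand-ins, not the HDFS QuotaCounts, EnumCounters, or INode classes). It contrasts a traversal that allocates a counter object for every node with one that passes a single accumulator down the recursion and updates it with plain field writes. The allocations counter exists only to make the difference visible. The sketch does not measure lambda overhead.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: Counts and Node are hypothetical stand-ins, not HDFS classes. */
public class QuotaAccumulatorSketch {
    /** Number of Counts objects created, for demonstration. */
    static int allocations = 0;

    static final class Counts {
        long namespace;
        long storagespace;

        Counts() { allocations++; }

        void add(Counts other) {
            namespace += other.namespace;
            storagespace += other.storagespace;
        }
    }

    static final class Node {
        final long size; // bytes for a file, 0 for a directory
        final List<Node> children = new ArrayList<>();

        Node(long size) { this.size = size; }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Allocates a fresh Counts for every node visited and merges the results. */
    static Counts computeAllocating(Node n) {
        Counts c = new Counts();
        c.namespace = 1;
        c.storagespace = n.size;
        for (Node child : n.children) c.add(computeAllocating(child));
        return c;
    }

    /** Updates one caller-supplied accumulator instead of allocating per node. */
    static void computeInto(Node n, Counts acc) {
        acc.namespace += 1;
        acc.storagespace += n.size;
        for (Node child : n.children) computeInto(child, acc);
    }

    public static void main(String[] args) {
        Node dir = new Node(0).add(new Node(10)).add(new Node(20)).add(new Node(30));

        allocations = 0;
        Counts perNode = computeAllocating(dir);
        System.out.println("per-node objects: " + allocations
                + ", namespace=" + perNode.namespace
                + ", storagespace=" + perNode.storagespace);

        allocations = 0;
        Counts shared = new Counts();
        computeInto(dir, shared);
        System.out.println("shared accumulator objects: " + allocations
                + ", namespace=" + shared.namespace
                + ", storagespace=" + shared.storagespace);
    }
}
```

Both traversals produce the same totals. The second creates one object per calculation rather than one per node, which is the kind of saving the first point describes.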

h3. Rename logic optimization:
 * Regardless of the source and target directories involved, a rename
calculates the quota count three times: first to check whether the moved
directory would exceed the target directory's quota, second to compute the
moved directory's quota usage and update the source directory's quota, and
third to compute the moved directory's quota usage again and update the
target directory's quota.
 * I think some of these three quota calculations are unnecessary. For
example, if no parent directory of either the source or the target directory
has a quota configured, quotaCount does not need to be calculated at all.
Even when both the source and target directories use quotas, three
calculations are not needed: the first and third use the same logic, so the
result only needs to be calculated once.



[jira] [Updated] (HDFS-16000) HDFS : Rename performance optimization

2021-04-30 Thread Daryn Sharp (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-16000:
---
Attachment: (was: image.png)

> HDFS : Rename performance optimization
> --
>
> Key: HDFS-16000
> URL: https://issues.apache.org/jira/browse/HDFS-16000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Affects Versions: 3.1.4, 3.3.1
>Reporter: zhu
>Assignee: zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: 20210428-143238.svg, 20210428-171635-lambda.svg, 
> HDFS-16000.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Renaming (moving) a large directory takes a long time. For example, moving a
> directory with about 10 million (1000W) entries takes roughly 40 seconds.
> When a large amount of data is deleted to the trash, such a large-directory
> move happens when the trash creates a checkpoint. Users may also trigger a
> large-directory move directly. In both cases the NameNode holds its lock for
> too long and can be killed by ZKFC. The flame graph shows that most of the
> time is spent creating EnumCounters objects.
> h3. I think the following two points can improve the efficiency of rename
> h3. Reducing the cost of the QuotaCount calculation:
>  * Create a single QuotaCounts object when calculating a directory's
> quotaCount and pass it as a parameter to each subsequent calculation
> function, which avoids creating an EnumCounters object for every
> calculation.
>  * In addition, the flame graph shows that updating QuotaCounts through a
> lambda takes longer than an ordinary method call, so an ordinary method is
> used to update the QuotaCounts counts.
> h3. Rename logic optimization:
>  * Regardless of the source and target directories involved, a rename
> calculates the quota count three times: first to check whether the moved
> directory would exceed the target directory's quota, second to compute the
> moved directory's quota usage and update the source directory's quota, and
> third to compute the moved directory's quota usage again and update the
> target directory's quota.
>  * I think some of these three quota calculations are unnecessary. For
> example, if no parent directory of either the source or the target directory
> has a quota configured, quotaCount does not need to be calculated at all.
> Even when both the source and target directories use quotas, three
> calculations are not needed: the first and third use the same logic, so the
> result only needs to be calculated once.






[jira] [Updated] (HDFS-16000) HDFS : Rename performance optimization

2021-04-30 Thread Daryn Sharp (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-16000:
---
Attachment: image.png

> HDFS : Rename performance optimization
> --
>
> Key: HDFS-16000
> URL: https://issues.apache.org/jira/browse/HDFS-16000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Affects Versions: 3.1.4, 3.3.1
>Reporter: zhu
>Assignee: zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: 20210428-143238.svg, 20210428-171635-lambda.svg, 
> HDFS-16000.patch, image.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h






[jira] [Updated] (HDFS-16000) HDFS : Rename performance optimization

2021-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16000:
--
Labels: pull-request-available  (was: )







[jira] [Updated] (HDFS-16000) HDFS : Rename performance optimization

2021-04-28 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16000:
-
Target Version/s: 3.4.0
  Status: Patch Available  (was: Open)

> HDFS : Rename performance optimization
> --
>
> Key: HDFS-16000
> URL: https://issues.apache.org/jira/browse/HDFS-16000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Affects Versions: 3.1.4, 3.3.1
>Reporter: zhu
>Assignee: zhu
>Priority: Major
> Attachments: 20210428-143238.svg, 20210428-171635-lambda.svg, 
> HDFS-16000.patch






[jira] [Updated] (HDFS-16000) HDFS : Rename performance optimization

2021-04-28 Thread zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhu updated HDFS-16000:
---
Description: 
Renaming (moving) a large directory takes a long time. For example, moving a
directory with about 10 million (1000W) entries takes roughly 40 seconds. When
a large amount of data is deleted to the trash, such a large-directory move
happens when the trash creates a checkpoint. Users may also trigger a
large-directory move directly. In both cases the NameNode holds its lock for
too long and can be killed by ZKFC. The flame graph shows that most of the
time is spent creating EnumCounters objects.
h3. I think the following two points can improve the efficiency of rename
h3. Reducing the cost of the QuotaCount calculation:
 * Create a single QuotaCounts object when calculating a directory's
quotaCount and pass it as a parameter to each subsequent calculation
function, which avoids creating an EnumCounters object for every calculation.
 * In addition, the flame graph shows that updating QuotaCounts through a
lambda takes longer than an ordinary method call, so an ordinary method is
used to update the QuotaCounts counts.

h3. Rename logic optimization:
 * Regardless of the source and target directories involved, a rename
calculates the quota count three times: first to check whether the moved
directory would exceed the target directory's quota, second to compute the
moved directory's quota usage and update the source directory's quota, and
third to compute the moved directory's quota usage again and update the
target directory's quota.
 * I think some of these three quota calculations are unnecessary. For
example, if no parent directory of either the source or the target directory
has a quota configured, quotaCount does not need to be calculated at all.
Even when both the source and target directories use quotas, three
calculations are not needed: the first and third use the same logic, so the
result only needs to be calculated once.





[jira] [Updated] (HDFS-16000) HDFS : Rename performance optimization

2021-04-28 Thread zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhu updated HDFS-16000:
---
Description: (was: Renaming (moving) a large directory takes a long time. For
example, moving a directory with about 10 million (1000W) entries takes
roughly 40 seconds. When a large amount of data is deleted to the trash, such
a large-directory move happens when the trash creates a checkpoint. Users may
also trigger a large-directory move directly. In both cases the NameNode holds
its lock for too long and can be killed by ZKFC. The flame graph shows that
most of the time is spent creating EnumCounters objects.
h4. *I think the following two points can improve the efficiency of rename*

*Reducing the cost of the QuotaCount calculation:*
 * Create a single QuotaCounts object when calculating a directory's
quotaCount and pass it as a parameter to each subsequent calculation
function, which avoids creating an EnumCounters object for every calculation.
 * In addition, the flame graph shows that updating QuotaCounts through a
lambda takes longer than an ordinary method call, so an ordinary method is
used to update the QuotaCounts counts.

h3. Rename logic optimization:

 * Regardless of the source and target directories involved, a rename
calculates the quota count three times: first to check whether the moved
directory would exceed the target directory's quota, second to compute the
moved directory's quota usage and update the source directory's quota, and
third to compute the moved directory's quota usage again and update the
target directory's quota.
 * I think some of these three quota calculations are unnecessary. For
example, if no parent directory of either the source or the target directory
has a quota configured, quotaCount does not need to be calculated at all.
Even when both the source and target directories use quotas, three
calculations are not needed: the first and third use the same logic, so the
result only needs to be calculated once.)








[jira] [Updated] (HDFS-16000) HDFS : Rename performance optimization

2021-04-28 Thread zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhu updated HDFS-16000:
---
Description: 
Renaming (moving) a large directory takes a long time. For example, moving a
directory with about 10 million (1000W) entries takes roughly 40 seconds. When
a large amount of data is deleted to the trash, such a large-directory move
happens when the trash creates a checkpoint. Users may also trigger a
large-directory move directly. In both cases the NameNode holds its lock for
too long and can be killed by ZKFC. The flame graph shows that most of the
time is spent creating EnumCounters objects.
h4. *I think the following two points can improve the efficiency of rename*

*Reducing the cost of the QuotaCount calculation:*
 * Create a single QuotaCounts object when calculating a directory's
quotaCount and pass it as a parameter to each subsequent calculation
function, which avoids creating an EnumCounters object for every calculation.
 * In addition, the flame graph shows that updating QuotaCounts through a
lambda takes longer than an ordinary method call, so an ordinary method is
used to update the QuotaCounts counts.

h3. Rename logic optimization:

 * Regardless of the source and target directories involved, a rename
calculates the quota count three times: first to check whether the moved
directory would exceed the target directory's quota, second to compute the
moved directory's quota usage and update the source directory's quota, and
third to compute the moved directory's quota usage again and update the
target directory's quota.
 * I think some of these three quota calculations are unnecessary. For
example, if no parent directory of either the source or the target directory
has a quota configured, quotaCount does not need to be calculated at all.
Even when both the source and target directories use quotas, three
calculations are not needed: the first and third use the same logic, so the
result only needs to be calculated once.

  was:
Renaming (moving) a large directory takes a long time. For example, moving a
directory with about 10 million (1000W) entries takes roughly 40 seconds. When
a large amount of data is deleted to the trash, such a large-directory move
happens when the trash creates a checkpoint. Users may also trigger a
large-directory move directly. In both cases the NameNode holds its lock for
too long and can be killed by ZKFC. The flame graph shows that most of the
time is spent creating EnumCounters objects.

*I think the following two points can improve the efficiency of rename*

h3. *Reducing the cost of the QuotaCount calculation:*

 ** Create a single QuotaCounts object when calculating a directory's
quotaCount and pass it as a parameter to each subsequent calculation
function, which avoids creating an EnumCounters object for every calculation.
 ** In addition, the flame graph shows that updating QuotaCounts through a
lambda takes longer than an ordinary method call, so an ordinary method is
used to update the QuotaCounts counts.

h3. Rename logic optimization:

 ** Regardless of the source and target directories involved, a rename
calculates the quota count three times: first to check whether the moved
directory would exceed the target directory's quota, second to compute the
moved directory's quota usage and update the source directory's quota, and
third to compute the moved directory's quota usage again and update the
target directory's quota.
 ** I think some of these three quota calculations are unnecessary. For
example, if no parent directory of either the source or the target directory
has a quota configured, quotaCount does not need to be calculated at all.
Even when both the source and target directories use quotas, three
calculations are not needed: the first and third use the same logic, so the
result only needs to be calculated once.


[jira] [Updated] (HDFS-16000) HDFS : Rename performance optimization

2021-04-28 Thread zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhu updated HDFS-16000:
---
Description: 
It takes a long time to move a large directory with rename. For example, it 
takes about 40 seconds to move a directory with 10 million (1000W) entries. 
When a large amount of data is deleted to the trash, a large directory is moved 
when the trash checkpoint is created. In addition, a user may trigger a large 
directory move directly, which can cause the NameNode to hold its lock for too 
long and be killed by ZKFC. The flame graph shows that most of the time is 
spent creating EnumCounters objects.

*I think the following two changes can make rename more efficient*

h3. *Reducing the cost of the QuotaCounts calculation:*
 * Create one QuotaCounts object when computing a directory's quota usage and 
pass it as a parameter to each subsequent calculation call, so that no new 
EnumCounters object is created for every calculation.
 * In addition, the flame graph shows that modifying QuotaCounts through a 
lambda takes longer than an ordinary method call, so an ordinary method is used 
to modify the QuotaCounts values.

h3. Rename logic optimization:
 * Whatever the source and target directories of a rename are, the quota count 
is currently calculated three times: first to check whether the moved directory 
would exceed the target directory's quota, second to compute the moved 
directory's usage so the source directory's quota usage can be updated, and 
third to compute the moved directory's usage so the target directory's quota 
usage can be updated.
 * I think some of these three quota calculations are unnecessary. For example, 
if none of the parent directories of the source and target directories has a 
quota configured, there is no need to calculate quotaCount at all. Even if both 
the source and target directories use quotas, three calculations are not 
needed: the first and third use the same logic, so that result only needs to be 
calculated once.

[jira] [Updated] (HDFS-16000) HDFS : Rename performance optimization

2021-04-28 Thread zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhu updated HDFS-16000:
---
Description: 
Moving a large directory with rename takes a long time. For example, it takes 
about 40 seconds to move a 1000W directory (1000W = 10 million). When a large 
amount of data is deleted to the trash, such a large-directory move happens 
when the trash creates a checkpoint. Users can also trigger a large-directory 
move directly, which causes the NameNode to hold its lock for too long and be 
killed by ZKFC. The flame graph shows that most of the time is spent creating 
EnumCounters objects.

*I think the following two changes can make rename more efficient:*
 * 
h3. *QuotaCounts calculation optimization:*
 
 ## Create a single QuotaCounts object when computing a directory's 
quotaCount and pass it as a parameter to each subsequent calculation function, 
so that a new EnumCounters object is not created for every calculation.
 ## The flame graph also shows that modifying QuotaCounts through a lambda 
takes longer than calling an ordinary method, so ordinary methods are used to 
update the QuotaCounts.

 * 
h3. Rename logic optimization:

 ## Whatever the source and target directories of a rename are, the quota 
count is currently calculated three times: first to check whether the moved 
directory would exceed the target directory's quota, second to compute the 
moved directory's quota count and update the source directory's quota, and 
third to compute the moved directory's quota count and update the target 
directory's quota.
 ## I think some of these three quota calculations are unnecessary. For 
example, if no parent directory of the source or the target has a quota 
configured, quotaCount does not need to be calculated at all. Even if both the 
source and the target directories use quotas, three calculations are not 
needed: the first and third use the same logic, so that result only needs to 
be computed once.
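
The QuotaCounts point above can be illustrated with a minimal Java sketch. It 
is not the code from the attached patch and does not use Hadoop's real 
classes: Counts and Node are hypothetical stand-ins for QuotaCounts and an 
inode, and the two methods only contrast allocating a counter per node with 
accumulating into one counter passed down as a parameter.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotaAccumulatorSketch {
    /** Hypothetical mutable counter, standing in for QuotaCounts. */
    static final class Counts {
        long nameSpace;
        long storageSpace;
        // Ordinary methods rather than lambdas, as the description suggests.
        void addNameSpace(long delta) { nameSpace += delta; }
        void addStorageSpace(long delta) { storageSpace += delta; }
    }

    /** Hypothetical tree node, standing in for an inode. */
    static final class Node {
        final long size;  // bytes for a file, 0 for a directory
        final List<Node> children = new ArrayList<>();
        Node(long size) { this.size = size; }
        Node add(Node child) { children.add(child); return this; }
    }

    /** Allocates a new Counts for every node visited. */
    static Counts computeAllocating(Node node) {
        Counts c = new Counts();
        c.addNameSpace(1);
        c.addStorageSpace(node.size);
        for (Node child : node.children) {
            Counts sub = computeAllocating(child);
            c.addNameSpace(sub.nameSpace);
            c.addStorageSpace(sub.storageSpace);
        }
        return c;
    }

    /** Accumulates into one caller-supplied Counts; no per-node allocation. */
    static void computeInto(Node node, Counts acc) {
        acc.addNameSpace(1);
        acc.addStorageSpace(node.size);
        for (Node child : node.children) {
            computeInto(child, acc);
        }
    }

    public static void main(String[] args) {
        Node root = new Node(0)
            .add(new Node(100))
            .add(new Node(0).add(new Node(50)).add(new Node(25)));
        Counts acc = new Counts();
        computeInto(root, acc);
        System.out.println(acc.nameSpace + " " + acc.storageSpace); // prints "5 175"
    }
}
```

Both methods return the same totals. The second creates one counter for the 
whole traversal instead of one per node, which is the allocation the flame 
graph points at.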




[jira] [Updated] (HDFS-16000) HDFS : Rename performance optimization

2021-04-28 Thread zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhu updated HDFS-16000:
---
Description: 
Moving a large directory with rename takes a long time. For example, it takes 
about 40 seconds to move a 1000W directory (1000W = 10 million). When a large 
amount of data is deleted to the trash, such a large-directory move happens 
when the trash creates a checkpoint. Users can also trigger a large-directory 
move directly, which causes the NameNode to hold its lock for too long and be 
killed by ZKFC. The flame graph shows that most of the time is spent creating 
EnumCounters objects.

*I think the following two changes can make rename more efficient:*
 * 
h3. *QuotaCounts calculation optimization:*

 ## Create a single QuotaCounts object when computing a directory's 
quotaCount and pass it as a parameter to each subsequent calculation function, 
so that a new EnumCounters object is not created for every calculation.
 ## The flame graph also shows that modifying QuotaCounts through a lambda 
takes longer than calling an ordinary method, so ordinary methods are used to 
update the QuotaCounts.

 * 
h3. Rename logic optimization:

 ## Whatever the source and target directories of a rename are, the quota 
count is currently calculated three times: first to check whether the moved 
directory would exceed the target directory's quota, second to compute the 
moved directory's quota count and update the source directory's quota, and 
third to compute the moved directory's quota count and update the target 
directory's quota.
 ## I think some of these three quota calculations are unnecessary. For 
example, if no parent directory of the source or the target has a quota 
configured, quotaCount does not need to be calculated at all. Even if both the 
source and the target directories use quotas, three calculations are not 
needed: the first and third use the same logic, so that result only needs to 
be computed once.




[jira] [Updated] (HDFS-16000) HDFS : Rename performance optimization

2021-04-28 Thread zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhu updated HDFS-16000:
---
Description: 
Moving a large directory with rename takes a long time. For example, it takes 
about 40 seconds to move a 1000W directory (1000W = 10 million). When a large 
amount of data is deleted to the trash, such a large-directory move happens 
when the trash creates a checkpoint. Users can also trigger a large-directory 
move directly, which causes the NameNode to hold its lock for too long and be 
killed by ZKFC. The flame graph shows that most of the time is spent creating 
EnumCounters objects.

*I think the following two changes can make rename more efficient:*
 * 
h3. *QuotaCounts calculation optimization:*

 ## Create a single QuotaCounts object when computing a directory's 
quotaCount and pass it as a parameter to each subsequent calculation function, 
so that a new EnumCounters object is not created for every calculation.
 ## The flame graph also shows that modifying QuotaCounts through a lambda 
takes longer than calling an ordinary method, so ordinary methods are used to 
update the QuotaCounts.

 * 
h3. Rename logic optimization:

 ## Whatever the source and target directories of a rename are, the quota 
count is currently calculated three times: first to check whether the moved 
directory would exceed the target directory's quota, second to compute the 
moved directory's quota count and update the source directory's quota, and 
third to compute the moved directory's quota count and update the target 
directory's quota.
 ## I think some of these three quota calculations are unnecessary. For 
example, if no parent directory of the source or the target has a quota 
configured, quotaCount does not need to be calculated at all. Even if both the 
source and the target directories use quotas, three calculations are not 
needed: the first and third use the same logic, so that result only needs to 
be computed once.




[jira] [Updated] (HDFS-16000) HDFS : Rename performance optimization

2021-04-28 Thread zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhu updated HDFS-16000:
---
Attachment: (was: HDFS-16000)




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16000) HDFS : Rename performance optimization

2021-04-28 Thread zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhu updated HDFS-16000:
---
Attachment: HDFS-16000.patch







[jira] [Updated] (HDFS-16000) HDFS : Rename performance optimization

2021-04-28 Thread zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhu updated HDFS-16000:
---
Attachment: HDFS-16000







[jira] [Updated] (HDFS-16000) HDFS : Rename performance optimization

2021-04-28 Thread zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhu updated HDFS-16000:
---
Environment: (was: Moving a large directory with rename takes a long time. 
For example, it takes about 40 seconds to move a 1000W directory (1000W = 10 
million). When a large amount of data is deleted to the trash, such a 
large-directory move happens when the trash creates a checkpoint. Users can 
also trigger a large-directory move directly, which causes the NameNode to 
hold its lock for too long and be killed by ZKFC. The flame graph shows that 
most of the time is spent creating EnumCounters objects.

*I think the following two changes can make rename more efficient:*
 * 
h3. *QuotaCounts calculation optimization:*

 ## Create a single QuotaCounts object when computing a directory's 
quotaCount and pass it as a parameter to each subsequent calculation function, 
so that a new EnumCounters object is not created for every calculation.
 ## The flame graph also shows that modifying QuotaCounts through a lambda 
takes longer than calling an ordinary method, so ordinary methods are used to 
update the QuotaCounts.
 * 
h3. Rename logic optimization:

 ## Whatever the source and target directories of a rename are, the quota 
count is currently calculated three times: first to check whether the moved 
directory would exceed the target directory's quota, second to compute the 
moved directory's quota count and update the source directory's quota, and 
third to compute the moved directory's quota count and update the target 
directory's quota.

I think some of these three quota calculations are unnecessary. For example, 
if no parent directory of the source or the target has a quota configured, 
quotaCount does not need to be calculated at all. Even if both the source and 
the target directories use quotas, three calculations are not needed: the 
first and third use the same logic, so that result only needs to be computed 
once.)








[jira] [Updated] (HDFS-16000) HDFS : Rename performance optimization

2021-04-28 Thread zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhu updated HDFS-16000:
---
Description: 
Moving a large directory with rename takes a long time. For example, it takes 
about 40 seconds to move a 1000W directory (1000W = 10 million). When a large 
amount of data is deleted to the trash, such a large-directory move happens 
when the trash creates a checkpoint. Users can also trigger a large-directory 
move directly, which causes the NameNode to hold its lock for too long and be 
killed by ZKFC. The flame graph shows that most of the time is spent creating 
EnumCounters objects.

*I think the following two changes can make rename more efficient:*
 * 
h3. *QuotaCounts calculation optimization:*

 ## Create a single QuotaCounts object when computing a directory's 
quotaCount and pass it as a parameter to each subsequent calculation function, 
so that a new EnumCounters object is not created for every calculation.
 ## The flame graph also shows that modifying QuotaCounts through a lambda 
takes longer than calling an ordinary method, so ordinary methods are used to 
update the QuotaCounts.

 * 
h3. Rename logic optimization:

 ## Whatever the source and target directories of a rename are, the quota 
count is currently calculated three times: first to check whether the moved 
directory would exceed the target directory's quota, second to compute the 
moved directory's quota count and update the source directory's quota, and 
third to compute the moved directory's quota count and update the target 
directory's quota.

I think some of these three quota calculations are unnecessary. For example, 
if no parent directory of the source or the target has a quota configured, 
quotaCount does not need to be calculated at all. Even if both the source and 
the target directories use quotas, three calculations are not needed: the 
first and third use the same logic, so that result only needs to be computed 
once.
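
The rename logic point above can be sketched in Java as follows. This is an 
illustration under stated assumptions, not the NameNode's actual rename path 
and not the attached patch: Dir, Subtree, chainHasQuota, commonAncestor and 
rename are hypothetical names, only a namespace quota is modelled, and usage 
is tracked only on directories that have a quota. The sketch skips the 
subtree count when no directory on either path has a quota, and otherwise 
counts the subtree once and reuses the result for the check and both updates.

```java
import java.util.HashSet;
import java.util.Set;

public class RenameQuotaSketch {
    /** Hypothetical directory with an optional namespace quota. */
    static final class Dir {
        final Dir parent;
        final long quota;  // -1 means no quota is configured
        long used;         // usage, tracked only when a quota is set
        Dir(Dir parent, long quota) { this.parent = parent; this.quota = quota; }
        boolean hasQuota() { return quota >= 0; }
    }

    /** Hypothetical subtree whose size is expensive to compute. */
    static final class Subtree {
        final long inodes;
        int scans;         // how many times the subtree was counted
        Subtree(long inodes) { this.inodes = inodes; }
        long count() { scans++; return inodes; }
    }

    /** True if any directory from 'from' up to (not including) 'stop' has a quota. */
    static boolean chainHasQuota(Dir from, Dir stop) {
        for (Dir d = from; d != stop; d = d.parent) {
            if (d.hasQuota()) return true;
        }
        return false;
    }

    /** Nearest directory that is an ancestor-or-self of both a and b, or null. */
    static Dir commonAncestor(Dir a, Dir b) {
        Set<Dir> seen = new HashSet<>();
        for (Dir d = a; d != null; d = d.parent) seen.add(d);
        for (Dir d = b; d != null; d = d.parent) {
            if (seen.contains(d)) return d;
        }
        return null;
    }

    /** Moves subtree from under src to under dst; false if a quota would be exceeded. */
    static boolean rename(Dir src, Dir dst, Subtree subtree) {
        // Directories at or above the common ancestor see no net change.
        Dir common = commonAncestor(src, dst);
        if (!chainHasQuota(src, common) && !chainHasQuota(dst, common)) {
            return true;  // no quota involved: skip counting entirely
        }
        long delta = subtree.count();  // counted once, reused below
        for (Dir d = dst; d != common; d = d.parent) {
            if (d.hasQuota() && d.used + delta > d.quota) return false;
        }
        for (Dir d = src; d != common; d = d.parent) {
            if (d.hasQuota()) d.used -= delta;
        }
        for (Dir d = dst; d != common; d = d.parent) {
            if (d.hasQuota()) d.used += delta;
        }
        return true;
    }

    public static void main(String[] args) {
        Dir root = new Dir(null, -1);
        Dir plain = new Dir(root, -1);
        Dir other = new Dir(root, -1);
        Dir limited = new Dir(root, 1500);
        limited.used = 600;

        Subtree big = new Subtree(1000);
        System.out.println(rename(plain, other, big) + " scans=" + big.scans);
        System.out.println(rename(plain, limited, big) + " scans=" + big.scans);

        Subtree small = new Subtree(500);
        System.out.println(rename(plain, limited, small) + " used=" + limited.used);
    }
}
```

In this sketch a rename between two paths without quotas never counts the 
subtree, and a rename that involves a quota counts it once instead of three 
times.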



