[jira] [Comment Edited] (HADOOP-15209) DistCp to eliminate needless deletion of files under already-deleted directories

2018-03-09 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393356#comment-16393356
 ] 

Steve Loughran edited comment on HADOOP-15209 at 3/9/18 6:43 PM:
-

Testing: local, s3a, adl, wasb IT tests, command line updates including to an 
s3a endpoint running inconsistently. 

oh, and I tried FTP. Doesn't work as a dest; never has.


was (Author: ste...@apache.org):
Testing: local, s3a, adl, wasb IT tests, command line updates including to an 
s3a endpoint running inconsistently. 

> DistCp to eliminate needless deletion of files under already-deleted 
> directories
> 
>
> Key: HADOOP-15209
> URL: https://issues.apache.org/jira/browse/HADOOP-15209
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15209-001.patch, HADOOP-15209-002.patch, 
> HADOOP-15209-003.patch, HADOOP-15209-004.patch, HADOOP-15209-005.patch, 
> HADOOP-15209-006.patch, HADOOP-15209-007.patch
>
>
> DistCP issues a delete(file) request even if is underneath an already deleted 
> directory. This generates needless load on filesystems/object stores, and, if 
> the store throttles delete, can dramatically slow down the delete operation.
> If the distcp delete operation can build a history of deleted directories, 
> then it will know when it does not need to issue those deletes.
> Care is needed here to make sure that whatever structure is created does not 
> overload the heap of the process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15209) DistCp to eliminate needless deletion of files under already-deleted directories

2018-03-08 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392038#comment-16392038
 ] 

Aaron Fabbri edited comment on HADOOP-15209 at 3/8/18 10:42 PM:


I will try to review / test this today. You feel like this is ready to commit?


was (Author: fabbri):
I will try to review / test this today.

> DistCp to eliminate needless deletion of files under already-deleted 
> directories
> 
>
> Key: HADOOP-15209
> URL: https://issues.apache.org/jira/browse/HADOOP-15209
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-15209-001.patch, HADOOP-15209-002.patch, 
> HADOOP-15209-003.patch, HADOOP-15209-004.patch, HADOOP-15209-005.patch, 
> HADOOP-15209-006.patch
>
>
> DistCP issues a delete(file) request even if is underneath an already deleted 
> directory. This generates needless load on filesystems/object stores, and, if 
> the store throttles delete, can dramatically slow down the delete operation.
> If the distcp delete operation can build a history of deleted directories, 
> then it will know when it does not need to issue those deletes.
> Care is needed here to make sure that whatever structure is created does not 
> overload the heap of the process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org