[jira] [Commented] (HADOOP-15384) distcp numListstatusThreads option doesn't get to -delete scan
[ https://issues.apache.org/jira/browse/HADOOP-15384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538339#comment-16538339 ]

Hudson commented on HADOOP-15384:
---------------------------------

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #14548 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14548/])
HADOOP-15384. distcp numListstatusThreads option doesn't get to -delete (stevel: rev ca8b80bf59c0570bb9172208d3a6c993a6854514)
* (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/contract/AbstractContractDistCpTest.java
* (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyCommitter.java

> distcp numListstatusThreads option doesn't get to -delete scan
> --------------------------------------------------------------
>
>                 Key: HADOOP-15384
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15384
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: tools/distcp
>   Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>             Fix For: 3.1.1
>
>         Attachments: HADOOP-15384-001.patch
>
>
> The distcp {{numListstatusThreads}} option isn't used when configuring the
> GlobbedCopyListing used in {{CopyCommitter.deleteMissing()}}
> This means that for large scans of object stores, performance is
> significantly worse.
> Fix: pass the option down from the task conf

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
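The fix described in the issue ("pass the option down from the task conf") can be sketched in isolation. What follows is a minimal, self-contained illustration and not the code from HADOOP-15384-001.patch: a plain Map stands in for the task configuration, and the key name `distcp.liststatus.threads`, the default of 1, and all class and method names are assumptions made for this example. Only the maximum of 40 comes from the discussion in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only. A Map plays the role of the task configuration;
// the key, the default, and every name below are assumptions, not distcp code.
final class ListingThreadsSketch {
  static final String THREADS_KEY = "distcp.liststatus.threads"; // assumed key name
  static final int DEFAULT_THREADS = 1;                          // assumed default
  static final int MAX_THREADS = 40;                             // maximum quoted in the thread

  /** Reads the thread count from the conf, falling back to the default and clamping to [1, MAX]. */
  static int resolveThreads(Map<String, String> conf) {
    int threads = DEFAULT_THREADS;
    String raw = conf.get(THREADS_KEY);
    if (raw != null) {
      try {
        threads = Integer.parseInt(raw.trim());
      } catch (NumberFormatException e) {
        threads = DEFAULT_THREADS; // unparseable value: keep the default
      }
    }
    return Math.max(1, Math.min(MAX_THREADS, threads));
  }

  /** Hypothetical options holder for the listing built during the delete scan. */
  static final class ListingOptions {
    final String targetPath;
    final int numListstatusThreads;

    ListingOptions(String targetPath, int numListstatusThreads) {
      this.targetPath = targetPath;
      this.numListstatusThreads = numListstatusThreads;
    }
  }

  /** The point of the fix: the delete scan gets the configured count instead of a silent default. */
  static ListingOptions optionsForDeleteScan(Map<String, String> conf, String targetPath) {
    return new ListingOptions(targetPath, resolveThreads(conf));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(THREADS_KEY, "20");
    // The bucket name is a placeholder.
    ListingOptions options = optionsForDeleteScan(conf, "s3a://example-bucket/dest");
    System.out.println("Scanning " + options.targetPath
        + " with thread count: " + options.numListstatusThreads);
  }
}
```

The clamp keeps an out-of-range or malformed value from reaching the listing code; whether the real option parser rejects such values instead is not something this thread states.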
[jira] [Commented] (HADOOP-15384) distcp numListstatusThreads option doesn't get to -delete scan
[ https://issues.apache.org/jira/browse/HADOOP-15384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536897#comment-16536897 ]

Ewan Higgs commented on HADOOP-15384:
-------------------------------------

I've tested this using 1, 5, and 20 threads and we get the expected performance improvement when constructing the source list.

+1
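Why the thread count matters for a listing can be shown with a small stand-alone sketch. It is not the distcp listing code: an in-memory map stands in for the remote store, a short sleep models the latency of one list request, and every name is hypothetical. The tree is walked level by level, with all directories of a level listed concurrently on a fixed-size pool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: an in-memory map stands in for a remote store, and all
// names here are hypothetical rather than taken from the distcp sources.
final class ParallelListingSketch {

  /** Lists every path under root, issuing one list call per directory on a fixed-size pool. */
  static List<String> listAll(Map<String, List<String>> tree, String root, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<String> found = new ArrayList<>();
      List<String> level = Collections.singletonList(root);
      while (!level.isEmpty()) {
        // Submit one task per directory in this level; up to `threads` run at once.
        List<Future<List<String>>> pending = new ArrayList<>();
        for (String dir : level) {
          Callable<List<String>> listCall = () -> listDirectory(tree, dir);
          pending.add(pool.submit(listCall));
        }
        // Collect children in submission order and queue sub-directories for the next level.
        List<String> next = new ArrayList<>();
        for (Future<List<String>> result : pending) {
          for (String child : result.get()) {
            found.add(child);
            if (tree.containsKey(child)) {
              next.add(child);
            }
          }
        }
        level = next;
      }
      return found;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("listing interrupted", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("listing failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }

  /** Stand-in for one remote list call; the sleep models per-request latency. */
  static List<String> listDirectory(Map<String, List<String>> tree, String dir)
      throws InterruptedException {
    Thread.sleep(20);
    return tree.getOrDefault(dir, Collections.<String>emptyList());
  }

  /** A tiny tree: keys are directories, values are their direct children. */
  static Map<String, List<String>> sampleTree() {
    Map<String, List<String>> tree = new HashMap<>();
    tree.put("/", Arrays.asList("/a", "/b", "/c"));
    tree.put("/a", Arrays.asList("/a/1", "/a/2"));
    tree.put("/b", Arrays.asList("/b/1"));
    tree.put("/c", Collections.<String>emptyList());
    return tree;
  }

  public static void main(String[] args) {
    Map<String, List<String>> tree = sampleTree();
    for (int threads : new int[] {1, 5, 20}) {
      long start = System.nanoTime();
      int count = listAll(tree, "/", threads).size();
      long millis = (System.nanoTime() - start) / 1_000_000L;
      System.out.println(threads + " thread(s): " + count + " paths in " + millis + " ms");
    }
  }
}
```

With one thread the list calls of a level run back to back; with more threads they overlap, so the elapsed time tends toward the latency of one call per tree level. The result list is the same in either case because children are collected in submission order.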
[jira] [Commented] (HADOOP-15384) distcp numListstatusThreads option doesn't get to -delete scan
[ https://issues.apache.org/jira/browse/HADOOP-15384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529875#comment-16529875 ]

Ewan Higgs commented on HADOOP-15384:
-------------------------------------

The code LGTM. Trying to test this to see if there is a significant performance impact on S3.
[jira] [Commented] (HADOOP-15384) distcp numListstatusThreads option doesn't get to -delete scan
[ https://issues.apache.org/jira/browse/HADOOP-15384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16523025#comment-16523025 ]

genericqa commented on HADOOP-15384:
------------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 23s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 27m 21s | trunk passed |
| +1 | compile | 0m 24s | trunk passed |
| +1 | checkstyle | 0m 13s | trunk passed |
| +1 | mvnsite | 0m 26s | trunk passed |
| +1 | shadedclient | 11m 16s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 30s | trunk passed |
| +1 | javadoc | 0m 18s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 25s | the patch passed |
| +1 | compile | 0m 21s | the patch passed |
| +1 | javac | 0m 21s | the patch passed |
| +1 | checkstyle | 0m 10s | the patch passed |
| +1 | mvnsite | 0m 23s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 11s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 37s | the patch passed |
| +1 | javadoc | 0m 16s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 12m 36s | hadoop-distcp in the patch passed. |
| +1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 68m 30s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HADOOP-15384 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12929072/HADOOP-15384-001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 06f7a1a1fde3 3.13.0-137-generic #186-Ubuntu SMP Mon Dec 4 19:09:19 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7a3c6e9 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14818/testReport/ |
| Max. process+thread count | 335 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14818/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HADOOP-15384) distcp numListstatusThreads option doesn't get to -delete scan
[ https://issues.apache.org/jira/browse/HADOOP-15384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16522708#comment-16522708 ]

Steve Loughran commented on HADOOP-15384:
-----------------------------------------

Patch 001: passes the thread count down, logs it, and splits the reported listing time into source and destination.

This has tangible improvements when using object stores as a destination, though the mimicking of directory trees can still make distcp to some remote stores (S3, Swift) tangibly awful. This is all you can do short of a complete rewrite, which I don't intend to propose for the following reason: distcp is a complex and critical part of too many people's workflows.

Testing: ran {{ITestS3AContractDistCp}} against S3 Ireland. No new tests, as the distcp contract tests were already setting the number of threads. I have, though, set the thread count to 40, that being the maximum.

Here's the output of the relevant phase of {{testUpdateDeepDirectoryStructureToRemote}}

{code}
2018-06-25 19:35:18,047 [Thread-139] INFO mapred.CopyCommitter (CopyCommitter.java:deleteMissing(387)) - -delete option is enabled. About to remove entries from target that are missing in source
2018-06-25 19:35:18,062 [Thread-139] INFO mapred.CopyCommitter (CopyCommitter.java:deleteMissing(396)) - Source listing completed in 0:00:00.015
2018-06-25 19:35:18,063 [Thread-139] INFO mapred.CopyCommitter (CopyCommitter.java:listTargetFiles(554)) - Scanning destination directory s3a://hwdev-steve-ireland-new/test/ITestS3AContractDistCp/testUpdateDeepDirectoryStructureToRemote/remote/DELAY_LISTING_ME/outputDir/inputDir with thread count: 40
2018-06-25 19:35:19,872 [Thread-139] INFO tools.SimpleCopyListing (SimpleCopyListing.java:printStats(608)) - Paths (files+dirs) cnt = 11; dirCnt = 5
2018-06-25 19:35:19,872 [Thread-139] INFO tools.SimpleCopyListing (SimpleCopyListing.java:doBuildListing(402)) - Build file listing completed.
2018-06-25 19:35:19,886 [Thread-139] INFO tools.DistCp (CopyListing.java:buildListing(94)) - Number of paths in the copy list: 11
2018-06-25 19:35:19,899 [Thread-139] INFO tools.DistCp (CopyListing.java:buildListing(94)) - Number of paths in the copy list: 11
2018-06-25 19:35:19,913 [Thread-139] INFO mapred.CopyCommitter (CopyCommitter.java:deleteMissing(415)) - Destination listing completed in 0:00:01.851
{code}

And for {{ITestAzureNativeContractDistCp}}

{code}
2018-06-25 20:11:44,992 INFO [Thread-147]: mapred.LocalJobRunner (LocalJobRunner.java:runTasks(486)) - map task executor complete.
2018-06-25 20:11:44,992 INFO [Thread-147]: mapred.CopyCommitter (CopyCommitter.java:concatFileChunks(210)) - concat file chunks ...
2018-06-25 20:11:45,405 INFO [Thread-147]: mapred.CopyCommitter (CopyCommitter.java:deleteMissing(387)) - -delete option is enabled. About to remove entries from target that are missing in source
2018-06-25 20:11:45,418 INFO [Thread-147]: mapred.CopyCommitter (CopyCommitter.java:deleteMissing(396)) - Source listing completed in 0:00:00.013
2018-06-25 20:11:45,419 INFO [Thread-147]: mapred.CopyCommitter (CopyCommitter.java:listTargetFiles(554)) - Scanning destination directory wasb://contr...@contender.blob.core.windows.net/test/ITestAzureNativeContractDistCp/testUpdateDeepDirectoryStructureToRemote/remote/outputDir/inputDir with thread count: 40
2018-06-25 20:11:46,338 INFO [Thread-147]: tools.SimpleCopyListing (SimpleCopyListing.java:printStats(608)) - Paths (files+dirs) cnt = 11; dirCnt = 5
2018-06-25 20:11:46,338 INFO [Thread-147]: tools.SimpleCopyListing (SimpleCopyListing.java:doBuildListing(402)) - Build file listing completed.
2018-06-25 20:11:46,351 INFO [Thread-147]: tools.DistCp (CopyListing.java:buildListing(94)) - Number of paths in the copy list: 11
2018-06-25 20:11:46,365 INFO [Thread-147]: tools.DistCp (CopyListing.java:buildListing(94)) - Number of paths in the copy list: 11
2018-06-25 20:11:46,377 INFO [Thread-147]: mapred.CopyCommitter (CopyCommitter.java:deleteMissing(415)) - Destination listing completed in 0:00:00.959
{code}

A small bird fell out of the sky, deceased, during the S3A run. It didn't happen on a rerun, so I'm assuming it's unrelated. If more wild animals die during S3 integration tests then it'd be something to consider a significant regression in the AWS SDK.

+ [~ehiggs], [~fabbri]
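The split of listing time into source and destination, as seen in the "Source listing completed in ..." and "Destination listing completed in ..." log lines above, can be sketched as follows. This is a hedged illustration rather than the CopyCommitter code: the class and method names are hypothetical, the two phases are simulated with sleeps, and the H:MM:SS.mmm formatter is written by hand to match the shape of the logged durations.

```java
import java.util.Locale;

// Illustrative only: not the CopyCommitter implementation. All names are hypothetical.
final class ListingTimerSketch {

  /** Formats a duration in milliseconds as H:MM:SS.mmm. */
  static String formatDuration(long millis) {
    long hours = millis / 3_600_000L;
    long minutes = (millis / 60_000L) % 60;
    long seconds = (millis / 1_000L) % 60;
    long ms = millis % 1_000L;
    return String.format(Locale.ROOT, "%d:%02d:%02d.%03d", hours, minutes, seconds, ms);
  }

  /** Runs one phase and returns its elapsed wall-clock time in milliseconds. */
  static long timeMillis(Runnable phase) {
    long start = System.nanoTime();
    phase.run();
    return (System.nanoTime() - start) / 1_000_000L;
  }

  /** Simulated work; stands in for building a listing. */
  private static void pause(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    // Time each phase on its own so a slow destination scan is visible separately.
    long sourceMillis = timeMillis(() -> pause(15));
    long destMillis = timeMillis(() -> pause(50));
    System.out.println("Source listing completed in " + formatDuration(sourceMillis));
    System.out.println("Destination listing completed in " + formatDuration(destMillis));
  }
}
```

Reporting the two phases separately is what makes the numbers in the logs readable: in both runs the source listing takes a few milliseconds while the destination scan of the remote store dominates.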