[jira] [Commented] (HADOOP-12455) fs.Globber breaks on colon in filename; doesn't use Path's handling for colons
[ https://issues.apache.org/jira/browse/HADOOP-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117495#comment-15117495 ] Rich Haase commented on HADOOP-12455:
-
Unit test failure appears unrelated to the changes made to the globber:
Failed tests: TestClusterTopology.testChooseRandom:169->Assert.assertFalse:64->Assert.assertTrue:41->Assert.fail:88 Not choosing nodes randomly

> fs.Globber breaks on colon in filename; doesn't use Path's handling for colons
> --
>
> Key: HADOOP-12455
> URL: https://issues.apache.org/jira/browse/HADOOP-12455
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Reporter: Daniel Barclay (Drill)
> Assignee: Rich Haase
> Labels: trivial
> Fix For: 2.9.0
>
> Attachments: HADOOP-12455.patch
>
>
> {{org.apache.hadoop.fs.Globber.glob()}} breaks when a searched directory
> contains a file whose simple name contains a colon.
> The problem seems to be in the code currently at lines 257 and 258
> [https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java#L257]:
> {noformat}
> 256: // Set the child path based on the parent path.
> 257: child.setPath(new Path(candidate.getPath(),
> 258: child.getPath().getName()));
> {noformat}
> That last line should probably be:
> {noformat}
> new Path(null, null, child.getPath().getName())));
> {noformat}
>
> The bug in the current code is that:
> 1) {{child.getPath().getName()}} gets the simple name (last segment) of the
> child {{Path}} as a _raw_ string (not necessarily the corresponding relative
> _{{Path}}_ string), and
> 2) that raw string is passed as {{Path(Path, String)}}'s second argument,
> which takes a _{{Path}}_ string.
> When that raw string contains a colon (e.g., {{xxx:yyy}}), it looks like a
> {{Path}} string that specifies a scheme ("{{xxx}}") and has a relative path
> "{{yyy}}", but that combination isn't allowed, so trying to construct a
> {{Path}} with it (as {{Path(Path, String)}} does internally) throws an exception,
> aborting the entire {{glob()}} call.
>
> Adding the call to {{Path(String, String, String)}} does the equivalent of
> converting the raw string "{{xxx:yyy}}" to the {{Path}} string
> "{{./xxx:yyy}}", so the part before the colon is not taken as a scheme.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
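The failure can be reproduced without Hadoop, using only java.net.URI (Hadoop's Path wraps a java.net.URI). This is a simplified, hypothetical stand-in, not Hadoop code: the colon-before-slash check only approximates how a path string is split into scheme and path, and the helper names `looksLikeScheme`, `fromPathString`, and `fromRawName` are illustrative. It shows why a raw name such as `xxx:yyy` is rejected when parsed as a path string, and why prefixing `./` avoids the scheme interpretation.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical stand-in for the behaviour described in HADOOP-12455.
// Uses only java.net.URI; this is not org.apache.hadoop.fs.Path.
class ColonNameDemo {

    // Simplified scheme detection: a colon before the first slash ends a
    // scheme, so "xxx:yyy" splits into scheme "xxx" and path "yyy".
    static boolean looksLikeScheme(String pathString) {
        int colon = pathString.indexOf(':');
        int slash = pathString.indexOf('/');
        return colon != -1 && (slash == -1 || colon < slash);
    }

    // Builds a URI and rethrows failures unchecked, so a single bad name
    // aborts the whole caller.
    static URI toUri(String scheme, String path) {
        try {
            return new URI(scheme, null, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Buggy route: treat the raw file name as a path *string*.
    static URI fromPathString(String name) {
        if (looksLikeScheme(name)) {
            int colon = name.indexOf(':');
            // A scheme plus a relative path: java.net.URI rejects this.
            return toUri(name.substring(0, colon), name.substring(colon + 1));
        }
        return toUri(null, name);
    }

    // Fixed route: treat the raw name as a relative path by prefixing "./",
    // so the text before the colon can no longer be read as a scheme.
    static URI fromRawName(String name) {
        return toUri(null, name.startsWith("/") ? name : "./" + name);
    }

    public static void main(String[] args) {
        String name = "xxx:yyy";
        try {
            fromPathString(name);
            System.out.println("parsed without error");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getCause().getMessage());
        }
        URI fixed = fromRawName(name);
        System.out.println("scheme=" + fixed.getScheme() + " path=" + fixed.getPath());
    }
}
```

The multi-argument `java.net.URI` constructor throws a URISyntaxException when a scheme is combined with a relative path; with the `./` prefix, the first `/` comes before the colon, so the whole name stays in the path component.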
[jira] [Updated] (HADOOP-12455) fs.Globber breaks on colon in filename; doesn't use Path's handling for colons
[ https://issues.apache.org/jira/browse/HADOOP-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-12455: Labels: trivial (was: ) Fix Version/s: 2.9.0 Target Version/s: 2.9.0 Status: Patch Available (was: Open) One line changed, one assertion added to verify.
[jira] [Updated] (HADOOP-12455) fs.Globber breaks on colon in filename; doesn't use Path's handling for colons
[ https://issues.apache.org/jira/browse/HADOOP-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-12455: Attachment: HADOOP-12455.patch Implemented exactly as suggested.
[jira] [Assigned] (HADOOP-12455) fs.Globber breaks on colon in filename; doesn't use Path's handling for colons
[ https://issues.apache.org/jira/browse/HADOOP-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase reassigned HADOOP-12455: --- Assignee: Rich Haase
[jira] [Updated] (HADOOP-12085) DistCp DEBUG logs report -filters files as if they were going to be copied
[ https://issues.apache.org/jira/browse/HADOOP-12085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-12085: Attachment: HADOOP-12085.patch DistCp DEBUG logs report -filters files as if they were going to be copied -- Key: HADOOP-12085 URL: https://issues.apache.org/jira/browse/HADOOP-12085 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.8.0 Reporter: Rich Haase Assignee: Rich Haase Labels: easyfix Attachments: HADOOP-12085.patch HADOOP-1540 added the ability to exclude files from the copy by comparing each file against a set of Java regex patterns. If a file matches one of the patterns, it is excluded from the copy listing. However, that patch did not update the debug logging to report when files are excluded from the copy listing.
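One way to keep the debug output accurate is to log the filtering decision at the point where each path is accepted or rejected. The sketch below is hypothetical: the class, method names, and log messages are not DistCp's, and whole-string `Matcher.matches()` semantics for the exclusion patterns is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration; not DistCp's copy-listing code or log format.
class ListingLogSketch {

    // Returns true if any exclusion pattern matches the whole path string.
    static boolean isExcluded(String path, List<Pattern> excludes) {
        for (Pattern p : excludes) {
            if (p.matcher(path).matches()) {
                return true;
            }
        }
        return false;
    }

    // Builds the copy listing and records one debug-style message per path,
    // so excluded paths are reported as excluded rather than as added.
    static List<String> buildListing(List<String> sources, List<Pattern> excludes,
                                     List<String> debugLog) {
        List<String> listing = new ArrayList<>();
        for (String path : sources) {
            if (isExcluded(path, excludes)) {
                debugLog.add("Excluding from copy listing: " + path);
            } else {
                listing.add(path);
                debugLog.add("Adding to copy listing: " + path);
            }
        }
        return listing;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<String> listing = buildListing(
                Arrays.asList("/src/a.txt", "/src/_tmp/b.txt"),
                Arrays.asList(Pattern.compile(".*/_tmp/.*")),
                log);
        System.out.println("listing: " + listing);
        log.forEach(System.out::println);
    }
}
```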
[jira] [Updated] (HADOOP-12085) DistCp DEBUG logs report -filters files as if they were going to be copied
[ https://issues.apache.org/jira/browse/HADOOP-12085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-12085: Priority: Trivial (was: Major)
[jira] [Updated] (HADOOP-12085) DistCp DEBUG logs report -filters files as if they were going to be copied
[ https://issues.apache.org/jira/browse/HADOOP-12085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-12085: Status: Patch Available (was: Open) [~jingzhao] I've added this patch as a fix to HADOOP-1540. It's a very minor change, so I've not included any tests. If you feel a test is appropriate for this change I will be happy to add it.
[jira] [Created] (HADOOP-12085) DistCp DEBUG logs report -filters files as if they were going to be copied
Rich Haase created HADOOP-12085: --- Summary: DistCp DEBUG logs report -filters files as if they were going to be copied
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Attachment: (was: HADOOP-1540.008.patch) distcp should support an exclude list - Key: HADOOP-1540 URL: https://issues.apache.org/jira/browse/HADOOP-1540 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 2.6.0 Reporter: Senthil Subramanian Assignee: Rich Haase Priority: Minor Labels: BB2015-05-TBR, patch There should be a way to ignore specific paths (e.g., those that have already been copied over under the current srcPath).
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Attachment: HADOOP-1540.009.patch [~jingzhao] I've updated the patch with the fixes you suggested. Thanks, as always, for your help!!
[jira] [Commented] (HADOOP-1540) Support file exclusion list in distcp
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549149#comment-14549149 ] Rich Haase commented on HADOOP-1540: Thanks again [~jingzhao] and [~3opan] for the reviews!
[jira] [Commented] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540266#comment-14540266 ] Rich Haase commented on HADOOP-1540: [~jingzhao], would you please review the latest revision of this patch when you have time? Thanks!
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Attachment: (was: HADOOP-1540.007.patch)
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Attachment: HADOOP-1540.008.patch Fixed whitespace and moved file read operations for CopyFilter into an initialize method.
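A minimal sketch of separating construction from file reading, assuming exclusion patterns are stored one per line: the constructor only records where the patterns come from, and `initialize()` does the reading and compiles each regex once. `LazyPatternFilter` and its `Supplier<Reader>` source are hypothetical stand-ins, not the CopyFilter code in the patch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;
import java.util.regex.Pattern;

// Hypothetical filter: no I/O in the constructor; initialize() does the reading.
class LazyPatternFilter {
    private final Supplier<Reader> source;
    // Empty until initialize() runs, so nothing is excluded before then.
    private List<Pattern> patterns = Collections.emptyList();

    LazyPatternFilter(Supplier<Reader> source) {
        this.source = source; // only remembers where patterns come from
    }

    // Reads one pattern per line, skips blank lines, and compiles each
    // regex once so later shouldCopy() calls only run matchers.
    void initialize() {
        List<Pattern> loaded = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source.get())) {
            String line;
            while ((line = in.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    loaded.add(Pattern.compile(trimmed));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        patterns = loaded;
    }

    // Returns false if any pattern matches the whole path string.
    boolean shouldCopy(String path) {
        for (Pattern p : patterns) {
            if (p.matcher(path).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        LazyPatternFilter filter = new LazyPatternFilter(
                () -> new StringReader(".*\\.tmp\n\n.*/_logs/.*\n"));
        filter.initialize();
        System.out.println(filter.shouldCopy("/data/part-00000"));
        System.out.println(filter.shouldCopy("/data/x.tmp"));
    }
}
```

Keeping I/O out of the constructor lets the filter be created cheaply and makes any read failure surface at one well-defined call.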
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Attachment: HADOOP-1540.006.patch Refactored the patch to do exclusion filtering while building the CopyListing. It turns out there is a method (SimpleCopyListing#shouldCopy) that always returns true. I've added a couple of basic classes that perform the default (always true) behavior, plus a SimpleCopyFilter class, which uses a string comparison to determine what should be excluded from the copy. I think this design will be a bit more flexible in the future, and it avoids giving mappers chunks of files that should all be excluded.
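The design described in this comment can be sketched as a small filter hierarchy consulted while the listing is built: a default that always returns true, and a string-comparison filter. All names here are hypothetical, and `String.contains` is an assumed choice of comparison, since the comment only says "a string compare".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; the names are illustrative, not the classes in the patch.
class CopyFilterSketch {

    // Consulted once per source path while the copy listing is built.
    abstract static class Filter {
        abstract boolean shouldCopy(String path);
    }

    // Default behaviour: copy everything (shouldCopy always returns true).
    static final class AcceptAll extends Filter {
        @Override
        boolean shouldCopy(String path) {
            return true;
        }
    }

    // String-comparison filter: rejects any path containing a configured
    // substring. The exact comparison is an assumption.
    static final class SubstringExclude extends Filter {
        private final List<String> excludes;

        SubstringExclude(List<String> excludes) {
            this.excludes = excludes;
        }

        @Override
        boolean shouldCopy(String path) {
            for (String s : excludes) {
                if (path.contains(s)) {
                    return false;
                }
            }
            return true;
        }
    }

    // Filtering happens here, before the listing is divided into work
    // for mappers, so excluded paths never reach a mapper.
    static List<String> buildListing(List<String> sources, Filter filter) {
        List<String> listing = new ArrayList<>();
        for (String path : sources) {
            if (filter.shouldCopy(path)) {
                listing.add(path);
            }
        }
        return listing;
    }

    public static void main(String[] args) {
        List<String> sources = Arrays.asList("/src/a", "/src/.staging/b", "/src/c");
        System.out.println(buildListing(sources, new AcceptAll()));
        System.out.println(buildListing(sources,
                new SubstringExclude(Arrays.asList(".staging"))));
    }
}
```

Because excluded paths never enter the listing, no mapper is handed a chunk made up entirely of excluded files, and swapping in a different filter does not touch the listing code.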
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Labels: bb2015-05-tbr patch (was: patch) Status: Patch Available (was: Open)
[jira] [Commented] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535330#comment-14535330 ] Rich Haase commented on HADOOP-1540:
[~3opan] Thanks for the comments!
#1 and #3: I'll fix those spacing issues in the next revision of the patch.
#2 and #4: I made those changes because checkstyle failed without them; I'd have preferred to leave that code alone. Maybe someone can comment on how to avoid these kinds of checkstyle issues?
#5: You are absolutely right. My initial pass at the patch used regex patterns. I switched away from them only because, at the time, I was doing exclusion filtering in the CopyMapper, and compiling lots of regexes in every mapper was likely to be expensive with large filter lists. Since we now filter only while building the CopyListing, using regexes is probably not as big a deal, although I am open to alternate suggestions.
[jira] [Commented] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535352#comment-14535352 ] Rich Haase commented on HADOOP-1540: Good idea. I'll make those changes and resubmit. Thanks again for the review!
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Attachment: (was: HADOOP-1540.005.patch)
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Attachment: (was: HADOOP-1540.006.patch)
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Attachment: HADOOP-1540.007.patch Incorporated suggested changes from [~3opan]. Renamed SimpleCopyFilter to RegexCopyFilter to better reflect what the class does.
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Attachment: (was: HADOOP-1540.004.patch)
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Attachment: (was: HADOOP-1540.003.patch)
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Status: Open (was: Patch Available)
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Status: Open (was: Patch Available)
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Attachment: HADOOP-1540.005.patch Fixed the issues described in [~jingzhao]'s comments.
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Status: Patch Available (was: Open)
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Target Version/s: 2.8.0 (was: 3.0.0)
[jira] [Commented] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14531848#comment-14531848 ] Rich Haase commented on HADOOP-1540: Found a bug in the patch. The CopyCommitter doesn't know about exclusions, so it will throw an exception when it tries to preserve permissions on a Path that was excluded. The change needs to be refactored to move the exclusion logic into a class that both the CopyMapper and the CopyCommitter can use. Since the code has to be refactored to work correctly anyway, it makes sense to define an interface for exclusions.
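The failure described in this comment, and why sharing the exclusion logic fixes it, can be modeled with a commit step that preserves attributes only on paths the shared predicate accepts. This is a hypothetical model, not CopyCommitter code: copied targets are simulated with a Set, the IllegalStateException stands in for the real failure, and `java.util.function.Predicate` stands in for whatever exclusion interface the patch ends up defining.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical model of a commit step that preserves attributes on copied targets.
class CommitSketch {

    // The same exclusion predicate used when building the copy is passed in,
    // so the commit step skips paths that were never copied.
    static List<String> preserveAttributes(List<String> sourcePaths,
                                           Set<String> copiedTargets,
                                           Predicate<String> shouldCopy) {
        List<String> preserved = new ArrayList<>();
        for (String path : sourcePaths) {
            if (!shouldCopy.test(path)) {
                continue; // excluded: nothing was copied, so nothing to preserve
            }
            if (!copiedTargets.contains(path)) {
                // Reached for excluded paths when the commit step ignores exclusions.
                throw new IllegalStateException("No copied target for " + path);
            }
            preserved.add(path);
        }
        return preserved;
    }

    public static void main(String[] args) {
        List<String> sources = Arrays.asList("/src/a", "/src/skip/b");
        Set<String> copied = new HashSet<>(Arrays.asList("/src/a"));
        Predicate<String> exclusions = p -> !p.contains("/skip/");
        System.out.println(preserveAttributes(sources, copied, exclusions));
        try {
            // A commit step unaware of exclusions behaves like an accept-all filter.
            preserveAttributes(sources, copied, p -> true);
        } catch (IllegalStateException e) {
            System.out.println("fails: " + e.getMessage());
        }
    }
}
```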
[jira] [Commented] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529487#comment-14529487 ] Rich Haase commented on HADOOP-1540: I'm going to review the patch again this evening and think about how an ExclusionListing interface would work. I like the idea of a more extensible interface. distcp should support an exclude list - Key: HADOOP-1540 URL: https://issues.apache.org/jira/browse/HADOOP-1540 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 2.6.0 Reporter: Senthil Subramanian Assignee: Rich Haase Priority: Minor Labels: patch Attachments: HADOOP-1540.003.patch, HADOOP-1540.004.patch There should be a way to ignore specific paths (eg: those that have already been copied over under the current srcPath).
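One hypothetical shape for the ExclusionListing interface mentioned here, sketched in plain Java. Only the interface name comes from the comment; its `isExcluded` method and all three implementations are assumptions for illustration. The appeal of an interface is that a regex list, an exact list of already-copied paths (the use case in the issue description), or a combination of both can be swapped in without changing the callers.

```java
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical contract for the "ExclusionListing" idea; not a Hadoop API. */
interface ExclusionListing {
    /** Returns true if the given source path should not be copied. */
    boolean isExcluded(String path);
}

/** Regex-backed listing; full-path matching via Matcher.matches() is an assumption. */
final class RegexExclusionListing implements ExclusionListing {
    private final List<Pattern> patterns;

    RegexExclusionListing(List<String> regexes) {
        this.patterns = regexes.stream().map(Pattern::compile).collect(Collectors.toList());
    }

    @Override
    public boolean isExcluded(String path) {
        return patterns.stream().anyMatch(p -> p.matcher(path).matches());
    }
}

/** Exact-path listing, e.g. paths already copied over by an earlier run. */
final class ExactPathExclusionListing implements ExclusionListing {
    private final Set<String> paths;

    ExactPathExclusionListing(Set<String> paths) {
        this.paths = Set.copyOf(paths);
    }

    @Override
    public boolean isExcluded(String path) {
        return paths.contains(path);
    }
}

/** Composite listing: a path is excluded if any member listing excludes it. */
final class AnyOfExclusionListing implements ExclusionListing {
    private final List<ExclusionListing> members;

    AnyOfExclusionListing(List<ExclusionListing> members) {
        this.members = List.copyOf(members);
    }

    @Override
    public boolean isExcluded(String path) {
        return members.stream().anyMatch(m -> m.isExcluded(path));
    }
}

final class ExclusionListingDemo {
    public static void main(String[] args) {
        ExclusionListing listing = new AnyOfExclusionListing(List.of(
            new RegexExclusionListing(List.of(".*\\.tmp")),
            new ExactPathExclusionListing(Set.of("/user/hadoop/radio/done.dat"))));
        for (String path : List.of("/user/hadoop/radio/done.dat",
                                   "/user/hadoop/radio/x.tmp",
                                   "/user/hadoop/radio/new.dat")) {
            System.out.println(path + " excluded=" + listing.isExcluded(path));
        }
    }
}
```

Callers such as a copy mapper and a committer would depend only on `ExclusionListing`, so adding a new kind of listing would not touch them.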
[jira] [Commented] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529472#comment-14529472 ] Rich Haase commented on HADOOP-1540: [~jingzhao] Thanks for the quick review! 1. I'll update the patch to skip unboxing mapBandwidth. 2. I think that may have been caused when I rebased the patch against 3.0.0. In any case, I will be sure to fix the patch so it doesn't delete tests! 3. Will do. 4. Will do. distcp should support an exclude list - Key: HADOOP-1540 URL: https://issues.apache.org/jira/browse/HADOOP-1540 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 2.6.0 Reporter: Senthil Subramanian Assignee: Rich Haase Priority: Minor Labels: patch Attachments: HADOOP-1540.003.patch, HADOOP-1540.004.patch There should be a way to ignore specific paths (eg: those that have already been copied over under the current srcPath).
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Attachment: HADOOP-1540.004.patch Take 4. I think I have all of the checkstyle issues fixed. I want to object super strongly to an 80 char limit for line length. It's been a long time since 80 chars was a reasonable line length. distcp should support an exclude list - Key: HADOOP-1540 URL: https://issues.apache.org/jira/browse/HADOOP-1540 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 2.6.0 Reporter: Senthil Subramanian Assignee: Rich Haase Priority: Minor Labels: patch Fix For: 2.6.0 Attachments: HADOOP-1540.003.patch, HADOOP-1540.004.patch There should be a way to ignore specific paths (eg: those that have already been copied over under the current srcPath).
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Target Version/s: 3.0.0 Fix Version/s: (was: 2.6.0) distcp should support an exclude list - Key: HADOOP-1540 URL: https://issues.apache.org/jira/browse/HADOOP-1540 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 2.6.0 Reporter: Senthil Subramanian Assignee: Rich Haase Priority: Minor Labels: patch Attachments: HADOOP-1540.003.patch, HADOOP-1540.004.patch There should be a way to ignore specific paths (eg: those that have already been copied over under the current srcPath).
[jira] [Commented] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526883#comment-14526883 ] Rich Haase commented on HADOOP-1540: I'm not sure what to do about this final checkstyle warning: ./hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java:77:3: Method length is 196 lines (max allowed is 150). The method in question is parse(), which was about 190 lines before I made my changes and is now 196 lines. I can break up the parse method, but that seems more appropriate for a refactoring change than for a feature addition. Can someone offer some suggestions for how I should handle this? distcp should support an exclude list - Key: HADOOP-1540 URL: https://issues.apache.org/jira/browse/HADOOP-1540 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 2.6.0 Reporter: Senthil Subramanian Assignee: Rich Haase Priority: Minor Labels: patch Fix For: 2.6.0 Attachments: HADOOP-1540.003.patch There should be a way to ignore specific paths (eg: those that have already been copied over under the current srcPath).
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Attachment: (was: HADOOP-1540.002.patch) distcp should support an exclude list - Key: HADOOP-1540 URL: https://issues.apache.org/jira/browse/HADOOP-1540 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 2.6.0 Reporter: Senthil Subramanian Assignee: Rich Haase Priority: Minor Labels: patch Fix For: 2.6.0 Attachments: HADOOP-1540.003.patch There should be a way to ignore specific paths (eg: those that have already been copied over under the current srcPath).
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Attachment: HADOOP-1540.003.patch Fixed checkstyle errors distcp should support an exclude list - Key: HADOOP-1540 URL: https://issues.apache.org/jira/browse/HADOOP-1540 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 2.6.0 Reporter: Senthil Subramanian Assignee: Rich Haase Priority: Minor Labels: patch Fix For: 2.6.0 Attachments: HADOOP-1540.002.patch, HADOOP-1540.003.patch There should be a way to ignore specific paths (eg: those that have already been copied over under the current srcPath).
[jira] [Commented] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517921#comment-14517921 ] Rich Haase commented on HADOOP-1540: [~jingzhao] Just finished rebasing against trunk and testing the patch. distcp should support an exclude list - Key: HADOOP-1540 URL: https://issues.apache.org/jira/browse/HADOOP-1540 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 2.6.0 Reporter: Senthil Subramanian Assignee: Rich Haase Priority: Minor Labels: patch Fix For: 2.6.0 Attachments: HADOOP-1540.001.patch, HADOOP-1540.branch-2.6.0.001.patch There should be a way to ignore specific paths (eg: those that have already been copied over under the current srcPath).
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Attachment: HADOOP-1540.001.patch rebased patch against trunk distcp should support an exclude list - Key: HADOOP-1540 URL: https://issues.apache.org/jira/browse/HADOOP-1540 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 2.6.0 Reporter: Senthil Subramanian Assignee: Rich Haase Priority: Minor Labels: patch Fix For: 2.6.0 Attachments: HADOOP-1540.001.patch, HADOOP-1540.branch-2.6.0.001.patch There should be a way to ignore specific paths (eg: those that have already been copied over under the current srcPath).
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Attachment: (was: HADOOP-1540.001.patch) distcp should support an exclude list - Key: HADOOP-1540 URL: https://issues.apache.org/jira/browse/HADOOP-1540 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 2.6.0 Reporter: Senthil Subramanian Assignee: Rich Haase Priority: Minor Labels: patch Fix For: 2.6.0 Attachments: HADOOP-1540.002.patch There should be a way to ignore specific paths (eg: those that have already been copied over under the current srcPath).
[jira] [Commented] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518148#comment-14518148 ] Rich Haase commented on HADOOP-1540: Working on fixing the items Jenkins is complaining about. distcp should support an exclude list - Key: HADOOP-1540 URL: https://issues.apache.org/jira/browse/HADOOP-1540 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 2.6.0 Reporter: Senthil Subramanian Assignee: Rich Haase Priority: Minor Labels: patch Fix For: 2.6.0 Attachments: HADOOP-1540.001.patch There should be a way to ignore specific paths (eg: those that have already been copied over under the current srcPath).
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Attachment: HADOOP-1540.002.patch This revision of the patch should fix findbugs/javac warnings. distcp should support an exclude list - Key: HADOOP-1540 URL: https://issues.apache.org/jira/browse/HADOOP-1540 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 2.6.0 Reporter: Senthil Subramanian Assignee: Rich Haase Priority: Minor Labels: patch Fix For: 2.6.0 Attachments: HADOOP-1540.001.patch, HADOOP-1540.002.patch There should be a way to ignore specific paths (eg: those that have already been copied over under the current srcPath).
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Attachment: (was: HADOOP-1540.branch-2.6.0.001.patch) distcp should support an exclude list - Key: HADOOP-1540 URL: https://issues.apache.org/jira/browse/HADOOP-1540 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 2.6.0 Reporter: Senthil Subramanian Assignee: Rich Haase Priority: Minor Labels: patch Fix For: 2.6.0 Attachments: HADOOP-1540.001.patch There should be a way to ignore specific paths (eg: those that have already been copied over under the current srcPath).
[jira] [Commented] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14515182#comment-14515182 ] Rich Haase commented on HADOOP-1540: I have a patch for this JIRA that I've just started testing. https://github.com/richhaase/hadoop-patches/blob/master/HADOOP-1540.branch-2.6.0.001.patch
The patch adds a -exclusions option to distcp. Its argument is a file containing a list of Java regex patterns (one per line). Each file that is to be copied will be compared against the list of exclusion patterns. If an exclusion pattern is matched, the file will not be copied.
Example CLI (running with a patched JAR on a Hortonworks HDP 2.2.4 cluster):
*$ export HADOOP_USER_CLASSPATH_FIRST=true; export HADOOP_CLASSPATH=/home/rhaase/hadoop-distcp-2.6.0-20150426160037.jar; mapred distcp -update -exclusions exclude.txt /user/hadoop/radio /user/rhaase/radio*
15/04/27 15:26:55 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[/user/hadoop/radio], targetPath=/user/rhaase/radio, targetPathExists=false, preserveRawXattrs=false, exclusionsFile='exclude.txt'}
...
15/04/27 15:42:27 INFO mapreduce.Job: map 100% reduce 0%
15/04/27 15:42:27 INFO mapreduce.Job: Job job_1429896015201_0035 completed successfully
15/04/27 15:42:27 INFO mapreduce.Job: Counters: 35
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=2392499
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=358894362945
        HDFS: Number of bytes written=358893418844
        HDFS: Number of read operations=3214
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=633
    Job Counters
        Launched map tasks=21
        Other local map tasks=21
        Total time spent by all maps in occupied slots (ms)=4297461
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=4297461
        Total vcore-seconds taken by all map tasks=4297461
        Total megabyte-seconds taken by all map tasks=4400600064
    Map-Reduce Framework
        Map input records=4296
        Map output records=0
        Input split bytes=2457
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=4573
        CPU time spent (ms)=2571060
        Physical memory (bytes) snapshot=10379874304
        Virtual memory (bytes) snapshot=56655720448
        Total committed heap usage (bytes)=43711463424
    File Input Format Counters
        Bytes Read=941644
    File Output Format Counters
        Bytes Written=0
    org.apache.hadoop.tools.mapred.CopyMapper$Counter
        BYTESCOPIED=358893418844
        *BYTESEXCLUDED=1407553620118*
        BYTESEXPECTED=358893418844
        COPY=322
        *EXCLUDED=3974*
distcp should support an exclude list - Key: HADOOP-1540 URL: https://issues.apache.org/jira/browse/HADOOP-1540 Project: Hadoop Common Issue Type: Improvement Components: util Reporter: Senthil Subramanian Priority: Minor There should be a way to ignore specific paths (eg: those that have already been copied over under the current srcPath).
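A minimal sketch of the matching rule described in this comment: one Java regex per line of the exclusions file, and a file is skipped when any pattern matches. The class and method names are hypothetical. Ignoring blank lines and matching the whole path with `Matcher.matches()` (rather than `find()`) are assumptions, since the comment does not say which the patch does. The counter names mirror the CopyMapper$Counter output in the job log, where COPY=322 plus EXCLUDED=3974 equals the 4296 map input records.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of regex-based exclusion with copy/exclude counters. */
final class ExclusionCounterSketch {

    /** Compiles one pattern per line, skipping blank lines (an assumption). */
    static List<Pattern> readPatterns(List<String> lines) {
        List<Pattern> patterns = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(Pattern.compile(trimmed));
            }
        }
        return patterns;
    }

    /** True if any pattern matches the entire path. */
    static boolean isExcluded(String path, List<Pattern> patterns) {
        for (Pattern p : patterns) {
            if (p.matcher(path).matches()) {
                return true;
            }
        }
        return false;
    }

    /** Tallies files and bytes into counters named after the job output. */
    static Map<String, Long> tally(Map<String, Long> fileSizes, List<Pattern> patterns) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String name : List.of("COPY", "BYTESCOPIED", "EXCLUDED", "BYTESEXCLUDED")) {
            counters.put(name, 0L);
        }
        for (Map.Entry<String, Long> entry : fileSizes.entrySet()) {
            boolean excluded = isExcluded(entry.getKey(), patterns);
            counters.merge(excluded ? "EXCLUDED" : "COPY", 1L, Long::sum);
            counters.merge(excluded ? "BYTESEXCLUDED" : "BYTESCOPIED", entry.getValue(), Long::sum);
        }
        return counters;
    }

    public static void main(String[] args) {
        // In the real tool these lines would come from the -exclusions file.
        List<String> exclusionLines = List.of(".*\\.tmp", "", "/user/hadoop/radio/archive/.*");
        List<Pattern> patterns = readPatterns(exclusionLines);

        // Hypothetical source listing: path -> size in bytes.
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("/user/hadoop/radio/a.dat", 100L);
        sizes.put("/user/hadoop/radio/b.tmp", 40L);
        sizes.put("/user/hadoop/radio/archive/c.dat", 25L);

        System.out.println(tally(sizes, patterns));
    }
}
```

For this listing, `b.tmp` and the archive file are excluded and `a.dat` is copied. Compiling the patterns once, rather than per file, keeps the per-file cost to a few regex matches, which matters when thousands of files are excluded as in the run above.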
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Fix Version/s: 2.6.0 Assignee: Rich Haase Labels: patch (was: ) Affects Version/s: 2.6.0 Status: Patch Available (was: Reopened) Submitting the patch for a Jenkins test run. I think there is a bug in the way I am handling the argument to -exclusions. In the tests I've been running on an actual cluster, files can only be read from the file system configured in hdfs-site.xml. distcp should support an exclude list - Key: HADOOP-1540 URL: https://issues.apache.org/jira/browse/HADOOP-1540 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 2.6.0 Reporter: Senthil Subramanian Assignee: Rich Haase Priority: Minor Labels: patch Fix For: 2.6.0 There should be a way to ignore specific paths (eg: those that have already been copied over under the current srcPath).
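The comment suspects that the -exclusions argument is always opened on the default file system. One plausible cause, not confirmed in this thread, is opening the file through `FileSystem.get(conf)`, which returns the configured default file system, instead of `new Path(arg).getFileSystem(conf)`, which honors the path's own scheme and authority. The stdlib-only sketch below illustrates that resolution rule with `java.net.URI`; `ExclusionsFileResolver`, the `hdfs://namenode:8020` default and the working directory are hypothetical, and the sketch does not reproduce Hadoop's full qualification logic.

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Hypothetical helper, not DistCp code: a path that carries a scheme keeps its
 * own scheme and authority; only an unqualified path falls back to the default
 * file system (relative paths are resolved against a working directory).
 */
final class ExclusionsFileResolver {

    static URI resolve(String arg, URI defaultFs, String workingDir) {
        URI uri = URI.create(arg);
        if (uri.getScheme() != null) {
            return uri; // e.g. file:///... stays on the local file system
        }
        String path = uri.getPath();
        if (!path.startsWith("/")) {
            path = workingDir + "/" + path; // relative path: qualify against the working directory
        }
        try {
            return new URI(defaultFs.getScheme(), defaultFs.getAuthority(), path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot qualify " + arg, e);
        }
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:8020"); // hypothetical default FS
        System.out.println(resolve("file:///tmp/exclude.txt", defaultFs, "/user/rhaase"));
        System.out.println(resolve("exclude.txt", defaultFs, "/user/rhaase"));
    }
}
```

With this rule, a fully qualified `file:///...` argument would be read from the local file system even when the cluster's default is HDFS, while a bare `exclude.txt` still lands on the default file system.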
[jira] [Updated] (HADOOP-1540) distcp should support an exclude list
[ https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rich Haase updated HADOOP-1540: --- Attachment: HADOOP-1540.branch-2.6.0.001.patch distcp should support an exclude list - Key: HADOOP-1540 URL: https://issues.apache.org/jira/browse/HADOOP-1540 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 2.6.0 Reporter: Senthil Subramanian Assignee: Rich Haase Priority: Minor Labels: patch Fix For: 2.6.0 Attachments: HADOOP-1540.branch-2.6.0.001.patch There should be a way to ignore specific paths (eg: those that have already been copied over under the current srcPath).