[jira] [Commented] (HADOOP-12455) fs.Globber breaks on colon in filename; doesn't use Path's handling for colons

2016-01-26 Thread Rich Haase (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117495#comment-15117495
 ] 

Rich Haase commented on HADOOP-12455:
-

Unit test failure appears unrelated to the changes made to the globber:

Failed tests:
  TestClusterTopology.testChooseRandom:169->Assert.assertFalse:64->Assert.assertTrue:41->Assert.fail:88 Not choosing nodes randomly

> fs.Globber breaks on colon in filename; doesn't use Path's handling for colons
> --
>
> Key: HADOOP-12455
> URL: https://issues.apache.org/jira/browse/HADOOP-12455
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Reporter: Daniel Barclay (Drill)
>Assignee: Rich Haase
>  Labels: trivial
> Fix For: 2.9.0
>
> Attachments: HADOOP-12455.patch
>
>
> {{org.apache.hadoop.fs.Globber.glob()}} breaks when a searched directory 
> contains a file whose simple name contains a colon.
> The problem seems to be in the code currently at lines 257 and 258 
> [https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java#L257]:
> {noformat}
> 256:  // Set the child path based on the parent path.
> 257:  child.setPath(new Path(candidate.getPath(),
> 258:  child.getPath().getName()));
> {noformat}
> That last line should probably be:
> {noformat}
>   new Path(null, null, child.getPath().getName())));
> {noformat}
> 
> The bug in the current code is that:
> 1) {{child.getPath().getName()}} gets the simple name (last segment) of the 
> child {{Path}} as a _raw_ string (not necessarily the corresponding relative 
> _{{Path}}_ string), and
> 2) that raw string is passed as {{Path(Path, String)}}'s second argument, 
> which takes a _{{Path}}_ string.
> When that raw string contains a colon (e.g., {{xxx:yyy}}), it looks like a 
> {{Path}} string that specifies a scheme ("{{xxx}}") and has a relative path 
> "{{yyy}}", but that combination isn't allowed, so trying to construct a 
> {{Path}} from it (as {{Path(Path, String)}} does internally) throws an exception, 
> aborting the entire {{glob()}} call.
> 
> Adding the call to {{Path(String, String, String)}} does the equivalent of 
> converting the raw string "{{xxx:yyy}}" to the {{Path}} string 
> "{{./xxx:yyy}}", so the part before the colon is not taken as a scheme.
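A JDK-only sketch of the failure described above. Hadoop's {{Path}} is built on {{java.net.URI}}, but this code does not call Hadoop: {{ColonNameDemo}}, {{acceptedAsPathString}} and {{asRelativeName}} are hypothetical helpers that roughly mimic the two parsing routes (a colon before the first slash read as a scheme separator, versus the raw name prefixed with "./").

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ColonNameDemo {

    // Hypothetical helper: treats s as a "Path string", so a colon that
    // appears before any slash splits off a scheme (roughly what happens
    // when a raw name is handed to Path(Path, String)).
    static boolean acceptedAsPathString(String s) {
        int colon = s.indexOf(':');
        int slash = s.indexOf('/');
        String scheme = null;
        String path = s;
        if (colon > 0 && (slash == -1 || colon < slash)) {
            scheme = s.substring(0, colon);
            path = s.substring(colon + 1);
        }
        try {
            // With a non-null scheme and a path not starting with '/',
            // this constructor throws "Relative path in absolute URI".
            new URI(scheme, null, path, null, null);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Hypothetical helper mirroring the proposed fix: the raw name is
    // treated as a relative path by prefixing "./", so no scheme is parsed.
    static URI asRelativeName(String name) {
        try {
            return new URI(null, null, "./" + name, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String name = "xxx:yyy";
        System.out.println(acceptedAsPathString(name)); // false
        URI fixed = asRelativeName(name);
        System.out.println(fixed.getScheme() + " " + fixed.getPath()); // null ./xxx:yyy
    }
}
```

The "./" prefix works because the URI parser stops looking for a scheme at the first slash, so the colon in "xxx:yyy" is left inside an ordinary relative path.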



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12455) fs.Globber breaks on colon in filename; doesn't use Path's handling for colons

2016-01-25 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-12455:

  Labels: trivial  (was: )
   Fix Version/s: 2.9.0
Target Version/s: 2.9.0
  Status: Patch Available  (was: Open)

One line changed, one assertion added to verify.



[jira] [Updated] (HADOOP-12455) fs.Globber breaks on colon in filename; doesn't use Path's handling for colons

2016-01-25 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-12455:

Attachment: HADOOP-12455.patch

Implemented exactly as suggested.



[jira] [Assigned] (HADOOP-12455) fs.Globber breaks on colon in filename; doesn't use Path's handling for colons

2015-12-16 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase reassigned HADOOP-12455:
---

Assignee: Rich Haase



[jira] [Updated] (HADOOP-12085) DistCp DEBUG logs report -filters files as if they were going to be copied

2015-06-10 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-12085:

Attachment: HADOOP-12085.patch

 DistCp DEBUG logs report -filters files as if they were going to be copied
 --

 Key: HADOOP-12085
 URL: https://issues.apache.org/jira/browse/HADOOP-12085
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Rich Haase
Assignee: Rich Haase
  Labels: easyfix
 Attachments: HADOOP-12085.patch


 HADOOP-1540 added the ability to exclude files from the copy by comparing each 
 file to a set of Java regex patterns.  If a file matches any of the patterns, it 
 is excluded from the copy listing.  However, that patch did not update 
 the debug logging to report when files are excluded from the copy listing.  
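A minimal sketch of the logging this issue asks for, assuming a listing loop that checks each path against the -filters regexes and logs exclusions separately from additions. {{ListingSketch}}, {{shouldCopy}} and {{buildListing}} are hypothetical names, whole-path {{matches()}} is an assumed matching rule, and {{java.util.logging}} stands in for Hadoop's own logging; this is not DistCp's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;
import java.util.regex.Pattern;

public class ListingSketch {
    private static final Logger LOG = Logger.getLogger(ListingSketch.class.getName());

    private final List<Pattern> filters = new ArrayList<>();

    ListingSketch(List<String> regexes) {
        for (String r : regexes) {
            filters.add(Pattern.compile(r)); // compile each pattern once
        }
    }

    // Returns false (and logs the reason at debug level) when any filter matches.
    boolean shouldCopy(String path) {
        for (Pattern p : filters) {
            if (p.matcher(path).matches()) {
                LOG.fine(() -> "Excluding " + path + " (matched filter " + p.pattern() + ")");
                return false;
            }
        }
        return true;
    }

    // Only paths that will really be copied are logged as "Adding", so
    // excluded files no longer look as if they are going to be copied.
    List<String> buildListing(List<String> candidates) {
        List<String> listing = new ArrayList<>();
        for (String path : candidates) {
            if (shouldCopy(path)) {
                LOG.fine(() -> "Adding " + path + " to copy listing");
                listing.add(path);
            }
        }
        return listing;
    }

    public static void main(String[] args) {
        ListingSketch sketch = new ListingSketch(List.of(".*\\.tmp", ".*/_SUCCESS"));
        List<String> out = sketch.buildListing(
                List.of("/data/a.txt", "/data/b.tmp", "/data/_SUCCESS"));
        System.out.println(out); // [/data/a.txt]
    }
}
```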





[jira] [Updated] (HADOOP-12085) DistCp DEBUG logs report -filters files as if they were going to be copied

2015-06-10 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-12085:

Priority: Trivial  (was: Major)



[jira] [Updated] (HADOOP-12085) DistCp DEBUG logs report -filters files as if they were going to be copied

2015-06-10 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-12085:

Status: Patch Available  (was: Open)

[~jingzhao] I've added this patch as a fix to HADOOP-1540.  It's a very minor 
change, so I've not included any tests.  If you feel a test is appropriate for 
this change I will be happy to add it.



[jira] [Created] (HADOOP-12085) DistCp DEBUG logs report -filters files as if they were going to be copied

2015-06-10 Thread Rich Haase (JIRA)
Rich Haase created HADOOP-12085:
---

 Summary: DistCp DEBUG logs report -filters files as if they were 
going to be copied
 Key: HADOOP-12085
 URL: https://issues.apache.org/jira/browse/HADOOP-12085
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Rich Haase
Assignee: Rich Haase


HADOOP-1540 added the ability to exclude files from the copy by comparing each file 
to a set of Java regex patterns.  If a file matches any of the patterns, it is 
excluded from the copy listing.  However, that patch did not update the debug 
logging to report when files are excluded from the copy listing.  





[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-05-18 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Attachment: (was: HADOOP-1540.008.patch)

 distcp should support an exclude list
 -

 Key: HADOOP-1540
 URL: https://issues.apache.org/jira/browse/HADOOP-1540
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.6.0
Reporter: Senthil Subramanian
Assignee: Rich Haase
Priority: Minor
  Labels: BB2015-05-TBR, patch

 There should be a way to ignore specific paths (e.g., those that have already 
 been copied over under the current srcPath). 





[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-05-18 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Attachment: HADOOP-1540.009.patch

[~jingzhao] I've updated the patch with the fixes you suggested.  Thanks, as 
always, for your help!!



[jira] [Commented] (HADOOP-1540) Support file exclusion list in distcp

2015-05-18 Thread Rich Haase (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549149#comment-14549149
 ] 

Rich Haase commented on HADOOP-1540:


Thanks again [~jingzhao] and [~3opan] for the reviews!



[jira] [Commented] (HADOOP-1540) distcp should support an exclude list

2015-05-12 Thread Rich Haase (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540266#comment-14540266
 ] 

Rich Haase commented on HADOOP-1540:


[~jingzhao], would you please review the latest revision of this patch when you 
have time?  Thanks!



[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-05-11 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Attachment: (was: HADOOP-1540.007.patch)



[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-05-11 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Attachment: HADOOP-1540.008.patch

Fixed whitespace and moved file read operations for CopyFilter into an 
initialize method.  
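A sketch of the "read in initialize(), not in the constructor" pattern mentioned above, assuming one regex per line in the filters source. {{DeferredRegexFilter}} and its {{Supplier<Reader>}} parameter are hypothetical; a real filter would open the configured filters file, while here a {{StringReader}} stands in for it so the example stays self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.regex.Pattern;

public class DeferredRegexFilter {
    private final Supplier<Reader> source;   // opens the filters source on demand
    private final List<Pattern> patterns = new ArrayList<>();

    DeferredRegexFilter(Supplier<Reader> source) {
        this.source = source;                // no I/O in the constructor
    }

    // Reads one regex per line (blank lines ignored) and compiles each once.
    void initialize() {
        try (BufferedReader in = new BufferedReader(source.get())) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.isBlank()) {
                    patterns.add(Pattern.compile(line.trim()));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read filters", e);
        }
    }

    boolean shouldCopy(String path) {
        for (Pattern p : patterns) {
            if (p.matcher(path).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        DeferredRegexFilter f = new DeferredRegexFilter(
                () -> new StringReader(".*\\.tmp\n\n.*/_logs/.*\n"));
        f.initialize();
        System.out.println(f.shouldCopy("/data/part-0")); // true
        System.out.println(f.shouldCopy("/data/x.tmp"));  // false
    }
}
```

Keeping the constructor free of I/O means a filter can be created cheaply (for example, by reflection from configuration) and any read failure surfaces at one well-defined point.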



[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-05-08 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Attachment: HADOOP-1540.006.patch

Refactored the patch to do exclusion filtering while building the CopyListing.  
It turns out there is a method (SimpleCopyListing#shouldCopy) which always 
returns true.  I've added a couple of basic classes to perform the default 
(always true) behavior and a SimpleCopyFilter class, which uses a string 
compare to determine what should be excluded from the copy.  I think this 
design will be a bit more flexible in the future, and it avoids having mappers 
which get a chunk of files to copy that should all be excluded.
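A sketch of the design described in that comment: a pluggable filter consulted while the listing is built, with a keep-everything default and a string-compare filter. All class names here are hypothetical stand-ins (not the classes in the patch), strings stand in for Hadoop {{Path}} objects, and "contains" is an assumed comparison rule.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterHierarchySketch {

    // Base type consulted while the copy listing is built.
    abstract static class ListingFilter {
        void initialize() { }                  // optional setup hook
        abstract boolean shouldCopy(String path);
    }

    // Default behaviour: keep everything, like an always-true shouldCopy.
    static class KeepAllFilter extends ListingFilter {
        @Override
        boolean shouldCopy(String path) {
            return true;
        }
    }

    // String-compare filter; "contains" is an assumed rule.
    static class SubstringFilter extends ListingFilter {
        private final List<String> excludes;

        SubstringFilter(List<String> excludes) {
            this.excludes = excludes;
        }

        @Override
        boolean shouldCopy(String path) {
            for (String e : excludes) {
                if (path.contains(e)) {
                    return false;
                }
            }
            return true;
        }
    }

    // Filtering here means excluded files never reach a mapper.
    static List<String> buildListing(List<String> candidates, ListingFilter filter) {
        filter.initialize();
        List<String> listing = new ArrayList<>();
        for (String path : candidates) {
            if (filter.shouldCopy(path)) {
                listing.add(path);
            }
        }
        return listing;
    }

    public static void main(String[] args) {
        List<String> files = List.of("/src/a.txt", "/src/.staging/b", "/src/c.txt");
        System.out.println(buildListing(files, new KeepAllFilter()));
        // [/src/a.txt, /src/.staging/b, /src/c.txt]
        System.out.println(buildListing(files, new SubstringFilter(List.of("/.staging/"))));
        // [/src/a.txt, /src/c.txt]
    }
}
```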



[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-05-08 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Labels: bb2015-05-tbr patch  (was: patch)
Status: Patch Available  (was: Open)



[jira] [Commented] (HADOOP-1540) distcp should support an exclude list

2015-05-08 Thread Rich Haase (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535330#comment-14535330
 ] 

Rich Haase commented on HADOOP-1540:


[~3opan] Thanks for the comments!  

#1 and #3: I'll fix those whitespace issues in the next revision of the patch.

#2 and #4 were added because checkstyle failed if I didn't make those changes.  
I'd have preferred to leave them alone.  Maybe someone can comment on how to 
avoid these kinds of checkstyle issues?

#5 You are absolutely right.  My initial pass at the patch used regex patterns.  
I switched that logic only because at the time I was doing exclusion filtering 
in the CopyMapper, and compiling lots of regexes in every mapper was likely to be 
expensive with large filter lists.  Since we now only filter while building the 
CopyListing, it's probably not as big a deal to use regexes, although I am open 
to alternate suggestions.



[jira] [Commented] (HADOOP-1540) distcp should support an exclude list

2015-05-08 Thread Rich Haase (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14535352#comment-14535352
 ] 

Rich Haase commented on HADOOP-1540:


Good idea.  I'll make those changes and resubmit.  Thanks again for the review!



[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-05-08 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Attachment: (was: HADOOP-1540.005.patch)



[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-05-08 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Attachment: (was: HADOOP-1540.006.patch)



[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-05-08 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Attachment: HADOOP-1540.007.patch

Incorporated suggested changes from [~3opan].
Renamed SimpleCopyFilter to RegexCopyFilter to better reflect what the class 
does.



[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-05-08 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Attachment: (was: HADOOP-1540.004.patch)



[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-05-08 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Attachment: (was: HADOOP-1540.003.patch)



[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-05-07 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Status: Open  (was: Patch Available)



[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-05-06 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Status: Open  (was: Patch Available)



[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-05-06 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Attachment: HADOOP-1540.005.patch

Fixed the issues described in [~jingzhao]'s comments.



[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-05-06 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Status: Patch Available  (was: Open)

 distcp should support an exclude list
 -

 Key: HADOOP-1540
 URL: https://issues.apache.org/jira/browse/HADOOP-1540
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.6.0
Reporter: Senthil Subramanian
Assignee: Rich Haase
Priority: Minor
  Labels: BB2015-05-TBR, patch
 Attachments: HADOOP-1540.003.patch, HADOOP-1540.004.patch, 
 HADOOP-1540.005.patch


 There should be a way to ignore specific paths (eg: those that have already 
 been copied over under the current srcPath). 





[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-05-06 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Target Version/s: 2.8.0  (was: 3.0.0)

 distcp should support an exclude list
 -

 Key: HADOOP-1540
 URL: https://issues.apache.org/jira/browse/HADOOP-1540
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.6.0
Reporter: Senthil Subramanian
Assignee: Rich Haase
Priority: Minor
  Labels: BB2015-05-TBR, patch
 Attachments: HADOOP-1540.003.patch, HADOOP-1540.004.patch, 
 HADOOP-1540.005.patch


 There should be a way to ignore specific paths (eg: those that have already 
 been copied over under the current srcPath). 





[jira] [Commented] (HADOOP-1540) distcp should support an exclude list

2015-05-06 Thread Rich Haase (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14531848#comment-14531848
 ] 

Rich Haase commented on HADOOP-1540:


Found a bug in the patch.  The CopyCommitter doesn't know about exclusions, so 
it throws an exception when it tries to preserve permissions on a Path that 
was excluded.  I need to refactor the change to move the exclusion logic into a 
class that both the CopyMapper and the CopyCommitter can use.  Since the code 
has to be refactored to work correctly anyway, it makes sense to define an 
interface for exclusions.  
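A minimal sketch of the refactoring described above, assuming paths are passed around as plain strings rather than Hadoop `Path` objects. The interface name follows the ExclusionListing idea raised in this thread, but its `isExcluded` method, `RegexExclusionListing`, and the `PreservePass` stand-in for the committer's permission-preservation step are all hypothetical. The point is that the mapper and the committer consult the same object, so the committer never tries to preserve attributes on a path the mapper skipped. Full-match regex semantics (`Matcher.matches()`) are also an assumption.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical interface: the name follows the ExclusionListing idea from this
// thread; the single method is an assumption for illustration.
interface ExclusionListing {
    /** Returns true if the given source path should be skipped. */
    boolean isExcluded(String path);
}

// Regex-backed implementation: a path is excluded if any pattern fully matches it.
final class RegexExclusionListing implements ExclusionListing {
    private final List<Pattern> patterns;

    RegexExclusionListing(List<String> regexes) {
        this.patterns = regexes.stream().map(Pattern::compile).collect(Collectors.toList());
    }

    @Override
    public boolean isExcluded(String path) {
        for (Pattern p : patterns) {
            if (p.matcher(path).matches()) {
                return true;
            }
        }
        return false;
    }
}

// Hypothetical stand-in for the committer's permission-preservation pass: it
// consults the same ExclusionListing as the mapper, so excluded paths are skipped.
final class PreservePass {
    static List<String> pathsToPreserve(List<String> listing, ExclusionListing exclusions) {
        return listing.stream()
                .filter(p -> !exclusions.isExcluded(p))
                .collect(Collectors.toList());
    }
}
```

Because both sides depend only on the interface, other exclusion strategies (for example, a fixed list of paths) could be swapped in without touching the mapper or committer.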

 distcp should support an exclude list
 -

 Key: HADOOP-1540
 URL: https://issues.apache.org/jira/browse/HADOOP-1540
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.6.0
Reporter: Senthil Subramanian
Assignee: Rich Haase
Priority: Minor
  Labels: BB2015-05-TBR, patch
 Attachments: HADOOP-1540.003.patch, HADOOP-1540.004.patch, 
 HADOOP-1540.005.patch


 There should be a way to ignore specific paths (eg: those that have already 
 been copied over under the current srcPath). 





[jira] [Commented] (HADOOP-1540) distcp should support an exclude list

2015-05-05 Thread Rich Haase (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529487#comment-14529487
 ] 

Rich Haase commented on HADOOP-1540:


I'm going to review the patch again this evening and think about how an 
ExclusionListing interface would work.  I like the idea of a more extensible 
interface.  

 distcp should support an exclude list
 -

 Key: HADOOP-1540
 URL: https://issues.apache.org/jira/browse/HADOOP-1540
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.6.0
Reporter: Senthil Subramanian
Assignee: Rich Haase
Priority: Minor
  Labels: patch
 Attachments: HADOOP-1540.003.patch, HADOOP-1540.004.patch


 There should be a way to ignore specific paths (eg: those that have already 
 been copied over under the current srcPath). 





[jira] [Commented] (HADOOP-1540) distcp should support an exclude list

2015-05-05 Thread Rich Haase (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529472#comment-14529472
 ] 

Rich Haase commented on HADOOP-1540:


[~jingzhao] Thanks for the quick review!

1.  I'll update the patch to skip unboxing mapBandwidth.
2.  I think that may have been caused when I rebased the patch against 3.0.0.  
In any case, I will be sure to fix the patch so it doesn't delete tests!
3.  Will do.
4.  Will do.

 distcp should support an exclude list
 -

 Key: HADOOP-1540
 URL: https://issues.apache.org/jira/browse/HADOOP-1540
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.6.0
Reporter: Senthil Subramanian
Assignee: Rich Haase
Priority: Minor
  Labels: patch
 Attachments: HADOOP-1540.003.patch, HADOOP-1540.004.patch


 There should be a way to ignore specific paths (eg: those that have already 
 been copied over under the current srcPath). 





[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-05-05 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Attachment: HADOOP-1540.004.patch

Take 4.  I think I have all of the checkstyle issues fixed.  I want to object 
super strongly to an 80 char limit for line length.  It's been a long time 
since 80 chars was a reasonable line length.  

 distcp should support an exclude list
 -

 Key: HADOOP-1540
 URL: https://issues.apache.org/jira/browse/HADOOP-1540
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.6.0
Reporter: Senthil Subramanian
Assignee: Rich Haase
Priority: Minor
  Labels: patch
 Fix For: 2.6.0

 Attachments: HADOOP-1540.003.patch, HADOOP-1540.004.patch


 There should be a way to ignore specific paths (eg: those that have already 
 been copied over under the current srcPath). 





[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-05-05 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Target Version/s: 3.0.0
   Fix Version/s: (was: 2.6.0)

 distcp should support an exclude list
 -

 Key: HADOOP-1540
 URL: https://issues.apache.org/jira/browse/HADOOP-1540
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.6.0
Reporter: Senthil Subramanian
Assignee: Rich Haase
Priority: Minor
  Labels: patch
 Attachments: HADOOP-1540.003.patch, HADOOP-1540.004.patch


 There should be a way to ignore specific paths (eg: those that have already 
 been copied over under the current srcPath). 





[jira] [Commented] (HADOOP-1540) distcp should support an exclude list

2015-05-04 Thread Rich Haase (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526883#comment-14526883
 ] 

Rich Haase commented on HADOOP-1540:


I'm not sure what to do about this final checkstyle warning:

./hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java:77:3:
 Method length is 196 lines (max allowed is 150).

The method in question is parse(), which was about 190 lines before I made my 
changes and is now 196 lines.  I can break up the parse method, but that seems 
like it would be more appropriate if this were a refactoring change, rather 
than a feature addition.  Can someone offer some suggestions for how I should 
handle this?
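One common way to satisfy a method-length check without changing behavior is to extract each option group from `parse()` into a small private helper. A heavily simplified sketch, assuming arguments have already been tokenized into a map; `SketchOptions`, `SketchOptionsParser`, and the helper names are hypothetical and not the real `OptionsParser` code, though `-update` and `-m` are existing distcp options and the default of 20 mirrors the `maxMaps=20` in the DistCpOptions output quoted in this thread.

```java
import java.util.Map;

// Hypothetical, heavily simplified stand-in for the parsed distcp options.
final class SketchOptions {
    boolean update;
    String exclusionsFile;
    int maxMaps = 20;
}

final class SketchOptionsParser {
    // parse() stays short by delegating each option group to a helper.
    static SketchOptions parse(Map<String, String> args) {
        SketchOptions options = new SketchOptions();
        parseSyncFlags(args, options);
        parseExclusions(args, options);
        parseMaxMaps(args, options);
        return options;
    }

    // Boolean flags are present/absent switches.
    private static void parseSyncFlags(Map<String, String> args, SketchOptions o) {
        o.update = args.containsKey("update");
    }

    // The new option introduced by this patch gets its own helper.
    private static void parseExclusions(Map<String, String> args, SketchOptions o) {
        o.exclusionsFile = args.get("exclusions");
    }

    // Numeric options keep their default when not supplied.
    private static void parseMaxMaps(Map<String, String> args, SketchOptions o) {
        String m = args.get("m");
        if (m != null) {
            o.maxMaps = Integer.parseInt(m);
        }
    }
}
```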

 distcp should support an exclude list
 -

 Key: HADOOP-1540
 URL: https://issues.apache.org/jira/browse/HADOOP-1540
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.6.0
Reporter: Senthil Subramanian
Assignee: Rich Haase
Priority: Minor
  Labels: patch
 Fix For: 2.6.0

 Attachments: HADOOP-1540.003.patch


 There should be a way to ignore specific paths (eg: those that have already 
 been copied over under the current srcPath). 





[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-05-01 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Attachment: (was: HADOOP-1540.002.patch)

 distcp should support an exclude list
 -

 Key: HADOOP-1540
 URL: https://issues.apache.org/jira/browse/HADOOP-1540
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.6.0
Reporter: Senthil Subramanian
Assignee: Rich Haase
Priority: Minor
  Labels: patch
 Fix For: 2.6.0

 Attachments: HADOOP-1540.003.patch


 There should be a way to ignore specific paths (eg: those that have already 
 been copied over under the current srcPath). 





[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-05-01 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Attachment: HADOOP-1540.003.patch

Fixed checkstyle errors

 distcp should support an exclude list
 -

 Key: HADOOP-1540
 URL: https://issues.apache.org/jira/browse/HADOOP-1540
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.6.0
Reporter: Senthil Subramanian
Assignee: Rich Haase
Priority: Minor
  Labels: patch
 Fix For: 2.6.0

 Attachments: HADOOP-1540.002.patch, HADOOP-1540.003.patch


 There should be a way to ignore specific paths (eg: those that have already 
 been copied over under the current srcPath). 





[jira] [Commented] (HADOOP-1540) distcp should support an exclude list

2015-04-28 Thread Rich Haase (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517921#comment-14517921
 ] 

Rich Haase commented on HADOOP-1540:


[~jingzhao]  Just finished rebasing against trunk and testing the patch.  



 distcp should support an exclude list
 -

 Key: HADOOP-1540
 URL: https://issues.apache.org/jira/browse/HADOOP-1540
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.6.0
Reporter: Senthil Subramanian
Assignee: Rich Haase
Priority: Minor
  Labels: patch
 Fix For: 2.6.0

 Attachments: HADOOP-1540.001.patch, HADOOP-1540.branch-2.6.0.001.patch


 There should be a way to ignore specific paths (eg: those that have already 
 been copied over under the current srcPath). 





[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-04-28 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Attachment: HADOOP-1540.001.patch

rebased patch against trunk

 distcp should support an exclude list
 -

 Key: HADOOP-1540
 URL: https://issues.apache.org/jira/browse/HADOOP-1540
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.6.0
Reporter: Senthil Subramanian
Assignee: Rich Haase
Priority: Minor
  Labels: patch
 Fix For: 2.6.0

 Attachments: HADOOP-1540.001.patch, HADOOP-1540.branch-2.6.0.001.patch


 There should be a way to ignore specific paths (eg: those that have already 
 been copied over under the current srcPath). 





[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-04-28 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Attachment: (was: HADOOP-1540.001.patch)

 distcp should support an exclude list
 -

 Key: HADOOP-1540
 URL: https://issues.apache.org/jira/browse/HADOOP-1540
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.6.0
Reporter: Senthil Subramanian
Assignee: Rich Haase
Priority: Minor
  Labels: patch
 Fix For: 2.6.0

 Attachments: HADOOP-1540.002.patch


 There should be a way to ignore specific paths (eg: those that have already 
 been copied over under the current srcPath). 





[jira] [Commented] (HADOOP-1540) distcp should support an exclude list

2015-04-28 Thread Rich Haase (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518148#comment-14518148
 ] 

Rich Haase commented on HADOOP-1540:


Working on fixing the items Jenkins is complaining about.

 distcp should support an exclude list
 -

 Key: HADOOP-1540
 URL: https://issues.apache.org/jira/browse/HADOOP-1540
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.6.0
Reporter: Senthil Subramanian
Assignee: Rich Haase
Priority: Minor
  Labels: patch
 Fix For: 2.6.0

 Attachments: HADOOP-1540.001.patch


 There should be a way to ignore specific paths (eg: those that have already 
 been copied over under the current srcPath). 





[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-04-28 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Attachment: HADOOP-1540.002.patch

This revision of the patch should fix findbugs/javac warnings.

 distcp should support an exclude list
 -

 Key: HADOOP-1540
 URL: https://issues.apache.org/jira/browse/HADOOP-1540
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.6.0
Reporter: Senthil Subramanian
Assignee: Rich Haase
Priority: Minor
  Labels: patch
 Fix For: 2.6.0

 Attachments: HADOOP-1540.001.patch, HADOOP-1540.002.patch


 There should be a way to ignore specific paths (eg: those that have already 
 been copied over under the current srcPath). 





[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-04-28 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Attachment: (was: HADOOP-1540.branch-2.6.0.001.patch)

 distcp should support an exclude list
 -

 Key: HADOOP-1540
 URL: https://issues.apache.org/jira/browse/HADOOP-1540
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.6.0
Reporter: Senthil Subramanian
Assignee: Rich Haase
Priority: Minor
  Labels: patch
 Fix For: 2.6.0

 Attachments: HADOOP-1540.001.patch


 There should be a way to ignore specific paths (eg: those that have already 
 been copied over under the current srcPath). 





[jira] [Commented] (HADOOP-1540) distcp should support an exclude list

2015-04-27 Thread Rich Haase (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14515182#comment-14515182
 ] 

Rich Haase commented on HADOOP-1540:


I have a patch for this JIRA that I've just started testing.  
https://github.com/richhaase/hadoop-patches/blob/master/HADOOP-1540.branch-2.6.0.001.patch

The patch adds an -exclusions option to distcp.  Its argument is a file 
containing a list of Java regex patterns (one per line).  Each file that is to 
be copied is compared against the list of exclusion patterns; if any exclusion 
pattern matches, the file is not copied.  
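A minimal sketch of the matching described above, assuming the exclusions file has already been opened as a `java.io.Reader`, that blank lines are ignored, and that a pattern must match the whole path string (`Matcher.matches()`) rather than any substring. `ExclusionsFile`, `parse`, and `shouldCopy` are hypothetical names, not the patch's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ExclusionsFile {
    // Parses one Java regex per line; skipping blank lines is an assumption.
    static List<Pattern> parse(Reader in) {
        List<Pattern> patterns = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    patterns.add(Pattern.compile(trimmed));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return patterns;
    }

    // A file is copied only if no exclusion pattern matches its full path.
    static boolean shouldCopy(String path, List<Pattern> exclusions) {
        for (Pattern p : exclusions) {
            if (p.matcher(path).matches()) {
                return false;
            }
        }
        return true;
    }
}
```

Switching `matches()` to `find()` would give substring semantics instead; which behavior the patch uses is not stated here.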

Example CLI (running with a patched JAR on a Hortonworks HDP 2.2.4 cluster):

*$ export HADOOP_USER_CLASSPATH_FIRST=true; export 
HADOOP_CLASSPATH=/home/rhaase/hadoop-distcp-2.6.0-20150426160037.jar; mapred 
distcp -update -exclusions exclude.txt /user/hadoop/radio /user/rhaase/radio*
15/04/27 15:26:55 INFO tools.DistCp: Input Options: 
DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, 
ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', 
copyStrategy='uniformsize', sourceFileListing=null, 
sourcePaths=[/user/hadoop/radio], targetPath=/user/rhaase/radio, 
targetPathExists=false, preserveRawXattrs=false, exclusionsFile='exclude.txt'}
...
15/04/27 15:42:27 INFO mapreduce.Job:  map 100% reduce 0%
15/04/27 15:42:27 INFO mapreduce.Job: Job job_1429896015201_0035 completed successfully
15/04/27 15:42:27 INFO mapreduce.Job: Counters: 35
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=2392499
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=358894362945
        HDFS: Number of bytes written=358893418844
        HDFS: Number of read operations=3214
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=633
    Job Counters
        Launched map tasks=21
        Other local map tasks=21
        Total time spent by all maps in occupied slots (ms)=4297461
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=4297461
        Total vcore-seconds taken by all map tasks=4297461
        Total megabyte-seconds taken by all map tasks=4400600064
    Map-Reduce Framework
        Map input records=4296
        Map output records=0
        Input split bytes=2457
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=4573
        CPU time spent (ms)=2571060
        Physical memory (bytes) snapshot=10379874304
        Virtual memory (bytes) snapshot=56655720448
        Total committed heap usage (bytes)=43711463424
    File Input Format Counters
        Bytes Read=941644
    File Output Format Counters
        Bytes Written=0
    org.apache.hadoop.tools.mapred.CopyMapper$Counter
        BYTESCOPIED=358893418844
        *BYTESEXCLUDED=1407553620118*
        BYTESEXPECTED=358893418844
        COPY=322
        *EXCLUDED=3974*
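The counters above can be cross-checked with some arithmetic: 322 copied plus 3974 excluded is 4296, the same as `Map input records`, and `BYTESCOPIED` equals `BYTESEXPECTED`. A small sketch that transcribes the values and performs those checks; the class name is hypothetical, and reading each map input record as one listed source entry is an assumption.

```java
// Counter values transcribed from the job output above.
final class DistCpCounterCheck {
    static final long MAP_INPUT_RECORDS = 4296L;
    static final long COPY = 322L;
    static final long EXCLUDED = 3974L;
    static final long BYTES_COPIED = 358893418844L;
    static final long BYTES_EXPECTED = 358893418844L;
    static final long BYTES_EXCLUDED = 1407553620118L;

    // 322 + 3974 == 4296: consistent with each input record being copied or excluded.
    static boolean recordsAccountedFor() {
        return COPY + EXCLUDED == MAP_INPUT_RECORDS;
    }

    // Bytes actually copied match the bytes expected for the non-excluded files.
    static boolean copiedMatchesExpected() {
        return BYTES_COPIED == BYTES_EXPECTED;
    }

    // Total bytes considered by the job: copied plus excluded.
    static long totalBytesConsidered() {
        return BYTES_COPIED + BYTES_EXCLUDED;
    }

    // Share of the considered bytes that the exclusions skipped (about 0.797).
    static double excludedByteFraction() {
        return (double) BYTES_EXCLUDED / totalBytesConsidered();
    }
}
```

By this count the exclusions skipped roughly 80% of the bytes under the source path (1,407,553,620,118 of 1,766,447,038,962).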




 distcp should support an exclude list
 -

 Key: HADOOP-1540
 URL: https://issues.apache.org/jira/browse/HADOOP-1540
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Reporter: Senthil Subramanian
Priority: Minor

 There should be a way to ignore specific paths (eg: those that have already 
 been copied over under the current srcPath). 





[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-04-27 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Fix Version/s: 2.6.0
 Assignee: Rich Haase
   Labels: patch  (was: )
Affects Version/s: 2.6.0
   Status: Patch Available  (was: Reopened)

Submitting patch for Jenkins test run.  I think there is a bug in the way I am 
handling the argument to -exclusions: in the tests I've been running on an 
actual cluster, the exclusions file can only be read from the file system 
configured in hdfs-site.xml.  
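One plausible cause, offered as an assumption since the thread does not diagnose it, is opening the exclusions file through the default file system (Hadoop's `FileSystem.get(Configuration)`) instead of the file system named by the path's own scheme (`Path.getFileSystem(Configuration)`). The dependency-free sketch below shows that resolution rule with `java.net.URI`; `FsResolver` and `schemeFor` are hypothetical helpers, and `hdfs://namenode:8020` is a made-up default FS URI.

```java
import java.net.URI;

final class FsResolver {
    // Hypothetical helper: returns the scheme whose file system should open
    // `path`, falling back to the default FS (fs.defaultFS) only when the
    // path itself does not name a scheme.
    static String schemeFor(String path, String defaultFs) {
        String scheme = URI.create(path).getScheme();
        return scheme != null ? scheme : URI.create(defaultFs).getScheme();
    }
}
```

Under this rule, `-exclusions file:///home/rhaase/exclude.txt` would be read from the local file system, while a bare `exclude.txt` would still resolve against the default FS.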

 distcp should support an exclude list
 -

 Key: HADOOP-1540
 URL: https://issues.apache.org/jira/browse/HADOOP-1540
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.6.0
Reporter: Senthil Subramanian
Assignee: Rich Haase
Priority: Minor
  Labels: patch
 Fix For: 2.6.0


 There should be a way to ignore specific paths (eg: those that have already 
 been copied over under the current srcPath). 





[jira] [Updated] (HADOOP-1540) distcp should support an exclude list

2015-04-27 Thread Rich Haase (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rich Haase updated HADOOP-1540:
---
Attachment: HADOOP-1540.branch-2.6.0.001.patch

 distcp should support an exclude list
 -

 Key: HADOOP-1540
 URL: https://issues.apache.org/jira/browse/HADOOP-1540
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.6.0
Reporter: Senthil Subramanian
Assignee: Rich Haase
Priority: Minor
  Labels: patch
 Fix For: 2.6.0

 Attachments: HADOOP-1540.branch-2.6.0.001.patch


 There should be a way to ignore specific paths (eg: those that have already 
 been copied over under the current srcPath). 


