[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982663#comment-15982663 ]

Hadoop QA commented on HADOOP-13114:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 10s{color} | {color:red} HADOOP-13114 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-13114 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846433/HADOOP-13114.06.patch |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/12176/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> DistCp should have option to compress data on write
> ---
>
> Key: HADOOP-13114
> URL: https://issues.apache.org/jira/browse/HADOOP-13114
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools/distcp
> Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
> Reporter: Suraj Nayak
> Assignee: Suraj Nayak
> Priority: Minor
> Labels: distcp
> Attachments: HADOOP-13114.05.patch, HADOOP-13114.06.patch,
> HADOOP-13114-trunk_2016-05-07-1.patch, HADOOP-13114-trunk_2016-05-08-1.patch,
> HADOOP-13114-trunk_2016-05-10-1.patch, HADOOP-13114-trunk_2016-05-12-1.patch
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> DistCp utility should have capability to store data in user specified
> compression format. This avoids one hop of compressing data after transfer.
> Backup strategies to different cluster also get benefit of saving one IO
> operation to and from HDFS, thus saving resources, time and effort.
> * Create an option -compressOutput defaulting to
> {{org.apache.hadoop.io.compress.BZip2Codec}}.
> * Users will be able to change codec with {{-D
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}}
> * If distcp compression is enabled, suffix the filenames with default codec
> extension to indicate the file is compressed. Thus users can be aware of what
> codec was used to compress the data.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
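The proposal in the issue description (compress each file as it is written and append the codec's default extension to the target name) can be sketched in plain Java. This is a hypothetical illustration, not code from the attached patches: {{CompressOnWriteSketch}} and its methods are invented names, and the JDK's {{GZIPOutputStream}} stands in for the configured codec because BZip2 is not part of the JDK. The extension table reflects the default extensions of Hadoop's BZip2Codec, GzipCodec and DefaultCodec.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of "compress on write"; not taken from the HADOOP-13114 patches.
final class CompressOnWriteSketch {
    // Default file extensions of a few Hadoop codecs, keyed by codec class name.
    static final Map<String, String> CODEC_EXTENSIONS = Map.of(
        "org.apache.hadoop.io.compress.BZip2Codec", ".bz2",
        "org.apache.hadoop.io.compress.GzipCodec", ".gz",
        "org.apache.hadoop.io.compress.DefaultCodec", ".deflate");

    // Appends the codec's extension so readers can tell how the file was compressed.
    static String targetName(String sourceName, String codecClass) {
        String ext = CODEC_EXTENSIONS.get(codecClass);
        if (ext == null) {
            throw new IllegalArgumentException("Unknown codec: " + codecClass);
        }
        return sourceName + ext;
    }

    // Streams every source byte through a compressor into the target, then closes the target.
    // gzip stands in for whichever codec was configured.
    static long copyCompressed(InputStream source, OutputStream target) {
        try (GZIPOutputStream out = new GZIPOutputStream(target)) {
            return source.transferTo(out); // number of uncompressed bytes copied
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In an actual DistCp change the stream would come from the target file system and the codec from Hadoop's {{CompressionCodec}} classes; the sketch only shows the shape of the copy step and the renaming rule.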
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982488#comment-15982488 ]

Fei Hui commented on HADOOP-13114:
--

[~snayakm] could you please upload a patch for branch-2?
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825535#comment-15825535 ]

Yongjun Zhang commented on HADOOP-13114:

Thanks [~raviprak] for the patch, and everyone for the discussion here.

One possible use of compressing data only at write time is saving disk space on the target side. Imagine the target is a backup cluster that needs to save space. Yes, this could possibly be implemented with a tool that does the compression after distcp, but then the target would need to store both the original files and the compressed files until the originals are deleted.

I have some thoughts about HADOOP-8065 and will post them there shortly. Thanks.
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822651#comment-15822651 ]

Joep Rottinghuis commented on HADOOP-13114:
---

I have concerns similar to the ones already raised: a copy shouldn't change the format.

It seems that the patch doesn't allow using -update and compress at the same time. What if the copy was first done with -compress, and the user then changes their job to drop -compress and use -update instead? Won't that result in all files getting copied again?

In the current approach the compression seems to happen on the write side. That means that for copies across an expensive network (such as cross-DC copies), the data still travels uncompressed. Wouldn't it make sense to create wrapper functionality that first compresses on the source and then uses regular distcp? The compressed temporary data could possibly live in a /tmp directory structure.

Alternatively, one can still distcp first (to a tmp location) and then compress if that is desired. The advantage of keeping the compression step separate from the distcp step is that one could additionally collapse files together into fewer files where possible.

We're finding that our users already have a hard time dealing with the intricate interactions between various distcp flags (-atomic, -update, etc.).
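Joep's -update concern can be illustrated with a deliberately simplified model of the skip decision. The model assumes a file is skipped only when a target with the same relative path and the same length already exists; real DistCp -update can also compare checksums and block sizes, which this sketch leaves out. {{UpdateSkipSketch}} is a hypothetical name.

```java
import java.util.Map;

// Hypothetical, simplified model of DistCp's -update skip check (path and length only).
final class UpdateSkipSketch {
    // True if the target already holds a same-named file of the same length.
    static boolean canSkip(String relPath, long sourceLength, Map<String, Long> targetLengths) {
        Long targetLength = targetLengths.get(relPath);
        return targetLength != null && targetLength == sourceLength;
    }

    // Counts how many source files a -update run would copy again.
    static int filesToCopy(Map<String, Long> sourceLengths, Map<String, Long> targetLengths) {
        int count = 0;
        for (Map.Entry<String, Long> e : sourceLengths.entrySet()) {
            if (!canSkip(e.getKey(), e.getValue(), targetLengths)) {
                count++;
            }
        }
        return count;
    }
}
```

If the target was produced by an earlier compressing run, every name carries a codec suffix and every length differs, so under this model every file is recopied, which is the outcome Joep anticipates.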
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819365#comment-15819365 ]

Koji Noguchi commented on HADOOP-13114:
---

bq. Could you please elucidate your concern if it's not that?

My point is, this command won't be useful unless the compressed outputs are directly readable by Hadoop jobs. Avro, ORC, RCFile, SequenceFile and other common file formats all have their own ways of compressing, and simply gzip/bzip-ing entire files won't do any good. Worse, I don't think the patch provides a way to uncompress them back.

bq. but that means we'd make assumptions about Hadoop's use cases

And I'd say you're assuming users would call this distcp+compress on text files only. Files in other formats would become unreadable (until uncompressed back).

I agree with Nathan on the naming. If the command were called {{dist-text-compress}}, I'd have no concerns.
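Koji's point about container formats can be shown with a small Java sketch, assuming only that a reader identifies a SequenceFile by its leading "SEQ" magic bytes. Gzip-ing the whole file replaces those bytes with the gzip header, so the file is no longer recognized until a separate decompression step restores it. The class and method names are hypothetical, and the JDK's gzip streams stand in for a Hadoop codec.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration: whole-file gzip hides a container format's header.
final class WholeFileCompressionSketch {
    // A SequenceFile starts with the bytes 'S', 'E', 'Q', followed by a version byte.
    static final byte[] SEQ_MAGIC = {'S', 'E', 'Q'};

    // A header check similar in spirit to the one a SequenceFile reader performs.
    static boolean looksLikeSequenceFile(byte[] data) {
        return data.length >= SEQ_MAGIC.length
            && Arrays.equals(Arrays.copyOf(data, SEQ_MAGIC.length), SEQ_MAGIC);
    }

    // Compresses the entire file as one gzip stream (gzip standing in for the configured codec).
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // The separate step needed before the copy is readable as a SequenceFile again.
    static byte[] gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After {{gzip}}, {{looksLikeSequenceFile}} returns false for the copy; only {{gunzip}} brings the original header back.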
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816356#comment-15816356 ]

Ravi Prakash commented on HADOOP-13114:
---

Thanks Koji! I was under the impression that even binary files could be compressed quite well. For example, compressing /usr/bin/xsane (a binary file):
{code}
[raviprak@ravi ~]$ ls -alh xsane.gz
-rwxr-xr-x 1 raviprak raviprak 298K Jan 10 11:06 xsane.gz
[raviprak@ravi ~]$ ls -alh /usr/bin/xsane
-rwxr-xr-x 1 root root 744K Feb 5 2016 /usr/bin/xsane
{code}
The question is how many "binary" files we expect to be on HDFS, but that means we'd make assumptions about Hadoop's use cases, and I'm not sure I want to hazard that. I'm sorry if I misunderstand you. Could you please elucidate your concern if it's not that?

Thanks Nathan! I am ambivalent about this myself. Ideally we'd want to compress during transit (like {{rsync -z}}), but this JIRA was split out of that desire (from HADOOP-8065). For a variety of reasons, HADOOP-8065 has been requested by a lot of _our_ customers (in addition to the Hadoop users you can see in the voters and watchers list). Also, a few first-time contributors went above and beyond on this JIRA.

bq. What happens if we run the command with compression twice? distcp a->b, then b->c? I'm assuming c is a compressed version of b which is a compressed version of a. In order to read we'd have to unwind both layers of compression. Seems strange and really easy to accidentally have this happen.

You are right that compressed files would be nested, one inside the other. Compression tools would do similar nesting, wouldn't they? So I'm not sure it can be helped. And if I had checked the compression status, I'm sure someone would pipe up and say that I should have been nesting ;-) Perhaps yet another flag?

bq. Obvious question is: "if it's valuable to compress, why wasn't it compressed in the first place?"

In my experience, sometimes the source Hadoop cluster is not under the copier's control, or has a lot more capacity (so compression there is not a concern). Sometimes the source is written by IoT devices into a staging area, and rather than having a separate job that compresses data, it'd be helpful to combine the copy with the compression.

bq. Just the name bothers me a bit. copy commands don't normally transform data, but this one would.

Having said that, I do feel this argument is particularly compelling. I am not sure whether this would break precedent, considering there is {{-append}}, which is not exactly a "copy" either, but I do agree with your concern. For now I will stop work on this JIRA unless I hear a few more diverse viewpoints.
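One way to avoid the accidental a->b->c nesting, instead of adding yet another flag, would be to check whether the input already looks compressed before compressing it again. The sketch below is hypothetical and only recognizes gzip by its two magic bytes (0x1f 0x8b); an uncompressed file that happens to start with those bytes would be misdetected, and other codecs would need their own checks.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical guard against compressing already-compressed data twice.
final class NestedCompressionSketch {
    // gzip streams always begin with the magic bytes 0x1f 0x8b.
    static boolean isGzip(byte[] data) {
        return data.length >= 2 && (data[0] & 0xff) == 0x1f && (data[1] & 0xff) == 0x8b;
    }

    // Unconditional compression: applying it twice nests one gzip stream inside another.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Compresses unless the input is already gzip data, so a second pass is a no-op.
    static byte[] compressOnce(byte[] data) {
        return isGzip(data) ? data : gzip(data);
    }
}
```

With {{compressOnce}}, copying a->b and then b->c leaves c identical to b, whereas calling {{gzip}} on each hop produces the two layers Nathan describes.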
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15815294#comment-15815294 ]

Nathan Roberts commented on HADOOP-13114:
-

Sorry for jumping in late. I tend to agree this seems like it might be outside the scope of distcp. I understand the desire to support this capability, but it seems like the use cases get strange if we fold it into distcp itself. It might be as simple as creating a new command, "distcompress" or something similar, which could share exactly the same code base as distcp but only has this new capability in that mode. Some of the worries I have with having it in distcp are:
- Just the name bothers me a bit. copy commands don't normally transform data, but this one would.
- What happens if we run the command with compression twice? distcp a->b, then b->c? I'm assuming c is a compressed version of b which is a compressed version of a. In order to read we'd have to unwind both layers of compression. Seems strange and really easy to accidentally have this happen.
- I'm assuming CRC checks have to be disabled when doing this. Did we force the user to disable CRC checks by providing the necessary option, or did we just do it automatically? If automatic, we should WARN them that this happened.
- Obvious question is: "if it's valuable to compress, why wasn't it compressed in the first place?"
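The CRC point follows from the target bytes no longer being the source bytes, so a direct source-versus-target checksum comparison would be expected to fail. One possible alternative, sketched here under stated assumptions, is to verify a compressed copy by decompressing it and comparing checksums of the decompressed bytes. {{java.util.zip.CRC32}} is a stand-in: DistCp actually compares file system checksums (on HDFS, derived from per-chunk CRCs), and {{CompressedCopyCheckSketch}} is a hypothetical name. The cost is an extra full read of the target.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical post-copy verification for a gzip-compressed target.
final class CompressedCopyCheckSketch {
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    static byte[] gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Compares the source with the decompressed target instead of the raw target bytes.
    static boolean matches(byte[] source, byte[] compressedTarget) {
        return crc32(source) == crc32(gunzip(compressedTarget));
    }
}
```

If such a check were used, it would make the "disable CRC checks" step unnecessary, at the price of decompressing every copied file once.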
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15815124#comment-15815124 ]

Koji Noguchi commented on HADOOP-13114:
---

bq. I guess it'd be useful for any files which are compressible, right?

I'm probably missing something here. Besides text files, is there any other file format that can benefit from this distcp+compression?
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813177#comment-15813177 ]

Ravi Prakash commented on HADOOP-13114:
---

Thanks for your comment, Koji! I guess it'd be useful for any files which are compressible, right? It would also let the target HDFS get by with less free space. Are you thinking there may be downsides?
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813173#comment-15813173 ]

Hadoop QA commented on HADOOP-13114:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 13s{color} | {color:orange} hadoop-tools/hadoop-distcp: The patch generated 2 new + 178 unchanged - 0 fixed = 180 total (was 178) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 11m 41s{color} | {color:red} hadoop-distcp in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 30m 9s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.tools.TestDistCpCompression |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HADOOP-13114 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846433/HADOOP-13114.06.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux b2f024a91ddf 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 91bf504 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/11403/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/11403/artifact/patchprocess/whitespace-eol.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/11403/artifact/patchprocess/whitespace-tabs.txt |
| unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/11403/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/11403/testReport/ |
| modules |
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15684893#comment-15684893 ]

Koji Noguchi commented on HADOOP-13114:
---

Sorry for joining late on this JIRA, but this feature only seems to make sense for compressing text files. Isn't the use case too narrow to be part of the general distcp tool?
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15684765#comment-15684765 ] Yongjun Zhang commented on HADOOP-13114: HI [~snayakm] and [~raviprak], thanks a lot for your earlier work here! HI Ravi, I did a review of latest rev 5 you posted, some comments here: 1. All items listed in https://issues.apache.org/jira/browse/HADOOP-8065?focusedCommentId=15668944=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15668944 * use constants instead of hardcoded ones. * use DistCp's own set of configuration instead of the FileOutputFormat ones. This would separate distcp from other mapreduce job's config. * let DistCp fail before getting to mapper, if the compression is enabled with invalid codec * added a negative test which I did in the latest patch version in HADOOP-8065. 2. Think about using extended attributes to address https://issues.apache.org/jira/browse/HADOOP-8065?focusedCommentId=15670862=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15670862 3. Nits: misnomer in {{private boolean outputCodec = false;}}, which meant to be {{compressOutput}} I think 2 can be deferred to later in a separate jira. What do you think? Thanks. > DistCp should have option to compress data on write > --- > > Key: HADOOP-13114 > URL: https://issues.apache.org/jira/browse/HADOOP-13114 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1 >Reporter: Suraj Nayak >Assignee: Suraj Nayak >Priority: Minor > Labels: distcp > Attachments: HADOOP-13114-trunk_2016-05-07-1.patch, > HADOOP-13114-trunk_2016-05-08-1.patch, HADOOP-13114-trunk_2016-05-10-1.patch, > HADOOP-13114-trunk_2016-05-12-1.patch, HADOOP-13114.05.patch > > Original Estimate: 48h > Remaining Estimate: 48h > > DistCp utility should have capability to store data in user specified > compression format. 
This avoids one hop of compressing data after transfer. > Backup strategies to different cluster also get benefit of saving one IO > operation to and from HDFS, thus saving resources, time and effort. > * Create an option -compressOutput defaulting to > {{org.apache.hadoop.io.compress.BZip2Codec}}. > * Users will be able to change codec with {{-D > mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}} > * If distcp compression is enabled, suffix the filenames with default codec > extension to indicate the file is compressed. Thus users can be aware of what > codec was used to compress the data. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
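Yongjun's first review item asks for named constants, a DistCp-specific configuration key, and an early failure when the configured codec is invalid. A minimal sketch of that validation follows, assuming a plain `Map<String, String>` stands in for Hadoop's `Configuration`. The key names `distcp.compress.output` and `distcp.compress.codec` are hypothetical, not real DistCp constants; only the `BZip2Codec` default comes from the issue description. The expected codec interface is passed in as `codecType` so the sketch runs without Hadoop on the classpath; in a real patch it would be `org.apache.hadoop.io.compress.CompressionCodec`.

```java
import java.util.Map;

/** Sketch: resolve and validate the output codec before any map task is launched. */
public final class CompressOptions {
    // Hypothetical DistCp-specific keys (assumptions, not real DistCp constants).
    public static final String CONF_COMPRESS_OUTPUT = "distcp.compress.output";
    public static final String CONF_COMPRESS_CODEC = "distcp.compress.codec";
    // Default codec proposed in the issue description.
    public static final String DEFAULT_CODEC = "org.apache.hadoop.io.compress.BZip2Codec";

    private CompressOptions() {}

    /** Returns true if output compression was requested; defaults to false. */
    public static boolean isCompressionEnabled(Map<String, String> conf) {
        return Boolean.parseBoolean(conf.getOrDefault(CONF_COMPRESS_OUTPUT, "false"));
    }

    /**
     * Loads the configured codec class and checks that it implements codecType.
     * Throws IllegalArgumentException so the job can fail while options are parsed,
     * before it ever reaches a mapper.
     */
    public static <T> Class<? extends T> resolveCodec(Map<String, String> conf, Class<T> codecType) {
        String name = conf.getOrDefault(CONF_COMPRESS_CODEC, DEFAULT_CODEC);
        Class<?> cls;
        try {
            cls = Class.forName(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown compression codec: " + name, e);
        }
        if (!codecType.isAssignableFrom(cls)) {
            throw new IllegalArgumentException(name + " does not implement " + codecType.getName());
        }
        return cls.asSubclass(codecType);
    }
}
```

The check is meant to run in the driver during option parsing, so a mistyped codec name fails the job immediately instead of failing every map task. Hadoop's `Configuration.getClass(name, defaultValue, xface)` performs a similar lookup-and-type check and could replace the reflection here.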
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675430#comment-15675430 ] Hadoop QA commented on HADOOP-13114:

| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 12s{color} | {color:orange} hadoop-tools/hadoop-distcp: The patch generated 1 new + 160 unchanged - 0 fixed = 161 total (was 160) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m 8s{color} | {color:green} hadoop-distcp in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 15s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HADOOP-13114 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12839476/HADOOP-13114.05.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux c1e372d245b1 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f05a9ce |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/11094/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/11094/artifact/patchprocess/whitespace-eol.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/11094/testReport/ |
| modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/11094/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> DistCp should have option to compress data on write > --- > > Key: HADOOP-13114 > URL:
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667460#comment-15667460 ] Hadoop QA commented on HADOOP-13114:

| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} HADOOP-13114 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\ \\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-13114 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12803827/HADOOP-13114-trunk_2016-05-12-1.patch |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/11068/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> DistCp should have option to compress data on write > --- > > Key: HADOOP-13114 > URL: https://issues.apache.org/jira/browse/HADOOP-13114 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1 >Reporter: Suraj Nayak >Assignee: Suraj Nayak >Priority: Minor > Labels: distcp > Attachments: HADOOP-13114-trunk_2016-05-07-1.patch, > HADOOP-13114-trunk_2016-05-08-1.patch, HADOOP-13114-trunk_2016-05-10-1.patch, > HADOOP-13114-trunk_2016-05-12-1.patch > > Original Estimate: 48h > Remaining Estimate: 48h > > DistCp utility should have capability to store data in user specified > compression format. This avoids one hop of compressing data after transfer. > Backup strategies to different cluster also get benefit of saving one IO > operation to and from HDFS, thus saving resources, time and effort. > * Create an option -compressOutput defaulting to > {{org.apache.hadoop.io.compress.BZip2Codec}}.
> * Users will be able to change codec with {{-D > mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}} > * If distcp compression is enabled, suffix the filenames with default codec > extension to indicate the file is compressed. Thus users can be aware of what > codec was used to compress the data.
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333079#comment-15333079 ] Suraj Nayak commented on HADOOP-13114: -- [~raviprak]: Do you have any improvements, suggestions, or review comments on this patch? > DistCp should have option to compress data on write > --- > > Key: HADOOP-13114 > URL: https://issues.apache.org/jira/browse/HADOOP-13114 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1 >Reporter: Suraj Nayak >Assignee: Suraj Nayak >Priority: Minor > Labels: distcp > Attachments: HADOOP-13114-trunk_2016-05-07-1.patch, > HADOOP-13114-trunk_2016-05-08-1.patch, HADOOP-13114-trunk_2016-05-10-1.patch, > HADOOP-13114-trunk_2016-05-12-1.patch > > Original Estimate: 48h > Remaining Estimate: 48h > > DistCp utility should have capability to store data in user specified > compression format. This avoids one hop of compressing data after transfer. > Backup strategies to different cluster also get benefit of saving one IO > operation to and from HDFS, thus saving resources, time and effort. > * Create an option -compressOutput defaulting to > {{org.apache.hadoop.io.compress.BZip2Codec}}. > * Users will be able to change codec with {{-D > mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}} > * If distcp compression is enabled, suffix the filenames with default codec > extension to indicate the file is compressed. Thus users can be aware of what > codec was used to compress the data.
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283079#comment-15283079 ] Suraj Nayak commented on HADOOP-13114: -- JIRA was not accepting comments when I uploaded the latest patch with the {{CodecPool}} changes, so I am adding the details of the Jenkins build in this comment.

Jenkins console output link: [https://builds.apache.org/job/PreCommit-HADOOP-Build/9414/console]

Jenkins output:

+1 overall

| Vote | Subsystem | Runtime | Comment
| 0 | reexec | 0m 13s | Docker mode activated.
| +1 | @author | 0m 0s | The patch does not contain any @author tags.
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files.
| +1 | mvninstall | 7m 1s | trunk passed
| +1 | compile | 0m 14s | trunk passed with JDK v1.8.0_91
| +1 | compile | 0m 17s | trunk passed with JDK v1.7.0_95
| +1 | checkstyle | 0m 17s | trunk passed
| +1 | mvnsite | 0m 22s | trunk passed
| +1 | mvneclipse | 0m 15s | trunk passed
| +1 | findbugs | 0m 28s | trunk passed
| +1 | javadoc | 0m 12s | trunk passed with JDK v1.8.0_91
| +1 | javadoc | 0m 15s | trunk passed with JDK v1.7.0_95
| +1 | mvninstall | 0m 17s | the patch passed
| +1 | compile | 0m 13s | the patch passed with JDK v1.8.0_91
| +1 | javac | 0m 13s | the patch passed
| +1 | compile | 0m 15s | the patch passed with JDK v1.7.0_95
| +1 | javac | 0m 15s | the patch passed
| +1 | checkstyle | 0m 14s | the patch passed
| +1 | mvnsite | 0m 20s | the patch passed
| +1 | mvneclipse | 0m 11s | the patch passed
| +1 | whitespace | 0m 0s | The patch has no whitespace issues.
| +1 | findbugs | 0m 36s | the patch passed
| +1 | javadoc | 0m 10s | the patch passed with JDK v1.8.0_91
| +1 | javadoc | 0m 12s | the patch passed with JDK v1.7.0_95
| +1 | unit | 8m 40s | hadoop-distcp in the patch passed with JDK v1.8.0_91.
| +1 | unit | 7m 55s | hadoop-distcp in the patch passed with JDK v1.7.0_95.
| +1 | asflicense | 0m 17s | The patch does not generate ASF License warnings.
| | | 29m 51s |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:cf2ee45 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12803827/HADOOP-13114-trunk_2016-05-12-1.patch |
| JIRA Issue | HADOOP-13114 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 62e2be2ea3c4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / fa440a3 |
| Default Java | 1.7.0_95 |
| Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_91 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 |
| findbugs | v3.0.0 |
| JDK v1.7.0_95 Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/9414/testReport/ |
| modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/9414/console |
| Powered by | Apache Yetus 0.3.0-SNAPSHOT http://yetus.apache.org |

> DistCp should have option to compress data on write > --- > > Key: HADOOP-13114 > URL: https://issues.apache.org/jira/browse/HADOOP-13114 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0-alpha1 >Reporter: Suraj Nayak >Assignee: Suraj Nayak >Priority: Minor > Labels: distcp > Fix For: 3.0.0-alpha1 > > Attachments: HADOOP-13114-trunk_2016-05-07-1.patch, > HADOOP-13114-trunk_2016-05-08-1.patch, HADOOP-13114-trunk_2016-05-10-1.patch, > HADOOP-13114-trunk_2016-05-12-1.patch > > Original Estimate: 48h > Remaining Estimate: 48h > > DistCp utility should have capability to store data in user specified > compression format. This avoids one hop of compressing data after transfer.
> Backup strategies to different cluster also get benefit of saving one IO > operation to and from HDFS, thus saving resources, time and effort. > * Create an option -compressOutput
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278033#comment-15278033 ] Hadoop QA commented on HADOOP-13114:

| (/) *{color:green}+1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 37s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s {color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s {color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 50s {color} | {color:green} hadoop-distcp in the patch passed with JDK v1.8.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 6s {color} | {color:green} hadoop-distcp in the patch passed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 35m 14s {color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:cf2ee45 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12803202/HADOOP-13114-trunk_2016-05-10-1.patch |
| JIRA Issue | HADOOP-13114 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux c486469f6985 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 87f5e35 |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277629#comment-15277629 ] Suraj Nayak commented on HADOOP-13114: -- The uploaded patch [HADOOP-13114-trunk_2016-05-08-1.patch|https://issues.apache.org/jira/secure/attachment/12802907/HADOOP-13114-trunk_2016-05-08-1.patch] has an issue with directory naming. The change was intended to rename the files (appending the codec's file extension), but the patch renames the directory itself instead of the files inside it. Working on a patch to fix it. > DistCp should have option to compress data on write > --- > > Key: HADOOP-13114 > URL: https://issues.apache.org/jira/browse/HADOOP-13114 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Suraj Nayak >Assignee: Suraj Nayak >Priority: Minor > Labels: distcp > Fix For: 3.0.0 > > Attachments: HADOOP-13114-trunk_2016-05-07-1.patch, > HADOOP-13114-trunk_2016-05-08-1.patch > > Original Estimate: 48h > Remaining Estimate: 48h > > DistCp utility should have capability to store data in user specified > compression format. This avoids one hop of compressing data after transfer. > Backup strategies to different cluster also get benefit of saving one IO > operation to and from HDFS, thus saving resources, time and effort. > * Create an option -compressOutput defaulting to > {{org.apache.hadoop.io.compress.BZip2Codec}}. > * Users will be able to change codec with {{-D > mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}} > * If distcp compression is enabled, suffix the filenames with default codec > extension to indicate the file is compressed. Thus users can be aware of what > codec was used to compress the data.
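The naming bug reported in the comment above (the directory being renamed instead of the files inside it) can be avoided by applying the suffix only when a copy-listing entry is a regular file. A minimal sketch follows, using `java.nio.file.Path` in place of Hadoop's `Path`/`FileStatus`. The helper name `targetPathFor` is hypothetical, not from the patch. The extension string is assumed to come from the codec; Hadoop codecs expose it through `CompressionCodec.getDefaultExtension()`, for example `.bz2` for `BZip2Codec`.

```java
import java.nio.file.Path;

/** Sketch: compute target paths so only files, never directories, get the codec suffix. */
public final class CompressedTargetNames {
    private CompressedTargetNames() {}

    /**
     * Maps a source-relative path to its target path. Directories keep their names so the
     * copied tree layout is preserved; regular files get codecExtension appended.
     */
    public static Path targetPathFor(Path targetRoot, Path relative, boolean isDirectory,
                                     String codecExtension) {
        Path target = targetRoot.resolve(relative);
        if (isDirectory || codecExtension == null || codecExtension.isEmpty()) {
            return target;
        }
        // Rename only the last path component, e.g. part-00000 -> part-00000.bz2
        return target.resolveSibling(target.getFileName() + codecExtension);
    }
}
```

For example, with target root `/backup`, the file `logs/day1/part-00000` maps to `/backup/logs/day1/part-00000.bz2`, while the directory `logs/day1` stays `/backup/logs/day1`.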
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275837#comment-15275837 ] Hadoop QA commented on HADOOP-13114:

| (/) *{color:green}+1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s {color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 12s {color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} hadoop-tools/hadoop-distcp: The patch generated 0 new + 180 unchanged - 1 fixed = 180 total (was 181) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s {color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 22s {color} | {color:green} hadoop-distcp in the patch passed with JDK v1.8.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 37s {color} | {color:green} hadoop-distcp in the patch passed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m 9s {color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:cf2ee45 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12802907/HADOOP-13114-trunk_2016-05-08-1.patch |
| JIRA Issue | HADOOP-13114 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux a4280501436c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275464#comment-15275464 ] Hadoop QA commented on HADOOP-13114:

| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s {color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s {color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 12s {color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} hadoop-tools/hadoop-distcp: The patch generated 0 new + 180 unchanged - 1 fixed = 180 total (was 181) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 10s {color} | {color:red} hadoop-tools_hadoop-distcp-jdk1.8.0_91 with JDK v1.8.0_91 generated 1 new + 50 unchanged - 0 fixed = 51 total (was 50) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 15s {color} | {color:red} hadoop-distcp in the patch failed with JDK v1.8.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 47s {color} | {color:red} hadoop-distcp in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 28m 54s {color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| JDK v1.8.0_91 Failed junit tests | hadoop.tools.TestOptionsParser |
| JDK v1.7.0_95 Failed junit tests | hadoop.tools.TestOptionsParser |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:cf2ee45 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12802853/HADOOP-13114-trunk_2016-05-07-1.patch |
| JIRA Issue | HADOOP-13114 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275375#comment-15275375 ] Suraj Nayak commented on HADOOP-13114: -- [~raviprak]: Regarding your [comment|https://issues.apache.org/jira/browse/HADOOP-8065?focusedCommentId=15269857=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15269857] on reusing the codec instead of creating a new one for each file, here are my thoughts and questions:
* {{org.apache.hadoop.io.compress.CompressionCodec.Util}} is a static nested class that contains the {{createOutputStreamWithCodecPool}} method. Do you think it is a good idea to make the class and the method public?
* I considered copying the {{createOutputStreamWithCodecPool}} method code into {{DistCpUtils}}, but that would duplicate code. What would you suggest for making this code reusable?

> DistCp should have option to compress data on write > --- > > Key: HADOOP-13114 > URL: https://issues.apache.org/jira/browse/HADOOP-13114 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Suraj Nayak >Assignee: Suraj Nayak >Priority: Minor > Labels: distcp > Fix For: 3.0.0 > > Original Estimate: 48h > Remaining Estimate: 48h > > DistCp utility should have capability to store data in user specified > compression format. This avoids one hop of compressing data after transfer. > Backup strategies to different cluster also get benefit of saving one IO > operation to and from HDFS, thus saving resources, time and effort. > * Create an option -compressOutput defaulting to > {{org.apache.hadoop.io.compress.BZip2Codec}}. > * Users will be able to change codec with {{-D > mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}} > * If distcp compression is enabled, suffix the filenames with default codec > extension to indicate the file is compressed. Thus users can be aware of what > codec was used to compress the data.
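The question above is about reusing compressor instances across files rather than creating one per file. In Hadoop this pattern is provided by `CodecPool.getCompressor(codec)` and `CodecPool.returnCompressor(compressor)`, combined with `codec.createOutputStream(out, compressor)`. The JDK-only sketch below shows the same borrow/reset/return cycle, with `java.util.zip.Deflater` standing in for a Hadoop `Compressor`; `DeflaterPool` is a hypothetical name, not a Hadoop or DistCp class.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayDeque;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

/** Minimal compressor pool: borrow a Deflater per file, return it when the file is done. */
public final class DeflaterPool {
    private final ArrayDeque<Deflater> idle = new ArrayDeque<>();
    private int created = 0;

    /** Hands out an idle Deflater, creating one only when the pool is empty. */
    public synchronized Deflater borrow() {
        Deflater d = idle.poll();
        if (d == null) {
            d = new Deflater();
            created++;
        }
        return d;
    }

    /** Clears the previous stream's state and makes the Deflater available again. */
    public synchronized void giveBack(Deflater d) {
        d.reset();
        idle.push(d);
    }

    /** Number of Deflater instances ever allocated by this pool. */
    public synchronized int createdCount() {
        return created;
    }

    /** Compresses one "file" with a pooled Deflater and returns the zlib-format bytes. */
    public byte[] compress(byte[] data) throws IOException {
        Deflater d = borrow();
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            // Because we supply our own Deflater, close() finishes the stream
            // but does not call end(), so the instance can be reused.
            try (OutputStream out = new DeflaterOutputStream(bytes, d)) {
                out.write(data);
            }
            return bytes.toByteArray();
        } finally {
            giveBack(d);
        }
    }
}
```

Compressing many files in sequence through one pool allocates a single native compressor instead of one per file, which is the saving the reuse question is after.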
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write
[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275171#comment-15275171 ] Suraj Nayak commented on HADOOP-13114: -- This JIRA is similar to HADOOP-8065, which aims to compress data during transit, a much larger effort. This JIRA takes a simpler approach: it lets the user compress data when it lands on the target filesystem. > DistCp should have option to compress data on write > --- > > Key: HADOOP-13114 > URL: https://issues.apache.org/jira/browse/HADOOP-13114 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Suraj Nayak >Assignee: Suraj Nayak >Priority: Minor > Fix For: 3.0.0 > > Original Estimate: 48h > Remaining Estimate: 48h > > The DistCp utility should be able to store data in a user-specified > compression format. This avoids a separate compression pass after the transfer. > Backups to a different cluster also benefit by saving one round of I/O > to and from HDFS, which saves resources, time and effort. > * Create an option -compressOutput defaulting to > {{org.apache.hadoop.io.compress.BZip2Codec}}. > * Users will be able to change the codec with {{-D > mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}}. > * If DistCp compression is enabled, suffix the file names with the codec's default > extension to indicate that the file is compressed, so users can tell which > codec was used to compress the data. > This JIRA is similar to > [HADOOP-8065|https://issues.apache.org/jira/browse/HADOOP-8065], > which aims to compress data *during transit*, a much larger effort. This JIRA > takes a simpler approach: it lets the user compress data when it lands on the target > filesystem.
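The three proposal bullets (an opt-in -compressOutput switch, a codec chosen through {{mapreduce.output.fileoutputformat.compress.codec}} with BZip2Codec as the default, and a codec-specific file-name suffix) can be sketched as a small naming helper. This is a hypothetical illustration, not the patch: {{CompressedTargetNamer}} is an invented name, a plain {{Map}} stands in for Hadoop's {{Configuration}}, and the hard-coded extension table stands in for {{CompressionCodec.getDefaultExtension()}}, which returns {{.bz2}}, {{.gz}} and {{.deflate}} for the three codecs listed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: decides the target file name when DistCp output compression is on. */
class CompressedTargetNamer {
    static final String CODEC_KEY = "mapreduce.output.fileoutputformat.compress.codec";
    static final String DEFAULT_CODEC = "org.apache.hadoop.io.compress.BZip2Codec";

    // Stand-in for CompressionCodec.getDefaultExtension() on each codec class.
    private static final Map<String, String> EXTENSIONS = new HashMap<>();
    static {
        EXTENSIONS.put("org.apache.hadoop.io.compress.BZip2Codec", ".bz2");
        EXTENSIONS.put("org.apache.hadoop.io.compress.GzipCodec", ".gz");
        EXTENSIONS.put("org.apache.hadoop.io.compress.DefaultCodec", ".deflate");
    }

    /** Reads the codec class name from the (simulated) configuration, defaulting to BZip2Codec. */
    static String resolveCodec(Map<String, String> conf) {
        return conf.getOrDefault(CODEC_KEY, DEFAULT_CODEC);
    }

    /** Appends the codec's default extension when compression is enabled. */
    static String targetPath(String path, Map<String, String> conf, boolean compressOutput) {
        if (!compressOutput) {
            return path;                      // compression off: name unchanged
        }
        String codec = resolveCodec(conf);
        String ext = EXTENSIONS.get(codec);
        if (ext == null) {
            throw new IllegalArgumentException("Unknown codec: " + codec);
        }
        return path + ext;
    }
}
```

With an empty configuration, {{/backup/data.txt}} maps to {{/backup/data.txt.bz2}}; setting the property to GzipCodec yields {{/backup/data.txt.gz}}. Against real Hadoop, the lookup would more likely use {{conf.getClass(CODEC_KEY, BZip2Codec.class, CompressionCodec.class)}}, {{ReflectionUtils.newInstance(...)}} and then {{codec.getDefaultExtension()}}, so any codec on the classpath would work without a hard-coded table.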