[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825550#comment-15825550 ]

Yongjun Zhang commented on HADOOP-8065:
---

Hi Guys,

Thanks for the work at HADOOP-13114, on which I just commented.

About HADOOP-8065,
{quote}
We would like compress the data while transferring from our source system to target system. One way to do this is to write a map/reduce job to compress that after/before being transferred. This looks inefficient. Since distcp already reading writing data it would be better if it can accomplish while doing this.
{quote}

Compressing data while transferring it means we need to skip checksum comparison during the transfer. Since multiple blocks may be compressed into a single block, the checksum can only be verified after decompressing the data. However, due to the existence of variable block size, this could be error-prone.

We could possibly implement something like DFSOutputStreamWithCompression, which compresses input data before writing it out and could be used not only by distcp for this jira, but also by other tools.

Thanks.

> distcp should have an option to compress data while copying.
>
> Key: HADOOP-8065
> URL: https://issues.apache.org/jira/browse/HADOOP-8065
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs
> Affects Versions: 0.20.2
> Reporter: Suresh Antony
> Assignee: Suraj Nayak
> Priority: Minor
> Labels: distcp
> Fix For: 0.20.2
> Attachments: HADOOP-8065.005.patch, HADOOP-8065.006.patch, HADOOP-8065-trunk_2015-11-03.patch, HADOOP-8065-trunk_2015-11-04.patch, HADOOP-8065-trunk_2016-04-29-4.patch, patch.distcp.2012-02-10
>
> We would like compress the data while transferring from our source system to target system. One way to do this is to write a map/reduce job to compress that after/before being transferred. This looks inefficient.
> Since distcp already reading writing data it would be better if it can accomplish while doing this.
> Flip side of this is that distcp -update option can not check file size before copying data. It can only check for the existence of file.
> So I propose if -compress option is given then file size is not checked.
> Also when we copy file appropriate extension needs to be added to file depending on compression type.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
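The compress-before-write idea in the comment above can be illustrated with a minimal sketch that uses only java.util.zip. The class CompressOnWriteSketch and its methods are hypothetical names made up for this illustration; this is not DFSOutputStreamWithCompression and not any Hadoop API. It also shows the checksum point: the stored bytes differ from the source bytes, so a checksum can only be compared against the source after decompressing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; not a Hadoop class.
final class CompressOnWriteSketch {

    // Copy everything from source into sink, gzip-compressing on the way out.
    static void copyCompressed(InputStream source, OutputStream sink) {
        try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = source.read(buf)) != -1) {
                gz.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decompress gzip bytes back to the original content.
    static byte[] decompress(byte[] gzipped) {
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // CRC32 of a byte array, standing in for a file checksum.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }
}
```

In this sketch, comparing crc32 of the stored bytes with crc32 of the source would be meaningless; only crc32(decompress(stored)) can be checked against the source.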
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677449#comment-15677449 ]

Yongjun Zhang commented on HADOOP-8065:
---

Thanks a lot [~raviprak], I will take a look at HADOOP-13114 asap.
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675308#comment-15675308 ]

Ravi Prakash commented on HADOOP-8065:
--

Hi Yongjun! Thanks for rebasing the patch and for your polishing touches. I think HADOOP-13114 (which Suraj kindly filed at my request earlier) might be the more appropriate JIRA for these changes, since this patch does not compress *during* transfer; it compresses only after transfer and before writing to HDFS.

- {{getCompressionCodcec}} has the same typo I pointed out to Suraj. He did post updated patches on HADOOP-13114. I apologize for neglecting to review those patches despite Suraj's requests.
- {{getCompressionCodcec}} also uses ReflectionUtils. I don't know if it'd be better to use [this pattern|https://github.com/apache/hadoop/blob/b4f1971ff1dd578353036d7a123fe83c27c1e803/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineFileInputFormat.java#L159] instead?
- We're still not using a CodecPool like I suggested earlier. The patch in HADOOP-13114 actually is.

Let me rebase and upload that. Could you please take a look at that?
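The pooling suggestion in the review above is about reusing compressor objects instead of creating a fresh one per file. A minimal sketch of that idea, using only java.util.zip.Deflater, could look like the following. DeflaterPoolSketch is a hypothetical class written for this illustration; it is only analogous in spirit to Hadoop's CodecPool and does not reproduce its implementation or the code in either patch.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.zip.Deflater;

// Hypothetical illustration of compressor pooling; not Hadoop's CodecPool.
final class DeflaterPoolSketch {
    private final Deque<Deflater> idle = new ArrayDeque<>();
    private int created = 0;

    // Hand out an idle Deflater if one exists, otherwise create a new one.
    synchronized Deflater borrow() {
        Deflater d = idle.pollFirst();
        if (d == null) {
            d = new Deflater();
            created++;
        }
        return d;
    }

    // Reset the Deflater so it can be reused for new input, then keep it.
    synchronized void giveBack(Deflater d) {
        d.reset();
        idle.addFirst(d);
    }

    // Number of Deflater instances this pool has had to allocate so far.
    synchronized int createdCount() {
        return created;
    }

    // Compress one buffer with a pooled Deflater, always returning it afterwards.
    static byte[] compress(DeflaterPoolSketch pool, byte[] input) {
        Deflater d = pool.borrow();
        try {
            d.setInput(input);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!d.finished()) {
                int n = d.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            pool.giveBack(d);
        }
    }
}
```

The try/finally around the borrowed object is the part that matters: a compressor that is not returned on an error path defeats the purpose of the pool.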
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670862#comment-15670862 ]

Yongjun Zhang commented on HADOOP-8065:
---

Hi [~raviprak],

One thing that came to my mind is whether we should automatically add a postfix (based on the codec) to the target file names:
# when either the distcp src or tgt is a directory.
# when both src and tgt are files, the user should be responsible for adding the postfix.

This adds some complexity.

Another option is that we don't change the target file name, and have a command like "file" in unix/linux.
{code}
FILE(1)                   BSD General Commands Manual                  FILE(1)

NAME
     file - determine file type

SYNOPSIS
     file [-bchikLNnprsvz0] [--apple] [--mime-encoding] [--mime-type]
          [-e testname] [-F separator] [-f namefile] [-m magicfiles] file ...
     file -C [-m magicfiles]
     file [--help]
{code}
This alternative seems simpler, cleaner, and more robust.

Or we can extend
{code}
hadoop fs -getfattr [-R] -n name | -d [-e en]
{code}
to report file type etc.

Thanks.
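The "file"-style alternative in the comment above amounts to identifying a compressed file from its leading bytes instead of from its name. A minimal sketch of that technique follows, assuming only the well-known gzip (0x1f 0x8b) and bzip2 ("BZh") signatures. CompressionSniffSketch is a hypothetical helper for illustration, not an existing Hadoop command or API.

```java
// Hypothetical illustration of content-based detection; not a Hadoop API.
final class CompressionSniffSketch {

    // Guess the compression format from the first bytes of a file.
    static String sniff(byte[] head) {
        // gzip streams start with the two bytes 0x1f 0x8b.
        if (head.length >= 2
                && (head[0] & 0xff) == 0x1f
                && (head[1] & 0xff) == 0x8b) {
            return "gzip";
        }
        // bzip2 streams start with the ASCII characters "BZh".
        if (head.length >= 3
                && head[0] == 'B'
                && head[1] == 'Z'
                && head[2] == 'h') {
            return "bzip2";
        }
        return "unknown";
    }
}
```

With this approach the target file name stays untouched, at the cost of reading a few bytes from each file whenever its type is needed.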
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15669131#comment-15669131 ]

Hadoop QA commented on HADOOP-8065:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 15s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| 0 | mvndep | 0m 19s | Maven dependency ordering for branch |
| +1 | mvninstall | 7m 39s | trunk passed |
| +1 | compile | 10m 51s | trunk passed |
| +1 | checkstyle | 1m 40s | trunk passed |
| +1 | mvnsite | 1m 4s | trunk passed |
| +1 | mvneclipse | 0m 41s | trunk passed |
| +1 | findbugs | 1m 27s | trunk passed |
| +1 | javadoc | 0m 52s | trunk passed |
| 0 | mvndep | 0m 18s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 45s | the patch passed |
| +1 | compile | 9m 56s | the patch passed |
| +1 | javac | 9m 56s | the patch passed |
| -0 | checkstyle | 1m 38s | root: The patch generated 5 new + 124 unchanged - 9 fixed = 129 total (was 133) |
| +1 | mvnsite | 1m 8s | the patch passed |
| +1 | mvneclipse | 0m 46s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 48s | the patch passed |
| -1 | javadoc | 0m 23s | hadoop-tools_hadoop-distcp generated 1 new + 49 unchanged - 0 fixed = 50 total (was 49) |
| +1 | unit | 2m 57s | hadoop-mapreduce-client-core in the patch passed. |
| +1 | unit | 11m 11s | hadoop-distcp in the patch passed. |
| +1 | asflicense | 0m 37s | The patch does not generate ASF License warnings. |
| | | 80m 41s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HADOOP-8065 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12839088/HADOOP-8065.006.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 4647a6501c31 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 61c0bed |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/11074/artifact/patchprocess/diff-checkstyle-root.txt |
| javadoc | https://builds.apache.org/job/PreCommit-HADOOP-Build/11074/artifact/patchprocess/diff-javadoc-javadoc-hadoop-tools_hadoop-distcp.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/11074/testReport/ |
| modules | C:
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668944#comment-15668944 ]

Yongjun Zhang commented on HADOOP-8065:
---

Sure [~raviprak]! rev05 is a pure rebase. I just did a new rev 06 with some polishing I wanted to do:
# use constants instead of hardcoded strings.
# use DistCp's own set of configuration instead of the FileOutputFormat ones. This separates distcp's config from that of other mapreduce jobs.
# let DistCp fail before getting to the mapper if compression is enabled with an invalid codec.
# added a negative test.

Would you please take a look at it?

Thanks.
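The third item in the rev 06 list above (fail before reaching the mapper when the codec is invalid) is a fail-fast check at job-setup time. A minimal sketch of such a check in plain Java is shown below. CodecCheckSketch and requireImplementation are hypothetical names, and the use of Class.forName is an assumption made for illustration; the actual patch may resolve and validate the codec differently.

```java
// Hypothetical illustration of an up-front codec check; not the patch's code.
final class CodecCheckSketch {

    // Resolve className and verify it is a subtype of expectedType.
    // Throws IllegalArgumentException immediately if either check fails,
    // so a bad setting is reported before any copy work is scheduled.
    static Class<?> requireImplementation(String className, Class<?> expectedType) {
        final Class<?> cls;
        try {
            cls = Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException(
                "Unknown codec class: " + className, e);
        }
        if (!expectedType.isAssignableFrom(cls)) {
            throw new IllegalArgumentException(
                className + " is not a " + expectedType.getName());
        }
        return cls;
    }
}
```

A negative test for this kind of check passes a class name that does not exist and asserts that the exception is thrown during setup rather than inside a task.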
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668015#comment-15668015 ]

Ravi Prakash commented on HADOOP-8065:
--

Thanks for rebasing Yongjun! I'll take a look. Does it look good to you?
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667512#comment-15667512 ]

Hadoop QA commented on HADOOP-8065:
---

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 6m 44s | trunk passed |
| +1 | compile | 0m 17s | trunk passed |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 0m 21s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 0m 24s | trunk passed |
| +1 | javadoc | 0m 13s | trunk passed |
| +1 | mvninstall | 0m 16s | the patch passed |
| +1 | compile | 0m 14s | the patch passed |
| +1 | javac | 0m 14s | the patch passed |
| -0 | checkstyle | 0m 12s | hadoop-tools/hadoop-distcp: The patch generated 13 new + 57 unchanged - 1 fixed = 70 total (was 58) |
| +1 | mvnsite | 0m 18s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 0m 28s | the patch passed |
| +1 | javadoc | 0m 10s | the patch passed |
| +1 | unit | 11m 7s | hadoop-distcp in the patch passed. |
| +1 | asflicense | 0m 15s | The patch does not generate ASF License warnings. |
| | | 23m 7s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HADOOP-8065 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12838991/HADOOP-8065.005.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 539504ef1e67 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7ffb994 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/11067/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/11067/testReport/ |
| modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/11067/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667455#comment-15667455 ]

Yongjun Zhang commented on HADOOP-8065:
---

Hi [~snayakm], I'm not sure whether you are still working on this issue, so I just uploaded a rebased version.

Hi [~raviprak], thanks for your earlier review. Any thoughts about pushing this forward?

Thanks much.
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15623338#comment-15623338 ]

Yongjun Zhang commented on HADOOP-8065:
---

Hi [~snayakm],

I wonder if you are available to continue on this issue? Thanks much.

--Yongjun
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15577027#comment-15577027 ]

Hadoop QA commented on HADOOP-8065:
---

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 15s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 8m 24s | trunk passed |
| +1 | compile | 0m 21s | trunk passed |
| +1 | checkstyle | 0m 15s | trunk passed |
| +1 | mvnsite | 0m 24s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 0m 30s | trunk passed |
| +1 | javadoc | 0m 14s | trunk passed |
| +1 | mvninstall | 0m 16s | the patch passed |
| +1 | compile | 0m 15s | the patch passed |
| +1 | javac | 0m 15s | the patch passed |
| -0 | checkstyle | 0m 13s | hadoop-tools/hadoop-distcp: The patch generated 13 new + 61 unchanged - 1 fixed = 74 total (was 62) |
| +1 | mvnsite | 0m 22s | the patch passed |
| +1 | mvneclipse | 0m 13s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 0m 27s | the patch passed |
| +1 | javadoc | 0m 10s | the patch passed |
| +1 | unit | 10m 2s | hadoop-distcp in the patch passed. |
| +1 | asflicense | 0m 14s | The patch does not generate ASF License warnings. |
| | | 24m 3s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Issue | HADOOP-8065 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801507/HADOOP-8065-trunk_2016-04-29-4.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 50d7a1f4c7ca 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 30bb197 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/10801/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/10801/testReport/ |
| modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/10801/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576966#comment-15576966 ]

Yongjun Zhang commented on HADOOP-8065:
---

Hi [~snayakm],

Thanks for your work here, and thanks [~raviprak] for the review so far.

I quickly browsed the patch and have a couple of comments:
* {{mapreduce.output.fileoutputformat.compress}} and {{mapreduce.output.fileoutputformat.compress.codec}} are defined in FileOutputFormat.java; we should use the constants defined there in the multiple places this patch touches.
* May I know what kind of tests you have done for the patch?

Thanks.
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275173#comment-15275173 ]

Suraj Nayak commented on HADOOP-8065:
-------------------------------------

Thanks [~raviprak]. I have created JIRA [HADOOP-13114|https://issues.apache.org/jira/browse/HADOOP-13114] and added you as a watcher.

On your comment about {{codec}}: you are right, I was in the middle of extracting the default codec extension that needs to be appended to the end of the file name. I will upload the patch once my local build gives +1.
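The extension handling mentioned above could look roughly like the following stdlib-only sketch. The class `CompressedNames`, its method names, and the short codec names are hypothetical; in Hadoop the extension would normally be obtained from the codec object itself (for example via `CompressionCodec.getDefaultExtension()`) rather than from a hard-coded table.

```java
// Hypothetical helper, not taken from the patch: derives the target file
// name by appending the default extension of the chosen codec.
final class CompressedNames {
    private CompressedNames() {}

    // Illustrative lookup only; a real implementation would ask the codec.
    static String defaultExtension(String codecName) {
        switch (codecName) {
            case "gzip":
                return ".gz";
            case "bzip2":
                return ".bz2";
            case "deflate":
                return ".deflate";
            default:
                throw new IllegalArgumentException("Unknown codec: " + codecName);
        }
    }

    // Appends the extension unless the name already carries it, so copying
    // an already-suffixed file does not produce "file.gz.gz".
    static String targetName(String fileName, String codecName) {
        String ext = defaultExtension(codecName);
        return fileName.endsWith(ext) ? fileName : fileName + ext;
    }

    public static void main(String[] args) {
        System.out.println(targetName("part-00000", "gzip"));
    }
}
```

Whether an already-suffixed source file should be left alone or compressed again is a policy decision the sketch does not settle; it only avoids doubling the suffix.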
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273160#comment-15273160 ]

Ravi Prakash commented on HADOOP-8065:
--------------------------------------

Thanks Suraj! In CopyMapper, you are declaring {{codec}}, assigning it a value and then never using it. Are you sure you need those changes? Maybe you are missing some part of the patch? I am looking at [HADOOP-8065-trunk_2016-04-29-4.patch|https://issues.apache.org/jira/secure/attachment/12801507/HADOOP-8065-trunk_2016-04-29-4.patch]

To enable compression during transit is a MUCH bigger Epic. We may have to change [FileSystem|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L769], and [BlockSender|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java] amongst others (on the datanode side). A lot more people will also have an opinion on it, and it's probably a multi-month effort. Also, striped blocks may make it more complicated. People may argue that users should compress and decompress at the application level. It'd just be way more complicated than what we are trying to do here. I suggest we tackle that after this problem.
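The distinction drawn above (compressing what is written to the target versus compressing what travels over the network) can be illustrated with a minimal sketch. It uses `java.util.zip` as a stand-in for a Hadoop codec, and `CompressOnWrite` with its methods is hypothetical rather than taken from the patch. The point is that the copier only controls the target stream: the bytes it reads from the source arrive uncompressed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: compression is applied where the copier writes,
// not where the source serves the data.
final class CompressOnWrite {
    private CompressOnWrite() {}

    // Reads raw bytes from the source and writes them gzip-compressed to
    // the target. Returns the number of uncompressed bytes read, i.e. the
    // amount that still crossed the "network" uncompressed.
    static long copyCompressed(InputStream source, OutputStream target) {
        byte[] buffer = new byte[8192];
        long rawBytes = 0;
        try (GZIPOutputStream gzip = new GZIPOutputStream(target)) {
            int n;
            while ((n = source.read(buffer)) != -1) {
                gzip.write(buffer, 0, n);
                rawBytes += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rawBytes;
    }

    // Helper used to check that the stored data round-trips.
    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[100_000]; // zero-filled, so highly compressible
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        long read = copyCompressed(new ByteArrayInputStream(data), target);
        System.out.println("read " + read + " raw bytes, stored " + target.size());
    }
}
```

Under these assumptions the target needs less space, but the transfer volume is unchanged, which matches the point that true in-transit compression would need support on the source side.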
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272016#comment-15272016 ]

Suraj Nayak commented on HADOOP-8065:
-------------------------------------

[~raviprak]: It would be really helpful if you could give me some hints on how to implement the compression *during transit*. Is it after {{context.write()}} or before?
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269857#comment-15269857 ]

Ravi Prakash commented on HADOOP-8065:
--------------------------------------

Thanks for the patch [~snayakm]! Here are some of my thoughts:

# What users seem to want is to be able to compress data *during transit*. {color:red}*This patch does not enable compression of data during transit.*{color} Distcp is simply an MR job where maps are reading from a "source". If the source does not support compressing the data before putting it on the network, I don't see how we could achieve what these users want.
# *We are simply enabling users to avoid a post-processing step to compress the data they have already transferred*. This too is a noble goal if it makes the lives of users easier IMHO. It also reduces the amount of space needed on the target filesystem. We should rewrite the JIRA summary to be more explicit if that is the stated goal.

Reviewing the patch:

# Do you really need the changes in {{CopyMapper}}?
# Nit: {{getCompressionCodcec}} is misspelt.
# Instead of
{code}
e.printStackTrace();
LOG.error("Compression class " + compressionCodecClass + " not found in classpath");
{code}
you can simply pass {{e}} as a second argument to the LOG.error method.
# With this patch, we'll end up creating an instance of a Codec for every file. Do you think we could utilize something like {{org.apache.hadoop.io.compress.CodecPool}}?
# Perhaps we can add an option {{-compressOutput}} which defaults to some codec?
# Although it's conceivable that we may want to decompress before writing to the target filesystem, we can punt that to another JIRA.

Thanks for your efforts! :-)
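Two of the review points above (logging the exception instead of printing its stack trace, and reusing compressors instead of creating one per file) can be sketched with the JDK alone. Everything here is an assumption made for illustration: `CodecLoader` and `DeflaterPool` are hypothetical names, `java.util.logging` stands in for the logger used in the patch, and `java.util.zip.Deflater` stands in for a Hadoop compressor. In Hadoop itself the pooling would presumably go through `org.apache.hadoop.io.compress.CodecPool` rather than a hand-written pool.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.zip.Deflater;

// Hypothetical loader: the exception is handed to the logger as an
// argument, so the stack trace ends up in the log with the message.
final class CodecLoader {
    private static final Logger LOG = Logger.getLogger(CodecLoader.class.getName());

    private CodecLoader() {}

    static Optional<Class<?>> load(String className) {
        try {
            return Optional.<Class<?>>of(Class.forName(className));
        } catch (ClassNotFoundException e) {
            LOG.log(Level.SEVERE,
                "Compression class " + className + " not found in classpath", e);
            return Optional.empty();
        }
    }
}

// Hypothetical pool: compressors are reset and reused across files
// instead of being allocated once per file.
final class DeflaterPool {
    private final Deque<Deflater> idle = new ArrayDeque<>();
    private int created = 0;

    synchronized Deflater borrow() {
        Deflater d = idle.pollFirst();
        if (d == null) {
            d = new Deflater();
            created++;
        }
        return d;
    }

    synchronized void giveBack(Deflater d) {
        d.reset(); // clear state so the next user starts from scratch
        idle.addFirst(d);
    }

    synchronized int createdCount() {
        return created;
    }

    // Compresses one "file" worth of bytes with a pooled compressor.
    byte[] compress(byte[] input) {
        Deflater d = borrow();
        try {
            d.setInput(input);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!d.finished()) {
                int n = d.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            giveBack(d);
        }
    }

    public static void main(String[] args) {
        DeflaterPool pool = new DeflaterPool();
        pool.compress(new byte[1024]);
        pool.compress(new byte[2048]);
        System.out.println("compressors created: " + pool.createdCount());
    }
}
```

The pool matters most for codecs backed by native resources, where repeated allocation is comparatively expensive; for a sequential copy, a single reused compressor per task is usually enough.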
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264685#comment-15264685 ]

Hadoop QA commented on HADOOP-8065:
-----------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 10s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 7m 6s | trunk passed |
| +1 | compile | 0m 15s | trunk passed with JDK v1.8.0_91 |
| +1 | compile | 0m 18s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 15s | trunk passed |
| +1 | mvnsite | 0m 23s | trunk passed |
| +1 | mvneclipse | 1m 1s | trunk passed |
| +1 | findbugs | 0m 30s | trunk passed |
| +1 | javadoc | 0m 13s | trunk passed with JDK v1.8.0_91 |
| +1 | javadoc | 0m 15s | trunk passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 18s | the patch passed |
| +1 | compile | 0m 13s | the patch passed with JDK v1.8.0_91 |
| +1 | javac | 0m 13s | the patch passed |
| +1 | compile | 0m 16s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 16s | the patch passed |
| +1 | checkstyle | 0m 14s | hadoop-tools/hadoop-distcp: The patch generated 0 new + 60 unchanged - 1 fixed = 60 total (was 61) |
| +1 | mvnsite | 0m 21s | the patch passed |
| +1 | mvneclipse | 0m 11s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 0m 40s | the patch passed |
| +1 | javadoc | 0m 11s | the patch passed with JDK v1.8.0_91 |
| +1 | javadoc | 0m 13s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 8m 29s | hadoop-distcp in the patch passed with JDK v1.8.0_91. |
| +1 | unit | 7m 19s | hadoop-distcp in the patch passed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
| | | 30m 10s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:cf2ee45 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801507/HADOOP-8065-trunk_2016-04-29-4.patch |
| JIRA Issue | HADOOP-8065 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 07e72f6efdae 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264544#comment-15264544 ]

Hadoop QA commented on HADOOP-8065:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 12s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 6m 50s | trunk passed |
| +1 | compile | 0m 15s | trunk passed with JDK v1.8.0_92 |
| +1 | compile | 0m 17s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 15s | trunk passed |
| +1 | mvnsite | 0m 22s | trunk passed |
| +1 | mvneclipse | 1m 11s | trunk passed |
| +1 | findbugs | 0m 29s | trunk passed |
| +1 | javadoc | 0m 13s | trunk passed with JDK v1.8.0_92 |
| +1 | javadoc | 0m 15s | trunk passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 17s | the patch passed |
| +1 | compile | 0m 12s | the patch passed with JDK v1.8.0_92 |
| +1 | javac | 0m 12s | the patch passed |
| +1 | compile | 0m 15s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 15s | the patch passed |
| +1 | checkstyle | 0m 12s | hadoop-tools/hadoop-distcp: The patch generated 0 new + 59 unchanged - 1 fixed = 59 total (was 60) |
| +1 | mvnsite | 0m 20s | the patch passed |
| +1 | mvneclipse | 0m 11s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| +1 | findbugs | 0m 37s | the patch passed |
| +1 | javadoc | 0m 10s | the patch passed with JDK v1.8.0_92 |
| +1 | javadoc | 0m 12s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 8m 26s | hadoop-distcp in the patch passed with JDK v1.8.0_92. |
| +1 | unit | 7m 21s | hadoop-distcp in the patch passed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 29m 52s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:cf2ee45 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801491/HADOOP-8065-trunk_2016-04-29-3.patch |
| JIRA Issue | HADOOP-8065 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux a600558fc41e 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263892#comment-15263892 ]

Hadoop QA commented on HADOOP-8065:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 10s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 6m 55s | trunk passed |
| +1 | compile | 0m 15s | trunk passed with JDK v1.8.0_91 |
| +1 | compile | 0m 17s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 14s | trunk passed |
| +1 | mvnsite | 0m 22s | trunk passed |
| +1 | mvneclipse | 1m 16s | trunk passed |
| +1 | findbugs | 0m 30s | trunk passed |
| +1 | javadoc | 0m 11s | trunk passed with JDK v1.8.0_91 |
| +1 | javadoc | 0m 14s | trunk passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 18s | the patch passed |
| +1 | compile | 0m 12s | the patch passed with JDK v1.8.0_91 |
| +1 | javac | 0m 12s | the patch passed |
| +1 | compile | 0m 14s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 14s | the patch passed |
| -1 | checkstyle | 0m 12s | hadoop-tools/hadoop-distcp: The patch generated 2 new + 61 unchanged - 0 fixed = 63 total (was 61) |
| +1 | mvnsite | 0m 19s | the patch passed |
| +1 | mvneclipse | 0m 11s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| +1 | findbugs | 0m 38s | the patch passed |
| +1 | javadoc | 0m 10s | the patch passed with JDK v1.8.0_91 |
| +1 | javadoc | 0m 12s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 8m 15s | hadoop-distcp in the patch passed with JDK v1.8.0_91. |
| +1 | unit | 7m 5s | hadoop-distcp in the patch passed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 29m 22s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:cf2ee45 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801430/HADOOP-8065-trunk_2016-04-29-2.patch |
| JIRA Issue | HADOOP-8065 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux cb750ca019f9 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263809#comment-15263809 ]

Hadoop QA commented on HADOOP-8065:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 12m 17s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 11m 1s | trunk passed |
| +1 | compile | 0m 14s | trunk passed with JDK v1.8.0_91 |
| +1 | compile | 0m 17s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 21s | trunk passed |
| +1 | mvnsite | 0m 32s | trunk passed |
| +1 | mvneclipse | 1m 52s | trunk passed |
| +1 | findbugs | 0m 40s | trunk passed |
| +1 | javadoc | 0m 14s | trunk passed with JDK v1.8.0_91 |
| +1 | javadoc | 0m 15s | trunk passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 18s | the patch passed |
| +1 | compile | 0m 12s | the patch passed with JDK v1.8.0_91 |
| +1 | javac | 0m 12s | the patch passed |
| +1 | compile | 0m 15s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 15s | the patch passed |
| -1 | checkstyle | 0m 12s | hadoop-tools/hadoop-distcp: The patch generated 13 new + 61 unchanged - 0 fixed = 74 total (was 61) |
| +1 | mvnsite | 0m 20s | the patch passed |
| +1 | mvneclipse | 0m 11s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 9 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| +1 | findbugs | 0m 38s | the patch passed |
| -1 | javadoc | 0m 10s | hadoop-tools_hadoop-distcp-jdk1.8.0_91 with JDK v1.8.0_91 generated 1 new + 50 unchanged - 0 fixed = 51 total (was 50) |
| +1 | javadoc | 0m 12s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 8m 21s | hadoop-distcp in the patch passed with JDK v1.8.0_91. |
| +1 | unit | 7m 17s | hadoop-distcp in the patch passed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 47m 7s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:cf2ee45 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12801420/HADOOP-8065-trunk_2016-04-29.patch |
| JIRA Issue | HADOOP-8065 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 15a8ac959fa3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool |
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263357#comment-15263357 ]

Hadoop QA commented on HADOOP-8065:
-----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 5s | HADOOP-8065 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12770646/HADOOP-8065-trunk_2015-11-04.patch |
| JIRA Issue | HADOOP-8065 |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/9222/console |
| Powered by | Apache Yetus 0.3.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> distcp should have an option to compress data while copying.
> ------------------------------------------------------------
>
> Key: HADOOP-8065
> URL: https://issues.apache.org/jira/browse/HADOOP-8065
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs
> Affects Versions: 0.20.2
> Reporter: Suresh Antony
> Assignee: Stephen Veiss
> Priority: Minor
> Labels: distcp
> Fix For: 0.20.2
>
> Attachments: HADOOP-8065-trunk_2015-11-03.patch,
> HADOOP-8065-trunk_2015-11-04.patch, patch.distcp.2012-02-10
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263353#comment-15263353 ]

Suraj Nayak M commented on HADOOP-8065:
---

The attached patch is for an old codebase. I have a working version against the Hadoop 2.4.0 codebase, which works only with the overwrite option. Does that help? I'll need to add some test cases before I upload the patch.
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157863#comment-15157863 ]

Ravi Prakash commented on HADOOP-8065:
--

Thanks for the initiative Stephen! Could you please rebase the patch against current trunk and ping me? I'm sorry you haven't received the necessary attention. I'll try to fix that.
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150957#comment-15150957 ]

Hadoop QA commented on HADOOP-8065:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 9s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 13s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s {color} | {color:red} hadoop-tools/hadoop-distcp: patch generated 5 new + 140 unchanged - 1 fixed = 145 total (was 141) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 47s {color} | {color:green} hadoop-distcp in the patch passed with JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 39s {color} | {color:green} hadoop-distcp in the patch passed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 28m 56s {color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12770646/HADOOP-8065-trunk_2015-11-04.patch |
| JIRA Issue | HADOOP-8065 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 2db8330588f6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / fd1befb |
[jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
[ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108043#comment-15108043 ]

David Ongaro commented on HADOOP-8065:
--

When will this finally get merged? I was delighted to see that Amazon's s3distcp supports output compression (http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/UsingEMR_s3distcp.html), but was disappointed to find that the standard distcp included in Hadoop does not, even though it can write to S3. I think copying data to a different storage system in compressed form is a perfectly valid (probably even common?) use case, even when the data doesn't need to be compressed on HDFS. Having to duplicate the data on HDFS in compressed form before running distcp is an unnecessary step and a waste of resources.

> distcp should have an option to compress data while copying.
>
> Key: HADOOP-8065
> URL: https://issues.apache.org/jira/browse/HADOOP-8065
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs
> Affects Versions: 0.20.2
> Reporter: Suresh Antony
> Priority: Minor
> Labels: distcp
> Fix For: 0.20.2
>
> Attachments: HADOOP-8065-trunk_2015-11-03.patch,
> HADOOP-8065-trunk_2015-11-04.patch, patch.distcp.2012-02-10
>
> We would like compress the data while transferring from our source system to
> target system. One way to do this is to write a map/reduce job to compress
> that after/before being transferred. This looks inefficient.
> Since distcp already reading writing data it would be better if it can
> accomplish while doing this.
> Flip side of this is that distcp -update option can not check file size
> before copying data. It can only check for the existence of file.
> So I propose if -compress option is given then file size is not checked.
> Also when we copy file appropriate extension needs to be added to file
> depending on compression type.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
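The proposal in the quoted description (compress during the copy, skip the file-size check when -compress is given, and add an extension that matches the compression type) can be sketched roughly as below. This is only an illustration and is not taken from any of the attached patches. The class `CompressingCopySketch` and its methods `targetName`, `canSkip`, and `copyCompressed` are hypothetical names, the JDK's `GZIPOutputStream` stands in for a Hadoop compression codec so the sketch runs on its own, and the skip rule is reduced to the size comparison the description talks about.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch of the proposal; not code from any HADOOP-8065 patch. */
final class CompressingCopySketch {

    private CompressingCopySketch() {
    }

    /** Adds the extension for the chosen compression type, e.g. "part-00000" + ".gz". */
    public static String targetName(String sourceName, String codecExtension) {
        return sourceName + codecExtension;
    }

    /**
     * Simplified -update style decision. Without compression, an existing target
     * of equal size is treated as up to date. With compression the two sizes are
     * not comparable, so only the existence of the target is checked.
     */
    public static boolean canSkip(boolean targetExists, long sourceLen,
                                  long targetLen, boolean compress) {
        if (!targetExists) {
            return false;
        }
        return compress || sourceLen == targetLen;
    }

    /**
     * Copies the source stream into the target through gzip and returns the
     * number of uncompressed bytes read. Closing the gzip stream also closes
     * the target stream, which flushes the gzip trailer.
     */
    public static long copyCompressed(InputStream in, OutputStream out) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                gz.write(buf, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

Under these assumptions, a copy of `part-00000` with gzip would be written to `targetName("part-00000", ".gz")`, and a later run with compression enabled would skip the file as soon as that target exists, whatever its size.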