[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813083#comment-15813083
 ] 

Ravi Prakash edited comment on HADOOP-13114 at 1/9/17 11:20 PM:
----------------------------------------------------------------

Here's a rebase of the patch from Suraj and Yongjun. To try it out, you could 
use this command: 
{code}
hadoop distcp 
-Ddistcp.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.BZip2Codec
 --compressoutput /input /output
{code}


was (Author: raviprak):
Here's rebase of the patch from Suraj and Yongjun. To try it out, you could use 
this command: 
{code}
hadoop distcp 
-Ddistcp.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.BZip2Codec
 --compressoutput /input /output
{code}

> DistCp should have option to compress data on write
> ---------------------------------------------------
>
>                 Key: HADOOP-13114
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13114
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>    Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
>            Reporter: Suraj Nayak
>            Assignee: Suraj Nayak
>            Priority: Minor
>              Labels: distcp
>         Attachments: HADOOP-13114-trunk_2016-05-07-1.patch, 
> HADOOP-13114-trunk_2016-05-08-1.patch, HADOOP-13114-trunk_2016-05-10-1.patch, 
> HADOOP-13114-trunk_2016-05-12-1.patch, HADOOP-13114.05.patch, 
> HADOOP-13114.06.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> DistCp utility should have capability to store data in user specified 
> compression format. This avoids one hop of compressing data after transfer. 
> Backup strategies to different cluster also get benefit of saving one IO 
> operation to and from HDFS, thus saving resources, time and effort.
> * Create an option -compressOutput defaulting to 
> {{org.apache.hadoop.io.compress.BZip2Codec}}. 
> * Users will be able to change codec with {{-D 
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}}
> * If distcp compression is enabled, suffix the filenames with default codec 
> extension to indicate the file is compressed. Thus users can be aware of what 
> codec was used to compress the data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to