[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2017-04-25 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982663#comment-15982663
 ] 

Hadoop QA commented on HADOOP-13114:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 10s{color} | {color:red} HADOOP-13114 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-13114 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846433/HADOOP-13114.06.patch |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/12176/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> DistCp should have option to compress data on write
> ---
>
> Key: HADOOP-13114
> URL: https://issues.apache.org/jira/browse/HADOOP-13114
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
>Reporter: Suraj Nayak
>Assignee: Suraj Nayak
>Priority: Minor
>  Labels: distcp
> Attachments: HADOOP-13114.05.patch, HADOOP-13114.06.patch, 
> HADOOP-13114-trunk_2016-05-07-1.patch, HADOOP-13114-trunk_2016-05-08-1.patch, 
> HADOOP-13114-trunk_2016-05-10-1.patch, HADOOP-13114-trunk_2016-05-12-1.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> DistCp utility should have capability to store data in user specified 
> compression format. This avoids one hop of compressing data after transfer. 
> Backup strategies to different cluster also get benefit of saving one IO 
> operation to and from HDFS, thus saving resources, time and effort.
> * Create an option -compressOutput defaulting to 
> {{org.apache.hadoop.io.compress.BZip2Codec}}. 
> * Users will be able to change codec with {{-D 
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}}
> * If distcp compression is enabled, suffix the filenames with default codec 
> extension to indicate the file is compressed. Thus users can be aware of what 
> codec was used to compress the data.
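The suffix rule in the last bullet can be sketched as follows. This is a minimal illustration in Python, not the patch's code: the helper name `compressed_target_name` and the lookup table are hypothetical, and the only facts assumed are that Hadoop's `BZip2Codec` and `GzipCodec` report `.bz2` and `.gz` as their default extensions.

```python
# Hypothetical lookup table: codec class name -> default file extension.
# Only the two codecs named in the issue description are listed.
DEFAULT_EXTENSIONS = {
    "org.apache.hadoop.io.compress.BZip2Codec": ".bz2",
    "org.apache.hadoop.io.compress.GzipCodec": ".gz",
}

# The description proposes BZip2Codec as the default codec.
DEFAULT_CODEC = "org.apache.hadoop.io.compress.BZip2Codec"


def compressed_target_name(filename, codec=DEFAULT_CODEC):
    """Return the target file name with the codec's extension appended."""
    if codec not in DEFAULT_EXTENSIONS:
        raise ValueError(f"unknown codec: {codec}")
    return filename + DEFAULT_EXTENSIONS[codec]
```

With this rule, a source file `part-00000` would land on the target as `part-00000.bz2` by default, so a reader can tell from the name which codec to use.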



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2017-04-25 Thread Fei Hui (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982488#comment-15982488
 ] 

Fei Hui commented on HADOOP-13114:
--

[~snayakm] could you please upload a patch for branch-2?


[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2017-01-16 Thread Yongjun Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825535#comment-15825535
 ] 

Yongjun Zhang commented on HADOOP-13114:


Thanks [~raviprak] for the patch and all for the discussion here.

One possible benefit of compressing data only at write time is that we can save 
disk space on the target side. Imagine that the target is a backup cluster that 
needs to save space. 

Yes, this could be implemented with a tool that does the compression after 
distcp, but that means the target needs to store both the original files and 
the compressed files until the originals are deleted.

I have some thoughts about HADOOP-8065 and will post them there shortly.

Thanks.



[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2017-01-13 Thread Joep Rottinghuis (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822651#comment-15822651
 ] 

Joep Rottinghuis commented on HADOOP-13114:
---

I have similar concerns to the ones already raised: a copy shouldn't change the 
format.

It seems that the patch doesn't allow using -update and -compress at the same 
time. What if the copy was first done with -compress, and the user later 
changes their job to remove -compress and switch to -update? That will result 
in all files getting copied again, right?

In the current approach the compression seems to happen on the write side. That 
means that for copies across an expensive network (such as cross-DC copies) the 
data still travels uncompressed first.
Wouldn't it make sense to create wrapper functionality that first compresses on 
the source and then uses regular distcp? The compressed temporary data could 
possibly live in a /tmp directory structure. Alternatively, one can still distcp 
first (to a tmp location) and then compress if that is desired. The advantage of 
keeping the compression step separate from the distcp step is that one could 
additionally collapse files together into fewer files if possible.

We're finding that our users already have a hard time dealing with the 
intricacies of the interactions between various distcp flags (-atomic, -update, etc.).


[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2017-01-11 Thread Koji Noguchi (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819365#comment-15819365
 ] 

Koji Noguchi commented on HADOOP-13114:
---

bq. Could you please elucidate your concern if it's not that?

My point is that this command won't be useful unless the compressed outputs are 
directly readable by Hadoop jobs.
Avro, ORC, RCFile, SequenceFile, and other common file formats all have 
their own ways of compressing, and simply gzip/bzip-ing the entire files won't 
do any good.
Worse, I don't think the patch provides a way to uncompress them again.
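The "won't do any good" point can be illustrated with a small experiment. The sketch below is in Python rather than Hadoop code; the synthetic records and the helper `gzip_ratio` are assumptions made for illustration, and a gzip-compressed buffer stands in for a file format that already compresses its contents internally.

```python
import gzip
import random


def gzip_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    return len(gzip.compress(data)) / len(data)


rng = random.Random(0)  # fixed seed so the sample data is reproducible
text = "".join(
    f"record id={rng.randrange(10**6):06d} value={rng.randrange(10**6):06d}\n"
    for _ in range(5000)
).encode("ascii")

# Stand-in for a file whose format already compresses its payload.
already_compressed = gzip.compress(text)

text_ratio = gzip_ratio(text)                        # plain text shrinks noticeably
recompressed_ratio = gzip_ratio(already_compressed)  # second pass gains almost nothing
```

Comparing `text_ratio` with `recompressed_ratio` shows the asymmetry: the plain-text input shrinks, while the second pass over compressed bytes stays close to the input size.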

bq.  but that means we'd make assumptions about Hadoop's use cases

And I'd say you're assuming users would call this distcp+compress only on text 
files.
Files in other formats would become unreadable (until uncompressed again).


I agree with Nathan on the naming. If the command is called 
{{dist-text-compress}}, then I'll have no concerns.


[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2017-01-10 Thread Ravi Prakash (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816356#comment-15816356
 ] 

Ravi Prakash commented on HADOOP-13114:
---

Thanks Koji! I was under the impression that even binary files could be 
compressed quite well. For example, if I compress /usr/bin/xsane (a binary file):
{code}
[raviprak@ravi ~]$ ls -alh xsane.gz 
-rwxr-xr-x 1 raviprak raviprak 298K Jan 10 11:06 xsane.gz
[raviprak@ravi ~]$ ls -alh /usr/bin/xsane
-rwxr-xr-x 1 root root 744K Feb  5  2016 /usr/bin/xsane
{code}
The question is how many "binary" files we expect to be on HDFS, but that means 
we'd make assumptions about Hadoop's use cases, and I'm not sure I want to 
hazard that. I'm sorry if I misunderstand you. Could you please elucidate your 
concern if it's not that?

Thanks Nathan! I am ambivalent about this myself. Ideally we'd want to compress 
during transit (like {{rsync -z}}), but this JIRA was split out of that desire 
(from HADOOP-8065). For a variety of reasons HADOOP-8065 has been requested by 
a lot of _our_ customers (in addition to the hadoop users you can see in the 
voters and watchers list.) Also, a few first-time contributors went above and 
beyond on this JIRA.

bq. What happens if we run the command with compression twice? distcp a->b, 
then b->c? I'm assuming c is a compressed version of b which is a compressed 
version of a. In order to read we'd have to unwind both layers of compression. 
Seems strange and really easy to accidentally have this happen.
You are right that compressed files would be nested, one inside the other. 
Compression tools would do similar nesting, wouldn't they? So I'm not sure it 
can be helped. And if I had checked the compression status, I'm sure someone 
would pipe up and say that I should have been nesting ;-) Perhaps yet another flag?

bq. Obvious question is: "if it's valuable to compress, why wasn't it 
compressed in the first place?"
In my experience, sometimes the source Hadoop cluster is not under the control 
of the copier, or has a lot more capacity (and so compression there is not a 
concern). Sometimes the source is written by IoT devices into a staging area, 
and rather than have a separate job that compresses data, it'd be helpful to 
combine the copy with the compression. 

bq. Just the name bothers me a bit. copy commands don't normally transform 
data, but this one would.
Having said that, I do feel this argument is particularly compelling. I am not 
sure if this would be breaking precedent considering there is {{-append}}, 
which is not exactly a "copy" either, but I do agree with your concern.

For now I will stop work on this JIRA unless I hear from a few more diverse 
viewpoints.


[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2017-01-10 Thread Nathan Roberts (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815294#comment-15815294
 ] 

Nathan Roberts commented on HADOOP-13114:
-

Sorry for jumping in late. I tend to agree this seems like it might be outside 
the scope of distcp. I understand the desire to support this capability but it 
seems like the use-cases get strange if we fold it into distcp itself. It might 
be as simple as creating a new command: "distcompress" or something similar, 
which could share exactly the same code-base as distcp but only has this new 
capability in that mode. Some of the worries I have with having it in distcp 
are:
- Just the name bothers me a bit. copy commands don't normally transform data, 
but this one would. 
- What happens if we run the command with compression twice? distcp a->b, then 
b->c? I'm assuming c is a compressed version of b which is a compressed version 
of a. In order to read we'd have to unwind both layers of compression. Seems 
strange and really easy to accidentally have this happen.
- I'm assuming CRC checks have to be disabled when doing this. Did we force the 
user to disable CRC checks by providing the necessary option, or did we just do 
it automatically? If automatic, we should WARN them that this happened.
- Obvious question is: "if it's valuable to compress, why wasn't it compressed 
in the first place?" 
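The CRC concern in the list above can be shown with a small analogy. This Python sketch uses `zlib.crc32` over whole byte strings, which is an assumption for illustration only and is not how HDFS computes file checksums; the variable names are hypothetical.

```python
import gzip
import zlib

source_bytes = b"some file contents on the source cluster\n" * 50
# mtime=0 keeps the gzip header deterministic across runs.
target_bytes = gzip.compress(source_bytes, mtime=0)

source_crc = zlib.crc32(source_bytes)
target_crc = zlib.crc32(target_bytes)  # checksum of what was actually written
roundtrip_crc = zlib.crc32(gzip.decompress(target_bytes))
```

Because the bytes written to the target differ from the source bytes, a direct source-versus-target checksum comparison no longer matches. The comparison only succeeds again after decompressing, which is why a post-copy check would have to be skipped or redefined when compression is on.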
  


[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2017-01-10 Thread Koji Noguchi (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815124#comment-15815124
 ] 

Koji Noguchi commented on HADOOP-13114:
---

bq. I guess it'd be useful for any files which are compressible, right? 

I'm probably missing something here.
Besides text files, is there any other file format that can benefit from 
this distcp+compression?


[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2017-01-09 Thread Ravi Prakash (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813177#comment-15813177
 ] 

Ravi Prakash commented on HADOOP-13114:
---

Thanks for your comment Koji!

I guess it'd be useful for any files which are compressible, right? And also 
the target HDFS can have less free space. Are you thinking there may be 
downsides?



[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2017-01-09 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813173#comment-15813173
 ] 

Hadoop QA commented on HADOOP-13114:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 16s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m  3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 15s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 13s{color} | {color:orange} hadoop-tools/hadoop-distcp: The patch generated 2 new + 178 unchanged - 0 fixed = 180 total (was 178) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 10s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red} The patch 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 11m 41s{color} | {color:red} hadoop-distcp in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 30m  9s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.tools.TestDistCpCompression |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HADOOP-13114 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12846433/HADOOP-13114.06.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux b2f024a91ddf 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 91bf504 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/11403/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/11403/artifact/patchprocess/whitespace-eol.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/11403/artifact/patchprocess/whitespace-tabs.txt |
| unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/11403/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/11403/testReport/ |
| modules |

[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2016-11-21 Thread Koji Noguchi (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684893#comment-15684893
 ] 

Koji Noguchi commented on HADOOP-13114:
---

Sorry for joining this JIRA late, but this feature only seems to make sense 
for compressing text files.
Isn't the use case too narrow to be part of the general distcp tool?


[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2016-11-21 Thread Yongjun Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684765#comment-15684765
 ] 

Yongjun Zhang commented on HADOOP-13114:


Hi [~snayakm] and [~raviprak], thanks a lot for your earlier work here!

Hi Ravi, I reviewed the latest rev 5 you posted; some comments follow:

1. Apply all the items listed in
https://issues.apache.org/jira/browse/HADOOP-8065?focusedCommentId=15668944=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15668944
* Use constants instead of hardcoded values.
* Use DistCp's own set of configuration keys instead of the FileOutputFormat ones. 
This would separate DistCp's config from that of other MapReduce jobs.
* Let DistCp fail before it gets to the mapper if compression is enabled with 
an invalid codec.
* Add a negative test.

I did all of these in the latest patch version in HADOOP-8065.

2. Think about using extended attributes to address
https://issues.apache.org/jira/browse/HADOOP-8065?focusedCommentId=15670862=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15670862

3. Nit: {{private boolean outputCodec = false;}} is a misnomer; the field was 
meant to be named {{compressOutput}}.
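
To illustrate the fail-fast and separate-configuration points in item 1, here is a minimal sketch in plain Java. It is not code from any of the attached patches. The keys {{distcp.compress.output}} and {{distcp.compress.codec}} and the class {{CodecPreflight}} are hypothetical names, and a {{Map}} stands in for the Hadoop {{Configuration}} so that the sketch has no Hadoop dependency. In DistCp itself the expected interface would presumably be {{org.apache.hadoop.io.compress.CompressionCodec}}.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical pre-submission check; names are illustrative, not from the patch. */
final class CodecPreflight {
    // Assumed DistCp-specific keys, kept separate from the FileOutputFormat ones.
    static final String CONF_COMPRESS = "distcp.compress.output";
    static final String CONF_CODEC = "distcp.compress.codec";

    /**
     * Returns the codec class if compression is enabled, or null if it is disabled.
     * Throws IllegalArgumentException on the client side, before any job is
     * submitted, when the configured codec is missing or of the wrong type.
     */
    static Class<?> resolveCodec(Map<String, String> conf, Class<?> codecInterface) {
        if (!Boolean.parseBoolean(conf.getOrDefault(CONF_COMPRESS, "false"))) {
            return null; // compression disabled, nothing to validate
        }
        String name = conf.get(CONF_CODEC);
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException("Compression enabled but no codec configured");
        }
        Class<?> cls;
        try {
            cls = Class.forName(name.trim());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Codec class not found: " + name, e);
        }
        if (!codecInterface.isAssignableFrom(cls)) {
            throw new IllegalArgumentException(
                name + " does not implement " + codecInterface.getName());
        }
        return cls;
    }
}
```

Running such a check in the driver means a mistyped codec name is reported once, up front, rather than once per failing map task. A negative test would then only need to assert that the exception is thrown for an unknown class name.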

I think item 2 can be deferred to a separate JIRA.

What do you think?

Thanks.






[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2016-11-17 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675430#comment-15675430
 ] 

Hadoop QA commented on HADOOP-13114:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 12s{color} | {color:orange} hadoop-tools/hadoop-distcp: The patch generated 
1 new + 160 unchanged - 0 fixed = 161 total (was 160) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m  
8s{color} | {color:green} hadoop-distcp in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 15s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HADOOP-13114 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12839476/HADOOP-13114.05.patch 
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux c1e372d245b1 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / f05a9ce |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11094/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11094/artifact/patchprocess/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11094/testReport/ |
| modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11094/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> DistCp should have option to compress data on write
> ---
>
> Key: HADOOP-13114
> URL:

[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2016-11-15 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667460#comment-15667460
 ] 

Hadoop QA commented on HADOOP-13114:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} HADOOP-13114 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-13114 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12803827/HADOOP-13114-trunk_2016-05-12-1.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/11068/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> DistCp should have option to compress data on write
> ---
>
> Key: HADOOP-13114
> URL: https://issues.apache.org/jira/browse/HADOOP-13114
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
>Reporter: Suraj Nayak
>Assignee: Suraj Nayak
>Priority: Minor
>  Labels: distcp
> Attachments: HADOOP-13114-trunk_2016-05-07-1.patch, 
> HADOOP-13114-trunk_2016-05-08-1.patch, HADOOP-13114-trunk_2016-05-10-1.patch, 
> HADOOP-13114-trunk_2016-05-12-1.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> DistCp utility should have capability to store data in user specified 
> compression format. This avoids one hop of compressing data after transfer. 
> Backup strategies to different cluster also get benefit of saving one IO 
> operation to and from HDFS, thus saving resources, time and effort.
> * Create an option -compressOutput defaulting to 
> {{org.apache.hadoop.io.compress.BZip2Codec}}. 
> * Users will be able to change codec with {{-D 
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}}
> * If distcp compression is enabled, suffix the filenames with default codec 
> extension to indicate the file is compressed. Thus users can be aware of what 
> codec was used to compress the data.




[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2016-06-15 Thread Suraj Nayak (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333079#comment-15333079
 ] 

Suraj Nayak commented on HADOOP-13114:
--

[~raviprak]: Any improvements, suggestions, or review comments on this patch?


[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2016-05-13 Thread Suraj Nayak (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283079#comment-15283079
 ] 

Suraj Nayak commented on HADOOP-13114:
--

JIRA was not accepting comments when I uploaded the latest patch with the 
{{CodecPool}} changes, so I am adding the details of the Jenkins build in this 
comment.
Jenkins console output link: 
[https://builds.apache.org/job/PreCommit-HADOOP-Build/9414/console]
Jenkins output:



+1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 7m 1s | trunk passed |
| +1 | compile | 0m 14s | trunk passed with JDK v1.8.0_91 |
| +1 | compile | 0m 17s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 17s | trunk passed |
| +1 | mvnsite | 0m 22s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 0m 28s | trunk passed |
| +1 | javadoc | 0m 12s | trunk passed with JDK v1.8.0_91 |
| +1 | javadoc | 0m 15s | trunk passed with JDK v1.7.0_95 |
| +1 | mvninstall | 0m 17s | the patch passed |
| +1 | compile | 0m 13s | the patch passed with JDK v1.8.0_91 |
| +1 | javac | 0m 13s | the patch passed |
| +1 | compile | 0m 15s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 0m 15s | the patch passed |
| +1 | checkstyle | 0m 14s | the patch passed |
| +1 | mvnsite | 0m 20s | the patch passed |
| +1 | mvneclipse | 0m 11s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 0m 36s | the patch passed |
| +1 | javadoc | 0m 10s | the patch passed with JDK v1.8.0_91 |
| +1 | javadoc | 0m 12s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 8m 40s | hadoop-distcp in the patch passed with JDK v1.8.0_91. |
| +1 | unit | 7m 55s | hadoop-distcp in the patch passed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 17s | The patch does not generate ASF License warnings. |
| | | 29m 51s | |


|| Subsystem || Report/Notes ||

| Docker |  Image:yetus/hadoop:cf2ee45 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12803827/HADOOP-13114-trunk_2016-05-12-1.patch
 |
| JIRA Issue | HADOOP-13114 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 62e2be2ea3c4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / fa440a3 |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_91 
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 |
| findbugs | v3.0.0 |
| JDK v1.7.0_95  Test Results | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/9414/testReport/ |
| modules | C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/9414/console |
| Powered by | Apache Yetus 0.3.0-SNAPSHOT   http://yetus.apache.org |

> DistCp should have option to compress data on write
> ---
>
> Key: HADOOP-13114
> URL: https://issues.apache.org/jira/browse/HADOOP-13114
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Suraj Nayak
>Assignee: Suraj Nayak
>Priority: Minor
>  Labels: distcp
> Fix For: 3.0.0-alpha1
>
> Attachments: HADOOP-13114-trunk_2016-05-07-1.patch, 
> HADOOP-13114-trunk_2016-05-08-1.patch, HADOOP-13114-trunk_2016-05-10-1.patch, 
> HADOOP-13114-trunk_2016-05-12-1.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> DistCp utility should have capability to store data in user specified 
> compression format. This avoids one hop of compressing data after transfer. 
> Backup strategies to different cluster also get benefit of saving one IO 
> operation to and from HDFS, thus saving resources, time and effort.
> * Create an option -compressOutput

[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2016-05-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278033#comment-15278033
 ] 

Hadoop QA commented on HADOOP-13114:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
37s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 50s 
{color} | {color:green} hadoop-distcp in the patch passed with JDK v1.8.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 6s 
{color} | {color:green} hadoop-distcp in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 35m 14s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:cf2ee45 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12803202/HADOOP-13114-trunk_2016-05-10-1.patch
 |
| JIRA Issue | HADOOP-13114 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux c486469f6985 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 87f5e35 |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |

[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2016-05-09 Thread Suraj Nayak (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277629#comment-15277629
 ] 

Suraj Nayak commented on HADOOP-13114:
--

With the uploaded patch 
[HADOOP-13114-trunk_2016-05-08-1.patch|https://issues.apache.org/jira/secure/attachment/12802907/HADOOP-13114-trunk_2016-05-08-1.patch]
there is an issue with directory naming. The change was intended to alter only the 
file names (by appending the codec file extension), but the patch changes the 
directory name itself instead. I am working on a patch to fix it.
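
The intended naming rule can be sketched in a few lines of plain Java. This is only an illustration of the behaviour described in this comment, not the code from the patch: the class {{CompressedNames}} and the method {{targetName}} are hypothetical, and the caller is assumed to already know whether a path is a directory and which extension the codec uses (for example {{.bz2}} for {{BZip2Codec}} or {{.gz}} for {{GzipCodec}}).

```java
/** Hypothetical helper showing the intended file-versus-directory rule. */
final class CompressedNames {
    /**
     * Appends the codec extension to files only. Directories keep their
     * original name, so the target tree mirrors the source tree.
     */
    static String targetName(String relativePath, boolean isDirectory, String codecExtension) {
        if (isDirectory) {
            return relativePath; // never rename directories
        }
        return relativePath + codecExtension;
    }
}
```

With this rule, a source file {{logs/part-0}} copied with the bzip2 extension would land at {{logs/part-0.bz2}}, while the directory {{logs}} itself stays {{logs}}.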

> DistCp should have option to compress data on write
> ---
>
> Key: HADOOP-13114
> URL: https://issues.apache.org/jira/browse/HADOOP-13114
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Suraj Nayak
>Assignee: Suraj Nayak
>Priority: Minor
>  Labels: distcp
> Fix For: 3.0.0
>
> Attachments: HADOOP-13114-trunk_2016-05-07-1.patch, 
> HADOOP-13114-trunk_2016-05-08-1.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> DistCp utility should have capability to store data in user specified 
> compression format. This avoids one hop of compressing data after transfer. 
> Backup strategies to different cluster also get benefit of saving one IO 
> operation to and from HDFS, thus saving resources, time and effort.
> * Create an option -compressOutput defaulting to 
> {{org.apache.hadoop.io.compress.BZip2Codec}}. 
> * Users will be able to change codec with {{-D 
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}}
> * If distcp compression is enabled, suffix the filenames with default codec 
> extension to indicate the file is compressed. Thus users can be aware of what 
> codec was used to compress the data.




[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2016-05-08 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275837#comment-15275837
 ] 

Hadoop QA commented on HADOOP-13114:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} hadoop-tools/hadoop-distcp: The patch generated 0 
new + 180 unchanged - 1 fixed = 180 total (was 181) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 22s 
{color} | {color:green} hadoop-distcp in the patch passed with JDK v1.8.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 37s 
{color} | {color:green} hadoop-distcp in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 29m 9s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:cf2ee45 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12802907/HADOOP-13114-trunk_2016-05-08-1.patch
 |
| JIRA Issue | HADOOP-13114 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux a4280501436c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git

[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2016-05-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275464#comment-15275464
 ] 

Hadoop QA commented on HADOOP-13114:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} trunk passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} hadoop-tools/hadoop-distcp: The patch generated 0 
new + 180 unchanged - 1 fixed = 180 total (was 181) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 10s 
{color} | {color:red} hadoop-tools_hadoop-distcp-jdk1.8.0_91 with JDK v1.8.0_91 
generated 1 new + 50 unchanged - 0 fixed = 51 total (was 50) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 15s {color} 
| {color:red} hadoop-distcp in the patch failed with JDK v1.8.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 47s {color} 
| {color:red} hadoop-distcp in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 28m 54s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_91 Failed junit tests | hadoop.tools.TestOptionsParser |
| JDK v1.7.0_95 Failed junit tests | hadoop.tools.TestOptionsParser |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:cf2ee45 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12802853/HADOOP-13114-trunk_2016-05-07-1.patch
 |
| JIRA Issue | HADOOP-13114 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux

[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2016-05-07 Thread Suraj Nayak (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275375#comment-15275375
 ] 

Suraj Nayak commented on HADOOP-13114:
--

[~raviprak] : Regarding your 
[comment|https://issues.apache.org/jira/browse/HADOOP-8065?focusedCommentId=15269857=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15269857]
 on reusing the codec instead of creating a new one for each file, here are my 
thoughts and questions:
* {{org.apache.hadoop.io.compress.CompressionCodec}} has a static {{Util}} 
class that contains the {{createOutputStreamWithCodecPool}} method. Do you 
think it is a good idea to make the class and method public?
* I thought of copying the {{createOutputStreamWithCodecPool}} code into 
{{DistCpUtils}}, but that would duplicate code. What would you suggest for 
making this code reusable?
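
To make the reuse question concrete, here is a minimal sketch of the pooling idea using only {{java.util.zip}}. It is an analogy, not Hadoop code: {{PooledCompressorSketch}}, {{borrow}}, {{giveBack}} and {{pooledCount}} are hypothetical names, and a {{Deflater}} stands in for a Hadoop compressor. The point is only that a compressor is borrowed per file, reset, and returned instead of being created anew each time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class PooledCompressorSketch {
    // Hypothetical pool: idle compressors waiting to be reused.
    private static final Deque<Deflater> POOL = new ArrayDeque<>();

    // Take an idle compressor if one exists, otherwise create one.
    static synchronized Deflater borrow() {
        Deflater d = POOL.pollFirst();
        return d != null ? d : new Deflater();
    }

    // Reset the compressor's state and put it back for the next file.
    static synchronized void giveBack(Deflater d) {
        d.reset();
        POOL.addFirst(d);
    }

    static synchronized int pooledCount() {
        return POOL.size();
    }

    // Compress one "file" with a borrowed compressor.
    static byte[] compress(byte[] data) throws IOException {
        Deflater deflater = borrow();
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // The stream does not end() a caller-supplied Deflater on close,
        // so the instance stays usable after it is returned to the pool.
        try (DeflaterOutputStream out = new DeflaterOutputStream(sink, deflater)) {
            out.write(data);
        } finally {
            giveBack(deflater);
        }
        return sink.toByteArray();
    }

    // Helper used only to check that the compressed bytes round-trip.
    static byte[] decompress(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sink.write(buf, 0, n);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] first = "file one contents".getBytes(StandardCharsets.UTF_8);
        byte[] second = "file two contents".getBytes(StandardCharsets.UTF_8);
        byte[] c1 = compress(first);
        byte[] c2 = compress(second);  // reuses the compressor from the first call
        System.out.println(Arrays.equals(decompress(c1), first));
        System.out.println(Arrays.equals(decompress(c2), second));
        System.out.println(pooledCount());
    }
}
```

In this sketch, copying many files one after another allocates a single compressor, which is the saving the pooled approach is after.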

> DistCp should have option to compress data on write
> ---
>
> Key: HADOOP-13114
> URL: https://issues.apache.org/jira/browse/HADOOP-13114
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Suraj Nayak
>Assignee: Suraj Nayak
>Priority: Minor
>  Labels: distcp
> Fix For: 3.0.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> DistCp utility should have capability to store data in user specified 
> compression format. This avoids one hop of compressing data after transfer. 
> Backup strategies that target a different cluster also benefit by saving one 
> IO operation to and from HDFS, thus saving resources, time and effort.
> * Create an option -compressOutput defaulting to 
> {{org.apache.hadoop.io.compress.BZip2Codec}}. 
> * Users will be able to change codec with {{-D 
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}}
> * If distcp compression is enabled, suffix the filenames with default codec 
> extension to indicate the file is compressed. Thus users can be aware of what 
> codec was used to compress the data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

[jira] [Commented] (HADOOP-13114) DistCp should have option to compress data on write

2016-05-07 Thread Suraj Nayak (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275171#comment-15275171
 ] 

Suraj Nayak commented on HADOOP-13114:
--

This JIRA is similar to HADOOP-8065. HADOOP-8065 aims to compress data during 
transit, which is a huge effort. This JIRA is simplified to let the user 
compress data when it lands on the target filesystem.

> DistCp should have option to compress data on write
> ---
>
> Key: HADOOP-13114
> URL: https://issues.apache.org/jira/browse/HADOOP-13114
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Suraj Nayak
>Assignee: Suraj Nayak
>Priority: Minor
> Fix For: 3.0.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> DistCp utility should have capability to store data in user specified 
> compression format. This avoids one hop of compressing data after transfer. 
> Backup strategies that target a different cluster also benefit by saving one 
> IO operation to and from HDFS, thus saving resources, time and effort.
> * Create an option -compressOutput defaulting to 
> {{org.apache.hadoop.io.compress.BZip2Codec}}. 
> * Users will be able to change codec with {{-D 
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}}
> * If distcp compression is enabled, suffix the filenames with default codec 
> extension to indicate the file is compressed. Thus users can be aware of what 
> codec was used to compress the data.
> This JIRA is similar to 
> [HADOOP-8065|https://issues.apache.org/jira/browse/HADOOP-8065]. 
> [HADOOP-8065|https://issues.apache.org/jira/browse/HADOOP-8065] aims to 
> compress data *during transit*, which is a huge effort. This JIRA is 
> simplified to let the user compress data when it lands on the target 
> filesystem.



