Re: Writing compressed sequence files

Josh Wills Fri, 02 Aug 2013 16:57:56 -0700

Hey Som,

The Pipeline object that coordinates the flow has a getConfiguration()
method; any options you set on it will propagate to all of the jobs the
pipeline runs.
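
A minimal sketch of that approach, assuming the classic Hadoop 1.x-era
property names (mapred.output.compress and friends; newer Hadoop releases
use different names, so check your version). The class name, the helper
method, and the choice of DefaultCodec are illustrative assumptions, not
part of Crunch:

```java
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.hadoop.conf.Configuration;

public class CompressionConfigExample {

  // Hypothetical helper: turns on block-compressed job output using the
  // classic property names (an assumption; verify for your Hadoop version).
  static void enableBlockCompression(Configuration conf) {
    conf.setBoolean("mapred.output.compress", true);
    // SequenceFile compression type: NONE, RECORD, or BLOCK.
    conf.set("mapred.output.compression.type", "BLOCK");
    // DefaultCodec is only an example; any CompressionCodec class name works.
    conf.set("mapred.output.compression.codec",
        "org.apache.hadoop.io.compress.DefaultCodec");
  }

  public static void main(String[] args) {
    Pipeline pipeline = new MRPipeline(CompressionConfigExample.class);
    // Options set here are handed to every job the pipeline launches.
    enableBlockCompression(pipeline.getConfiguration());
    // ... build the pipeline, then pipeline.write(..., To.sequenceFile(path))
    // and pipeline.done() as usual ...
  }
}
```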

I usually implement Hadoop's Tool interface and then specify these
configuration options on the command line, so I can play with them
independently of my pipeline logic, and I end up with something like:

hadoop jar <crunch-job.jar> -D mapred.output.compress=true \
  -D mapred.output.compression.type=BLOCK etc.
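
For reference, a Tool-based driver along those lines might look roughly
like the sketch below. The class name, the two positional arguments, and
the text-file input are assumptions made for illustration; the point is
only that getConf() already carries whatever -D options were given on the
command line by the time run() is called:

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.PipelineResult;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.io.To;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: reads text lines and writes them as a sequence file.
public class CompressedSeqFileJob extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: CompressedSeqFileJob <input> <output>");
      return 2;
    }
    // ToolRunner has already parsed the -D key=value options into getConf()
    // and removed them from args.
    Pipeline pipeline = new MRPipeline(CompressedSeqFileJob.class, getConf());
    PCollection<String> lines = pipeline.readTextFile(args[0]);
    pipeline.write(lines, To.sequenceFile(args[1]));
    PipelineResult result = pipeline.done();
    return result.succeeded() ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new CompressedSeqFileJob(), args));
  }
}
```

With a driver like this, the -D options go after the main class name and
before the input and output paths, since Hadoop's generic option parsing
expects them ahead of the application's own arguments.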

I think that having some syntactic sugar for compressing Target objects
(like To.sequenceFile or To.avroFile) would be a nice JIRA.

J

On Fri, Aug 2, 2013 at 3:58 PM, Som Satpathy wrote:

> Hi all,
>
> I am trying to write compressed sequence files at the end of my crunch
> pipeline. I'm doing a pipeline.write(mycollection, To.sequenceFile(path))
> for that.
> However, Crunch is writing an uncompressed sequence file by default. How
> do I pass the codec that I want to use to Crunch?
>
> Looking forward to your input.
>
> Thanks,
> Som
>
>

-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
