Thanks Josh. I tried setting the compression parameters both via the Configuration object and via the command line, but the output sequence file never seems to get compressed. I'm trying to compress it with Snappy.
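For reference, here is a minimal sketch of the kind of settings being described, assuming the old-style mapred.* property names that appear in the quoted reply further down this thread. The codec property and the helper class CompressionOptions are illustrative assumptions and do not come from the thread. Whether Crunch honors these settings for a To.sequenceFile target is exactly the open question here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Crunch or Hadoop.
class CompressionOptions {

    // Old-style (mapred.*) output compression properties for block-level Snappy.
    static Map<String, String> snappyBlockOptions() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("mapred.compress.output", "true");
        opts.put("mapred.output.compression.type", "BLOCK");
        // Assumed codec property; SnappyCodec ships with Hadoop.
        opts.put("mapred.output.compression.codec",
                 "org.apache.hadoop.io.compress.SnappyCodec");
        return opts;
    }

    // Renders the options as "-D key=value" arguments for a Tool-based job.
    static String toCommandLine(Map<String, String> opts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : opts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("-D ").append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // With Hadoop on the classpath, each pair could instead be applied via
        // pipeline.getConfiguration().set(key, value) before pipeline.write(...).
        System.out.println(toCommandLine(snappyBlockOptions()));
    }
}
```

The printed arguments would go between the jar name and the job's own arguments on the hadoop jar command line, provided the driver implements Tool so that the -D options are parsed.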
If I try creating a sequence file outside of Crunch using SequenceFile.createWriter, the file does get compressed with my compression type (i.e. Snappy). I was wondering if this is a known issue with Crunch.

Thanks,
Som

On Fri, Aug 2, 2013 at 4:56 PM, Josh Wills <[email protected]> wrote:

> Hey Som,
>
> The Pipeline object that coordinates the flow has a getConfiguration()
> method where you can set any options you might like and they will propagate
> to all of your jars.
>
> I usually implement Hadoop's Tool interface and then specify these
> configuration options on the command line so I can play with them
> independent of the logic of my runtime, and I end up w/something like:
>
> hadoop jar <crunch-job.jar> -D mapred.compress.output=true -D
> mapred.output.compression.type=block etc.
>
> I think that having some syntactic sugar for compressing Target objects
> (like To.sequenceFile or To.avroFile) would be a nice JIRA.
>
> J
>
>
> On Fri, Aug 2, 2013 at 3:58 PM, Som Satpathy <[email protected]> wrote:
>
>> Hi all,
>>
>> I am trying to write compressed sequence files at the end of my crunch
>> pipeline. I'm doing a pipeline.write(mycollection, To.sequenceFile(path))
>> for that.
>> However, Crunch is writing an uncompressed sequence file by default. How
>> do I pass the codec that I want to use to Crunch?
>>
>> Looking forward for your inputs.
>>
>> Thanks,
>> Som
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
