Re: Making crunch job output single file

Josh Wills Wed, 30 Oct 2013 08:20:13 -0700

Hey Som,

Check out org.apache.crunch.lib.Shard, it does what you want.
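
A minimal sketch of how that could look, assuming Apache Crunch is on the classpath and that Shard.shard(PCollection, int) is used with a partition count of 1. The class name SingleCsvOutput, the command-line arguments, and the readTextFile stand-in for the real sequence-file read and transform are illustrative assumptions, not taken from the original job. Sharding to one partition adds a reduce phase with a single reducer, so the output directory should end up with one part file rather than one per mapper.

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.PipelineResult;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.lib.Shard;

// Hypothetical driver class; the name and arguments are for illustration only.
public class SingleCsvOutput {
  public static void main(String[] args) {
    String inputPath = args[0];    // assumed: path to the input data
    String csvFilePath = args[1];  // e.g. "/data/csv_directory"

    Pipeline pipeline = new MRPipeline(SingleCsvOutput.class);

    // Stand-in for the real job's sequence-file read and transformation.
    PCollection<String> transformedRecords = pipeline.readTextFile(inputPath);

    // Rebalance the records into a single partition before writing.
    PCollection<String> singleShard = Shard.shard(transformedRecords, 1);

    pipeline.writeTextFile(singleShard, csvFilePath);

    // Run the pipeline and report success or failure via the exit code.
    PipelineResult result = pipeline.done();
    System.exit(result.succeeded() ? 0 : 1);
  }
}
```

Note that csvFilePath is still a directory; the single CSV would be the one part file inside it. Funnelling everything through one reducer also removes parallelism on the write, which may matter for very large outputs.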


J


On Wed, Oct 30, 2013 at 8:05 AM, Som Satpathy <[email protected]> wrote:

> Hi all,
>
> I have a crunch job that should process a big sequence file and produce a
> single csv file. I am using the "pipeline.writeTextFile(transformedRecords,
> csvFilePath)" to write to a csv. (csvFilePath is like
> "/data/csv_directory"). The larger the input sequence file is, the more
> mappers are created, and thus an equal number of csv output files is
> produced.
>
> In classic mapreduce one could output a single file by setting the
> #reducers to 1 while configuring the job. How could I achieve this with
> crunch?
>
> I would really appreciate any help here.
>
> Thanks,
> Som
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
