Hey Som,

Check out org.apache.crunch.lib.Shard; it does what you want.
J

On Wed, Oct 30, 2013 at 8:05 AM, Som Satpathy <[email protected]> wrote:

> Hi all,
>
> I have a crunch job that should process a big sequence file and produce a
> single csv file. I am using "pipeline.writeTextFile(transformedRecords,
> csvFilePath)" to write to a csv (csvFilePath is like
> "/data/csv_directory"). The larger the input sequence file, the more
> mappers are created, and thus an equal number of csv output files are
> created.
>
> In classic mapreduce one could output a single file by setting the
> #reducers to 1 while configuring the job. How could I achieve this with
> crunch?
>
> I would really appreciate any help here.
>
> Thanks,
> Som
>

-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
