Re: Multiple output from crunch

Josh Wills Mon, 06 Jul 2015 11:53:16 -0700

Hey Nipur,

AvroPathPerKeyTarget is the closest thing to what you want; you can use it
on a PTable<String, T> collection, where T is any type that Avro supports.
It will write multiple output files to a common base directory where the
name of the file depends on the value of the String key in the PTable.
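
As a rough illustration of the shape the data needs to be in first, here is a
minimal plain-Java sketch (no Crunch dependency) that derives a String key
from each CSV line and groups the records per key. The class name, method
names, and the assumption that the timestamp is the first comma-separated
field are all hypothetical. In a real pipeline the key extraction would
presumably live in a function that emits (key, record) pairs into the
PTable<String, T>, and the grouping and file writing would be left to the
target.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TimestampBuckets {
    // Assumption: the timestamp is the first comma-separated field of each line.
    static final int TIMESTAMP_FIELD = 0;

    /** Returns the String key (the trimmed timestamp field) for one CSV line. */
    static String keyOf(String csvLine) {
        // limit -1 keeps trailing empty fields instead of dropping them
        String[] fields = csvLine.split(",", -1);
        return fields[TIMESTAMP_FIELD].trim();
    }

    /** Groups CSV lines by their key, preserving first-seen key order. */
    static Map<String, List<String>> bucketize(List<String> csvLines) {
        Map<String, List<String>> buckets = new LinkedHashMap<>();
        for (String line : csvLines) {
            buckets.computeIfAbsent(keyOf(line), k -> new ArrayList<>()).add(line);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "20150706,a,1",
            "20150707,b,2",
            "20150706,c,3");
        // Each map entry stands in for one per-key output location.
        bucketize(lines).forEach((key, records) ->
            System.out.println(key + " -> " + records));
    }
}
```

Since the key ends up as part of an output path, it is probably safest to
normalize the timestamp into something that is valid as a file name before
using it as the key.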


Josh

On Mon, Jul 6, 2015 at 11:47 AM, Nipur Patodi <[email protected]>
wrote:

> Hi All,
>
> I am very new to Crunch.
>
> I am trying to read data from a CSV file using MR pipelines. I need to
> convert and bucketize this data on the basis of a timestamp, which is a
> field in the CSV. I need to write the data for each timestamp into a
> single file.
>
> This scenario is equivalent to writing the values (records) for each key
> (the timestamp) to a different file.
>
> I can achieve this using the multiple output format in MapReduce.
>
> Do we have any equivalent concept or design pattern to achieve the same
> behavior using Crunch?
>
> Please suggest.
>
> Thanks,
>
> _Nipur
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
