Re: Multiple output from crunch

Josh Wills Mon, 06 Jul 2015 12:27:50 -0700

Not right now, no. Is the intent that the output here will go into Hive
partitions?


On Mon, Jul 6, 2015 at 11:57 AM, Nipur Patodi <[email protected]>
wrote:

> Thanks much, Josh.
>
> Do we have something for Avro Parquet files also?
>
> Thanks,
>
> _Nipur
>
>
>
> On Tue, Jul 7, 2015 at 12:17 AM, Nipur Patodi <[email protected]>
> wrote:
>
>> Hi All,
>>
>> I am very new to Crunch.
>>
>> I am trying to read data from a CSV file using MR pipelines. I need to
>> convert and bucketize this data on the basis of the timestamp, which is
>> a field in the CSV. I need to write the data for each timestamp into a
>> single file.
>>
>> This scenario is equivalent to writing the values (records) for each key
>> (the timestamp) to a different file.
>>
>> I can achieve this using the multiple output format in MapReduce.
>>
>> Do we have any equivalent concept or design pattern to achieve the same
>> behavior using Crunch?
>>
>> Please suggest.
>>
>> Thanks,
>>
>> _Nipur
>>
>
>


-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
