Re: Output Sizing

Josh Wills Mon, 26 Jan 2015 13:26:31 -0800

Hrm-- maybe something like the AvroPathPerKeyTarget, and a DoFn that
divides the data up into enough keys so that the data associated with a
given key is always < 10MB?
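
A rough sketch of the key-assignment part of that idea, in plain Java with
no Crunch dependency. The class name, method names, the "part-NNNNN" key
format, and the size estimate are all hypothetical. In a real pipeline this
logic would sit inside a DoFn that emits (key, record) pairs, and the
resulting table would be written with AvroPathPerKeyTarget; that wiring is
not shown here. Hash-based bucketing only bounds the average size per key,
so the sketch aims at half the limit to leave headroom.

```java
import java.util.Locale;

public class OutputBucketer {
    // Limit taken from the thread: keep the data for each key under about 10 MB.
    public static final long MAX_BYTES_PER_KEY = 10L * 1024 * 1024;

    /** Number of keys needed so that the average key holds at most targetBytes. */
    public static int numBuckets(long estimatedTotalBytes, long targetBytes) {
        if (targetBytes <= 0) {
            throw new IllegalArgumentException("targetBytes must be positive");
        }
        // Ceiling division, with a floor of one bucket for tiny inputs.
        long n = (estimatedTotalBytes + targetBytes - 1) / targetBytes;
        return Math.toIntExact(Math.max(1L, n));
    }

    /** Assigns a record to one of numBuckets keys using its hash code. */
    public static String bucketKey(Object record, int numBuckets) {
        // floorMod keeps the result non-negative even for negative hash codes.
        int bucket = Math.floorMod(record.hashCode(), numBuckets);
        return String.format(Locale.ROOT, "part-%05d", bucket);
    }

    public static void main(String[] args) {
        // Assumed input size; in practice this would come from an estimate
        // of the data being written.
        long estimatedTotalBytes = 95L * 1024 * 1024;
        long targetBytes = MAX_BYTES_PER_KEY / 2; // aim for ~5 MB per key
        int buckets = numBuckets(estimatedTotalBytes, targetBytes);
        System.out.println("buckets = " + buckets);
        System.out.println("key for record 7 = " + bucketKey(Integer.valueOf(7), buckets));
    }
}
```

If the records have a skewed hash distribution, some keys could still go
over the limit, so it is worth checking the output sizes after a first run.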


On Mon, Jan 26, 2015 at 1:15 PM, David Ortiz <[email protected]> wrote:

> Hello,
>
>      Is there any way to control output sizing on the Crunch pipeline's
> write method?  I am processing data which is written to S3 for a program
> which cannot handle more than 10-20 MB per file, and am at a loss for how
> to do this without writing a Hive script to process the data.
>
> Thanks,
>      David Ortiz
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
