Hrm, maybe something like the AvroPathPerKeyTarget, plus a DoFn that divides the data up into enough keys so that the data associated with any given key is always < 10 MB?
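A rough sketch of the key-assignment half of that idea, in plain Java with no Crunch dependency. Everything here is an assumption for illustration: the class name `SizeBoundedKeys`, the `part-NNNNN` key format, and the idea that you can estimate the total output size up front. Hashing only spreads records roughly evenly, so it would be safer to pass a target well under the real limit (say 8 MB for a 10 MB cap) rather than rely on the buckets being perfectly balanced.

```java
import java.util.Locale;

// Hypothetical helper, not part of Crunch: picks a bucket key per record so
// that each key's share of the data stays near a target size.
final class SizeBoundedKeys {

    private SizeBoundedKeys() {}

    // Number of keys needed so that totalBytes / keys <= maxBytesPerKey
    // (ceiling division, with at least one key).
    static int numBuckets(long totalBytes, long maxBytesPerKey) {
        if (totalBytes < 0 || maxBytesPerKey <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        long n = (totalBytes + maxBytesPerKey - 1) / maxBytesPerKey;
        return Math.toIntExact(Math.max(1L, n));
    }

    // Deterministic key for a record: hash it into one of numBuckets keys.
    // floorMod keeps the result non-negative even for negative hash codes.
    static String keyFor(Object record, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        int bucket = Math.floorMod(record.hashCode(), numBuckets);
        return String.format(Locale.ROOT, "part-%05d", bucket);
    }
}
```

Inside the pipeline, the DoFn (or a MapFn) would emit a pair of `keyFor(record, n)` and the record, and the resulting table would be what gets written with the AvroPathPerKeyTarget, so each key ends up in its own output path. That wiring is not shown here.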

On Mon, Jan 26, 2015 at 1:15 PM, David Ortiz <[email protected]> wrote:

> Hello,
>
> Is there any way to control output sizing on the crunch pipeline's
> write method? I am processing data which is written to s3 for a program
> which cannot handle more than 10-20 MB per file, and am at a loss for how
> to do this without writing a hive script to process the data.
>
> Thanks,
> David Ortiz
>

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
