Zach,

I work on the Elastic MapReduce team. We are planning to launch
support for multipart upload into Amazon S3 in early January. This
will enable you to write files into Amazon S3 from your reducer that
are up to 5 TB in size.

In the meantime, Dmitriy's advice should work: increase the number of
reducers so that each reducer processes and writes less data. This will
work unless your data distribution is very uneven.
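A minimal sketch of that approach in Pig (the relation names, key, and
bucket path below are placeholders, not taken from your script):

    -- request more reducers for the reduce-side operator,
    -- so each part file written to S3 stays small
    grouped = GROUP data BY key PARALLEL 200;
    result = FOREACH grouped GENERATE group, COUNT(data);
    STORE result INTO 's3://your-bucket/output/' USING PigStorage();

With roughly 200 reducers the output is split across roughly 200 part
files, so you can tune that number toward the 50 - 100MB per-file range
you mentioned.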

Regards,
Andrew

On Tue, Dec 21, 2010 at 2:52 PM, Zach Bailey <[email protected]> wrote:
>  Does anyone know of any existing StoreFunc to specify a maximum output file 
> size? Or would I need to write a custom StoreFunc to do this?
>
>
> I am running into a problem on Amazon's EMR where the files the reducers are 
> writing are too large to be uploaded to S3 (5GB limit per file) and I need to 
> figure out a way to get the output file sizes down into a reasonable range.
>
>
> The other way would be to fire up more machines, which would provide more 
> reducers, meaning the data is split into more files, yielding smaller files. 
> But I want the resulting files to be split on some reasonable file size (50 - 
> 100MB) so they are friendly for pulling down, inspecting, and testing with.
>
>
> Any ideas?
> -Zach
