Does anyone know of an existing StoreFunc that lets you specify a maximum 
output file size? Or would I need to write a custom StoreFunc to do this?


I am running into a problem on Amazon's EMR where the files the reducers are 
writing are too large to be uploaded to S3 (5GB limit per file) and I need to 
figure out a way to get the output file sizes down into a reasonable range.


The other way would be to fire up more machines, which would provide more 
reducers, so the data is split across more files and each file is smaller. 
But I want the resulting files to be split at a reasonable size (50-100 MB) 
so they are easy to pull down, inspect, and test with.
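
To put rough numbers on the "more reducers" option, here is a 
back-of-the-envelope sketch. It assumes each reducer writes exactly one file 
and that output is spread evenly across reducers, which a real job will not 
guarantee. The 500 GB total and the function name are made up for 
illustration.

```python
def reducers_needed(total_output_bytes: int, target_file_bytes: int) -> int:
    """Reducers required so each output file is at most target_file_bytes.

    Assumption: one output file per reducer, with output spread evenly.
    Skewed keys would make some files larger than this estimate.
    """
    if total_output_bytes < 0 or target_file_bytes <= 0:
        raise ValueError("sizes must be non-negative, target must be positive")
    # Integer ceiling division; always use at least one reducer.
    return max(1, (total_output_bytes + target_file_bytes - 1) // target_file_bytes)


MB = 1024 ** 2
GB = 1024 ** 3

total = 500 * GB  # hypothetical total job output

print(reducers_needed(total, 5 * GB))    # just under the S3 per-file limit
print(reducers_needed(total, 100 * MB))  # the 100 MB target
```

Under those assumptions, the 100 MB target needs about 50 times as many 
reducers as simply staying under the 5 GB limit, which is why I would rather 
cap the file size in the StoreFunc than scale the cluster.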


Any ideas?
-Zach

