Does anyone know of an existing StoreFunc that lets you specify a maximum output file size? Or would I need to write a custom StoreFunc to do this?
I am running into a problem on Amazon's EMR where the files the reducers write are too large to be uploaded to S3 (5 GB limit per file), and I need a way to get the output file sizes down into a reasonable range. One alternative would be to fire up more machines, which would provide more reducers, so the data would be split across more, smaller files. But I would rather have the output split at some reasonable file size (50-100 MB) so the files are easy to pull down, inspect, and test with. Any ideas?
-Zach
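
P.S. For a rough sense of what the "more reducers" route would take, here is a minimal Python sketch of the arithmetic. It assumes reducer output is spread evenly across files, which skewed keys would break. The function name and the 5 GB / 100 MB figures are purely illustrative and not part of any Pig or EMR API.

```python
def reducers_for_target_size(total_output_bytes, target_file_bytes):
    """Estimate how many reducers (one output file each) are needed so that
    each file is at most target_file_bytes, assuming evenly sized outputs."""
    if total_output_bytes < 0 or target_file_bytes <= 0:
        raise ValueError("sizes must be non-negative, target must be positive")
    # Integer ceiling division; always keep at least one reducer.
    return max(1, -(-total_output_bytes // target_file_bytes))


if __name__ == "__main__":
    GIB = 1024 ** 3
    MIB = 1024 ** 2
    # Hypothetical job: 5 GiB of total output, aiming for 100 MiB files.
    print(reducers_for_target_size(5 * GIB, 100 * MIB))
```

The resulting number could be handed to Pig through the PARALLEL clause on the reduce-side operator or through SET default_parallel. That only controls how many files are written; it is not a hard cap on the size of any one file.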
