You can explicitly set the split size.
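To make that concrete, here is a rough sketch of how the split size is derived and why raising the minimum cuts the mapper count. It mirrors the formula used by FileInputFormat in the mapreduce API, max(minSize, min(maxSize, blockSize)). The 128 MiB block size and 10 GiB input are made-up numbers, and the class and method names are only for illustration. I'm also assuming the input format that HCatInputFormat wraps for your table is FileInputFormat-based and honors the standard split-size properties; that depends on the storage format, so treat it as something to verify.

```java
// Illustrative sketch only: no Hadoop dependency, just the arithmetic.
final class SplitSizeSketch {

    // Same shape as FileInputFormat's split-size rule in the mapreduce API:
    // the block size, clamped between the configured minimum and maximum.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Rough split count for a single file (ceiling division). Hadoop lets the
    // last split run slightly over, so the real count can be a little lower.
    static long estimateSplits(long fileLength, long splitSize) {
        return (fileLength + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb;          // assumed HDFS block size
        long fileLength = 10L * 1024 * mb;  // assumed 10 GiB input file

        // Defaults: minimum of 1 byte, no maximum, so split size == block size.
        long defaultSplit = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);

        // Minimum raised to 512 MiB, so each split covers four blocks.
        long tunedSplit = computeSplitSize(blockSize, 512 * mb, Long.MAX_VALUE);

        System.out.println("default splits: " + estimateSplits(fileLength, defaultSplit));
        System.out.println("tuned splits:   " + estimateSplits(fileLength, tunedSplit));
    }
}
```

In a real job the minimum would go on the job configuration before submission, for example job.getConfiguration().setLong("mapreduce.input.fileinputformat.split.minsize", 512L * 1024 * 1024). Note that this only makes splits of large files bigger; it does not merge many small files into one split the way Hive's combining does.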
On Wednesday, May 13, 2015 11:37 PM, Pradeep Gollakota wrote:
Hi All,
I'm writing an MR job to read data using HCatInputFormat... however, the job is
generating too many splits. I don't have this problem when running queries in
Hive since it combines splits by default.
Is there an equivalent in MR so that I'm not generating thousands of mappers?
Thanks,
Pradeep
