Hi guys: I'm wondering - if I'm running MapReduce jobs on a cluster with large block sizes, can I increase performance with any of the following:
1) A custom FileInputFormat
2) A custom partitioner
3) -DnumReducers

Clearly, (3) will be an issue, since it might overload tasks and increase network traffic... but maybe (1) or (2) would be a precise way to "use" partitions as a "poor man's" block.

Just a thought - not sure if anyone has tried (1) or (2) before to simulate blocks and increase locality by utilizing the partition API.

--
Jay Vyas
http://jayunit100.blogspot.com
