Hi, I have a use case where I need to write a mapper-only job that reads files from local disk and writes them to HDFS in Avro serialized format. (I want to do this because I want the mapper instances to download data from somewhere onto the local file system and then load that data into HDFS.)
Issues:
1. The job won't have any HDFS input path or output path.
2. I want to be able to set the number of mappers depending on my internet bandwidth, so the number of mappers shouldn't be calculated from input splits.

Any suggestions on how to do this? I would really appreciate any example code.

Deepak
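
To make the second requirement concrete, here is a rough sketch in plain Java (no Hadoop dependency) of dividing the things to download into a fixed number of groups, one per mapper. The class name MapperAssignment, the method assign, and the "source-N" names are all hypothetical and only for illustration. In Hadoop, the number of map tasks equals the number of splits returned by the job's InputFormat, so grouping like this would presumably sit inside a custom InputFormat's getSplits, which is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: divides a list of download
// sources into a fixed number of groups, one group per intended mapper.
class MapperAssignment {

    // Distributes sources round-robin over numMappers groups, so the number
    // of groups depends only on numMappers and not on the amount of data.
    static List<List<String>> assign(List<String> sources, int numMappers) {
        if (numMappers <= 0) {
            throw new IllegalArgumentException("numMappers must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < numMappers; i++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < sources.size(); i++) {
            groups.get(i % numMappers).add(sources.get(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        // Placeholder names; the real download locations are not given here.
        List<String> sources = Arrays.asList(
                "source-0", "source-1", "source-2", "source-3", "source-4");
        int numMappers = 2; // assumed to be chosen from available bandwidth
        List<List<String>> groups = assign(sources, numMappers);
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("mapper " + i + " -> " + groups.get(i));
        }
    }
}
```

With this shape the mapper count is a plain configuration value, and a group may be empty if there are fewer sources than mappers.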
