Hi,

I have a use case where I need to write a map-only job that reads a file
from local disk and writes it to HDFS in Avro serialized format. (I want to
do this because I want the Mapper instances to download data from
somewhere onto the local FS and then load that data into HDFS.)

Issues:
1. The job won't have any HDFS input path or output path.
2. I want to be able to set the number of Mappers depending on my internet
bandwidth, so the number of mappers shouldn't be calculated from the
input splits.
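
To make the second point concrete: Hadoop starts one map task per split
returned by the job's InputFormat (its getSplits() method), so a fixed mapper
count means producing a fixed number of splits that are not tied to HDFS
blocks. Below is a minimal plain-Java sketch of just that grouping step. The
class name FixedSplitPlanner, the method plan(), and the sample source names
are hypothetical and used for illustration only. Wrapping each group in an
InputSplit inside a custom InputFormat is assumed and not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Hadoop): divides a list of download
// sources into exactly numMappers groups, independent of any HDFS layout.
class FixedSplitPlanner {

    // Round-robin assignment: source i goes to group (i % numMappers).
    // Some groups may be empty if there are fewer sources than mappers.
    static List<List<String>> plan(List<String> sources, int numMappers) {
        if (numMappers <= 0) {
            throw new IllegalArgumentException("numMappers must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < numMappers; i++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < sources.size(); i++) {
            groups.get(i % numMappers).add(sources.get(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        // Placeholder source names; real entries would be URLs or remote paths.
        List<String> sources =
                Arrays.asList("file-a", "file-b", "file-c", "file-d", "file-e");
        int numMappers = 2; // chosen by hand, e.g. based on available bandwidth
        List<List<String>> groups = plan(sources, numMappers);
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("mapper " + i + " -> " + groups.get(i));
        }
    }
}
```

In a real job, each group would presumably become one split, and the mapper
would download the sources listed in its split before writing them out.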

Any suggestions on how to do this? I would really appreciate any example
code.

Deepak
