Hello, I have been trying to run Hadoop on a set of small text files, none larger than 10 KB each; the total input size is about 15 MB. If I run the example word count application on these files, it takes about 2000 seconds (more than half an hour) to complete. However, if I first merge all the files into one large file, it finishes in well under a minute. I suspect the per-file overhead of the many small input splits is the problem, and that MultiFileInputFormat could help here, but the API documentation is not very enlightening. Can MultiFileInputFormat really solve my problem? If so, could you suggest a reference on how to use it, or a few lines to add to the word count example to make things clearer?
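For reference, the manual merge I described above is just a simple concatenation before submitting the job. A minimal sketch of what I do now (all file and directory names here are invented for illustration):

```shell
# Illustrative sketch: merge many small inputs into one file, so the job
# sees a single large input instead of hundreds of tiny splits.
# All paths below are placeholders, not my real layout.
mkdir -p input merged
printf 'hello hadoop\n' > input/part-a.txt   # stand-ins for the real 10 KB files
printf 'hello again\n'  > input/part-b.txt
cat input/*.txt > merged/all.txt             # one file -> far fewer map tasks
wc -c merged/all.txt                         # sanity check on the merged size
# hadoop jar hadoop-examples.jar wordcount merged/ output/  # then run as usual
```

This works, but it feels like a workaround rather than the intended solution, which is why I am asking about MultiFileInputFormat.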
Thanks in advance.

Regards,
Jason Curtes