Hello, I would like to explore the possibility of using the MapReduce framework for the following problem.
I have a set of huge files, and I would like to execute a binary over every input file. The binary needs to operate on the whole file, so it is not possible to split a file into chunks. Let's assume that I have six such files and that their names are listed in a single text file. I need to write Hadoop code that takes this single file as input and sends every line in it to its own map task. Each map task then executes the binary on the file named on its line; the file can be located in HDFS. No reduce tasks are needed, and no output needs to be emitted from the map tasks either, since the binary takes care of creating its output file in the specified location.

Is there a way to tell Hadoop to feed a single line to a map task? I came across a few examples in which a set of files is given as input, and it looks like the framework tries to split each file, reads every line in a split, generates key/value pairs, and sends those pairs to a single map task. In my situation, I want exactly one key/value pair to be generated per line, and each pair should go to its own map task. That's it.

For example, assume this is my file input.txt:

myFirstInput.vlc
mySecondInput.vlc
myThirdInput.vlc

Now, the first map task should get the pair <1, myFirstInput.vlc>, the second should get <2, mySecondInput.vlc>, and so on.
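To make my intent concrete, here is a rough sketch of the kind of job I have in mind, using the new org.apache.hadoop.mapreduce API. I am only guessing that NLineInputFormat with one line per split is the right tool for this, and /usr/local/bin/myBinary is just a placeholder for my real binary. As far as I understand, the key handed to the mapper would be the byte offset of the line rather than a line number such as 1 or 2, but that would be close enough for my purpose.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class RunBinaryPerFile {

    // Each call to map() receives one line of input.txt, i.e. one file name.
    public static class BinaryRunnerMapper
            extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

        @Override
        protected void map(LongWritable offset, Text fileName, Context context)
                throws IOException, InterruptedException {
            // Placeholder binary; I assume it knows how to reach the named
            // file (in HDFS or otherwise) and writes its own output file.
            ProcessBuilder pb =
                new ProcessBuilder("/usr/local/bin/myBinary", fileName.toString());
            pb.redirectErrorStream(true);
            Process p = pb.start();
            int exitCode = p.waitFor();
            if (exitCode != 0) {
                throw new IOException("Binary failed on " + fileName
                        + " with exit code " + exitCode);
            }
            // Nothing is written to the context: the map emits no output.
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "run binary per file");
        job.setJarByClass(RunBinaryPerFile.class);
        job.setMapperClass(BinaryRunnerMapper.class);
        job.setNumReduceTasks(0);                         // map-only job

        // My assumption: one line per split means one map task per line.
        job.setInputFormatClass(NLineInputFormat.class);
        NLineInputFormat.setNumLinesPerSplit(job, 1);
        FileInputFormat.addInputPath(job, new Path(args[0])); // path to input.txt

        job.setOutputFormatClass(NullOutputFormat.class); // no framework output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}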
Can someone throw some light on this problem? To me it looks straightforward, but I could not find any pointers on the web.

With thanks and regards,
Balson