Hi Balson,

Have you tried NLineInputFormat<http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html>? You can find an example of NLineInputFormat here: http://goo.gl/aVzDr.
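In case it helps, here is a rough sketch of how the job might be wired up. Note this uses the newer org.apache.hadoop.mapreduce API rather than the mapred one linked above, and the binary path and input path are just placeholders you would replace with your own:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class BinaryPerFileDriver {

  // Each map() call receives one line of input.txt (one file name);
  // the mapper shells out to the external binary for that file.
  public static class BinaryRunnerMapper
      extends Mapper<LongWritable, Text, NullWritable, NullWritable> {
    @Override
    protected void map(LongWritable offset, Text fileName, Context context)
        throws IOException, InterruptedException {
      // "/path/to/binary" is a placeholder for your executable.
      Process p = new ProcessBuilder("/path/to/binary", fileName.toString().trim())
          .inheritIO()
          .start();
      if (p.waitFor() != 0) {
        throw new IOException("Binary failed for input " + fileName);
      }
      // Nothing is emitted; the binary writes its own output.
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "binary-per-file");
    job.setJarByClass(BinaryPerFileDriver.class);

    job.setInputFormatClass(NLineInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0])); // e.g. input.txt
    NLineInputFormat.setNumLinesPerSplit(job, 1);         // one line per map task

    job.setMapperClass(BinaryRunnerMapper.class);
    job.setNumReduceTasks(0);                             // map-only job
    job.setOutputFormatClass(NullOutputFormat.class);     // mappers emit nothing

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

With setNumLinesPerSplit(job, 1), each line of input.txt becomes its own split, so each file name is handed to its own map task, which I believe is exactly the <1, myFirstInput.vlc>, <2, mySecondInput.vlc> behavior you describe.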
On Thu, May 9, 2013 at 2:53 PM, Balachandar R.A. <[email protected]> wrote:

> Hello
>
> I would like to explore the possibility of using the MapReduce framework for the
> following problem.
>
> I have a set of huge files. I would like to execute a binary over every
> input file. The binary needs to operate over the whole file, so it
> is not possible to split the file into chunks. Let's assume that I have six
> such files and have their names in a single text file. I need to write
> Hadoop code to take this single file as input, and every line in it should
> go to one map task. The map task shall execute the binary on that file, and
> the file can be located in HDFS. No reduce tasks are needed, and no output
> shall be emitted from the map tasks either. The binary takes care of
> creating the output file in the specified location.
>
> Is there a way to tell Hadoop to feed a single line to a map task? I came
> across a few examples wherein a set of files is given, and it looks like
> the framework tries to split the file, reads every line in the split,
> generates key/value pairs and sends these pairs to a single map task. In my
> situation, I want only one key/value pair to be generated for each line,
> and it should be given to a single map task. That's it.
>
> For example, assume that this is my file <input.txt>:
>
> myFirstInput.vlc
> mySecondInput.vlc
> myThirdInput.vlc
>
> Now, the first map task should get the pair <1, myFirstInput.vlc>, the second
> gets the pair <2, mySecondInput.vlc>, and so on.
>
> Can someone throw some light on this problem? For me, it looks
> straightforward, but I could not find any pointers on the web.
>
> With thanks and regards
> Balson

--
Regards,
Ted Xu
