Hi Jason,

A few questions (in order):

1. Does Hadoop's own NLineInputFormat not suffice?
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html

2. Do you make sure to pass your jar to the front-end too?

$ export HADOOP_CLASSPATH=/path/to/your/jar
$ command…

3. Does jar -tf <yourjar> show a proper mypackage.NLineRecordReader entry?

4. Is your class marked public?

On Thu, Oct 18, 2012 at 9:32 AM, Jason Wang <[email protected]> wrote:
> Hi all,
> I'm experimenting with hadoop streaming on build 1.0.3.
>
> To give some background, I'm streaming a text file into a mapper written in C.
> With the default settings, streaming uses TextInputFormat, which creates one
> record from each line. The problem I am having is that I need record
> boundaries to be every 4 lines. When the splitter breaks up the input for the
> mappers, I get partial records at the boundaries. To address this, my
> approach was to write a new Java RecordReader class that is almost identical
> to LineRecordReader, but with a modified next() method that reads 4 lines
> instead of one.
>
> I then compiled the new class and created a jar. I wanted to import this at
> run time using the -libjars argument, like so:
>
> hadoop jar ../contrib/streaming/hadoop-streaming-1.0.3.jar -libjars
> NLineRecordReader.jar -files test_stream.sh -inputreader
> mypackage.NLineRecordReader -input /Users/hadoop/test/test.txt -output
> /Users/hadoop/test/output -mapper "test_stream.sh" -reducer NONE
>
> Unfortunately, I keep getting the following error:
> -inputreader: class not found: mypackage.NLineRecordReader
>
> My question is twofold: Am I using the right approach to handle the 4-line
> records with a custom RecordReader implementation? And why isn't -libjars
> working to include my class in hadoop streaming at runtime?
>
> Thanks,
> Jason

-- 
Harsh J
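[Editor's note: the 4-line grouping that Jason's modified next() method performs can be illustrated outside Hadoop with a plain-Java sketch. FourLineGrouper and nextRecord are hypothetical names chosen for this example; a real implementation would live inside a RecordReader and use the Hadoop I/O classes instead of BufferedReader.]

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class FourLineGrouper {
    // Read up to 4 lines from the input and join them into one record,
    // mirroring the "one record per 4 lines" boundary Jason needs.
    // Returns null once the input is exhausted.
    public static String nextRecord(BufferedReader in) throws IOException {
        StringBuilder record = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            String line = in.readLine();
            if (line == null) {
                break; // end of input; emit whatever partial record remains
            }
            if (record.length() > 0) {
                record.append('\n');
            }
            record.append(line);
        }
        return record.length() == 0 ? null : record.toString();
    }

    public static void main(String[] args) throws IOException {
        // Six input lines -> two records: "a\nb\nc\nd" and "e\nf".
        BufferedReader in =
                new BufferedReader(new StringReader("a\nb\nc\nd\ne\nf"));
        String rec;
        while ((rec = nextRecord(in)) != null) {
            System.out.println("record: " + rec.replace("\n", "|"));
        }
    }
}
```

Note that this only shows the record-boundary logic; it does not address the split problem, since a streaming RecordReader must also ensure each split starts on a 4-line boundary rather than mid-record.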
