I tried using the command that Miki posted, the only difference being the version of Avro (1.5.1 instead of 1.6.0). I can't seem to get it to work.

/home/hadoop/hadoop/bin/hadoop jar /home/hadoop/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
    -files avro-1.5.1.jar,avro-mapred-1.5.1.jar \
    -libjars avro-1.5.1.jar,avro-mapred-1.5.1.jar \
    -mapper test-mapper.py \
    -reducer test-reducer.py \
    -jobconf mapred.job.name=AvroTestJob \
    --numReduceTasks 3 \
    -file test-mapper.py \
    -file test-reducer.py \
    -inputformat org.apache.avro.mapred.AvroAsTextInputFormat \
    -input avroevents \
    -output AvroOutput

Error:
-inputformat : class not found : org.apache.avro.mapred.AvroAsTextInputFormat
Streaming Job Failed!

Thanks for all the help!

On Jun 15, 2011, at 10:36 AM, Miki Tebeka wrote:

> Found the magic (-files and -libs):
>
> jars=avro-1.6.0-SNAPSHOT.jar,avro-mapred-1.6.0-SNAPSHOT.jar
>
> hadoop jar hadoop-streaming-0.20.2-cdh3u0.jar \
> -files $jars \
> -libjars $jars \
> -input /in/avro \
> -output /out/avro \
> -mapper avro-mapper.py \
> -reducer avro-reducer.py \
> -file avro-mapper.py \
> -file avro-reducer.py \
> -inputformat org.apache.avro.mapred.AvroAsTextInputFormat
>
> Thanks for all the help!
>
> On Wed, Jun 15, 2011 at 9:53 AM, Scott Carey <[email protected]> wrote:
>> Hadoop has an old version of Avro in it. You must place the 1.6.0 jar
>> (and relevant dependencies, or the avro-tools.jar with all dependencies
>> bundled) in a location that gets picked up first in the task classpath.
>>
>> Packaging it in the job jar works. I'm not sure if putting it in the
>> distributed cache and loading it as a library that way would.
>>
>> On 6/15/11 9:30 AM, "Matt Pouttu-Clarke"
>> <[email protected]> wrote:
>>
>>> You have to package it in the job jar file under a /lib directory.
>>>
>>>
>>> On 6/15/11 9:26 AM, "Miki Tebeka" <[email protected]> wrote:
>>>
>>>> Still didn't work.
>>>>
>>>> I'm pretty new to hadoop world, I probably need to place the avro jar
>>>> somewhere on the classpath of the nodes,
>>>> however I have no idea how to do that.
>>>>
>>>> On Wed, Jun 15, 2011 at 3:33 AM, Harsh J <[email protected]> wrote:
>>>>> Miki,
>>>>>
>>>>> You'll need to provide the entire canonical class name
>>>>> (org.apache.avro.mapredS).
>>>>>
>>>>> On Wed, Jun 15, 2011 at 5:31 AM, Miki Tebeka <[email protected]>
>>>>> wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I've tried to run a job with the following command:
>>>>>>
>>>>>> hadoop jar ./hadoop-streaming-0.20.2-cdh3u0.jar \
>>>>>> -input /in/avro \
>>>>>> -output $out \
>>>>>> -mapper avro-mapper.py \
>>>>>> -reducer avro-reducer.py \
>>>>>> -file avro-mapper.py \
>>>>>> -file avro-reducer.py \
>>>>>> -cacheArchive /cache/avro-mapred-1.6.0-SNAPSHOT.jar \
>>>>>> -inputformat AvroAsTextInputFormat
>>>>>>
>>>>>> However I get
>>>>>> -inputformat : class not found : AvroAsTextInputFormat
>>>>>>
>>>>>> I'm probably missing something obvious to do.
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> Thanks!
>>>>>> --
>>>>>> Miki
>>>>>>
>>>>>> On Fri, Jun 3, 2011 at 1:43 AM, Doug Cutting <[email protected]>
>>>>>> wrote:
>>>>>>> Miki,
>>>>>>>
>>>>>>> Have you looked at AvroAsTextInputFormat?
>>>>>>>
>>>>>>>
>>>>>>> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/AvroAsTextInputFormat.html
>>>>>>>
>>>>>>> Also, release 1.5.2 will include AvroTextOutputFormat:
>>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/AVRO-830
>>>>>>>
>>>>>>> Are these perhaps what you're looking for?
>>>>>>>
>>>>>>> Doug
>>>>>>>
>>>>>>> On 06/02/2011 11:30 PM, Miki Tebeka wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> I'd like to use hadoop streaming with Avro files.
>>>>>>>> My plan is to write an inputformat class that emits json records, one
>>>>>>>> per line. This way the streaming application can read one record per
>>>>>>>> line.
>>>>>>>>
>>>>>>>> (http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plugins+for+Jobs)
>>>>>>>>
>>>>>>>> I couldn't find any documentation/help about writing inputformat
>>>>>>>> classes. Can someone point me to the right direction?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> --
>>>>>>>> Miki
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Harsh J
>>>>>
>>>
>>>
>>> iCrossing Privileged and Confidential Information
>>> This email message is for the sole use of the intended recipient(s) and
>>> may contain confidential and privileged information of iCrossing. Any
>>> unauthorized review, use, disclosure or distribution is prohibited. If
>>> you are not the intended recipient, please contact the sender by reply
>>> email and destroy all copies of the original message.
>>>
>>>
>>
>>
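
Since the failure is a "class not found", one thing that may be worth ruling out is whether the 1.5.1 jar actually contains org.apache.avro.mapred.AvroAsTextInputFormat at all. Below is a minimal sketch of such a check in Python. It relies only on the fact that a jar is a zip archive in which class a.b.C is stored as the entry a/b/C.class. The helper name jar_has_class is hypothetical (it is not part of Hadoop or Avro), and the jar path in the usage comment assumes the jar sits in the current directory.

```python
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if the jar at jar_path has an entry for class_name.

    jar_has_class is a hypothetical helper for illustration only.
    """
    # A class a.b.C is stored inside a jar as the zip entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Example use (assumes the jar is in the current directory):
#   jar_has_class("avro-mapred-1.5.1.jar",
#                 "org.apache.avro.mapred.AvroAsTextInputFormat")
```

If this returns False for the jar passed to -libjars, the class is simply absent from that jar and no classpath setting will help. If it returns True, the problem is more likely in how the jar reaches the classpath.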
