I *think* streaming support was added only in 1.6.
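One way to test this hunch before launching a job is to check whether the jar passed via -libjars actually contains the class named in -inputformat. A jar is a zip archive, so a minimal Python sketch using the standard zipfile module can look for the compiled .class entry. The helper name jar_has_class is hypothetical. The check only tells you what one jar contains; it says nothing about which Avro version the task classpath will load first.

```python
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if the jar (a zip archive) contains the compiled class.

    class_name is a fully qualified Java name such as
    "org.apache.avro.mapred.AvroAsTextInputFormat", which is stored in the
    archive as "org/apache/avro/mapred/AvroAsTextInputFormat.class".
    (jar_has_class is a hypothetical helper, not part of Hadoop or Avro.)
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For example, if `jar_has_class("avro-mapred-1.5.1.jar", "org.apache.avro.mapred.AvroAsTextInputFormat")` returned False, that would be consistent with Mona's class-not-found error and with the class only shipping in a later release.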
On Mon, Jul 11, 2011 at 5:36 PM, Mona Gandhi <[email protected]> wrote:
> I tried using the command that Miki posted, with the difference being the
> version of Avro (1.5.1 instead of 1.6.0). I can't seem to get it to work.
>
> /home/hadoop/hadoop/bin/hadoop jar \
>     /home/hadoop/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
>     -files avro-1.5.1.jar,avro-mapred-1.5.1.jar \
>     -libjars avro-1.5.1.jar,avro-mapred-1.5.1.jar \
>     -mapper test-mapper.py \
>     -reducer test-reducer.py \
>     -jobconf mapred.job.name=AvroTestJob \
>     --numReduceTasks 3 \
>     -file test-mapper.py \
>     -file test-reducer.py \
>     -inputformat org.apache.avro.mapred.AvroAsTextInputFormat \
>     -input avroevents \
>     -output AvroOutput
>
> Error: -inputformat : class not found : org.apache.avro.mapred.AvroAsTextInputFormat
> Streaming Job Failed!
>
> Thanks for all the help!
>
> On Jun 15, 2011, at 10:36 AM, Miki Tebeka wrote:
>
>> Found the magic (-files and -libjars):
>>
>> jars=avro-1.6.0-SNAPSHOT.jar,avro-mapred-1.6.0-SNAPSHOT.jar
>>
>> hadoop jar hadoop-streaming-0.20.2-cdh3u0.jar \
>>     -files $jars \
>>     -libjars $jars \
>>     -input /in/avro \
>>     -output /out/avro \
>>     -mapper avro-mapper.py \
>>     -reducer avro-reducer.py \
>>     -file avro-mapper.py \
>>     -file avro-reducer.py \
>>     -inputformat org.apache.avro.mapred.AvroAsTextInputFormat
>>
>> Thanks for all the help!
>>
>> On Wed, Jun 15, 2011 at 9:53 AM, Scott Carey <[email protected]> wrote:
>>> Hadoop has an old version of Avro in it. You must place the 1.6.0 jar
>>> (and relevant dependencies, or the avro-tools.jar with all dependencies
>>> bundled) in a location that gets picked up first in the task classpath.
>>>
>>> Packaging it in the job jar works. I'm not sure if putting it in the
>>> distributed cache and loading it as a library that way would.
>>>
>>> On 6/15/11 9:30 AM, "Matt Pouttu-Clarke"
>>> <[email protected]> wrote:
>>>
>>>> You have to package it in the job jar file under a /lib directory.
>>>>
>>>> On 6/15/11 9:26 AM, "Miki Tebeka" <[email protected]> wrote:
>>>>
>>>>> Still didn't work.
>>>>>
>>>>> I'm pretty new to the Hadoop world. I probably need to place the Avro jar
>>>>> somewhere on the classpath of the nodes,
>>>>> however I have no idea how to do that.
>>>>>
>>>>> On Wed, Jun 15, 2011 at 3:33 AM, Harsh J <[email protected]> wrote:
>>>>>> Miki,
>>>>>>
>>>>>> You'll need to provide the entire canonical class name
>>>>>> (org.apache.avro.mapredS).
>>>>>>
>>>>>> On Wed, Jun 15, 2011 at 5:31 AM, Miki Tebeka <[email protected]>
>>>>>> wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I've tried to run a job with the following command:
>>>>>>>
>>>>>>> hadoop jar ./hadoop-streaming-0.20.2-cdh3u0.jar \
>>>>>>>     -input /in/avro \
>>>>>>>     -output $out \
>>>>>>>     -mapper avro-mapper.py \
>>>>>>>     -reducer avro-reducer.py \
>>>>>>>     -file avro-mapper.py \
>>>>>>>     -file avro-reducer.py \
>>>>>>>     -cacheArchive /cache/avro-mapred-1.6.0-SNAPSHOT.jar \
>>>>>>>     -inputformat AvroAsTextInputFormat
>>>>>>>
>>>>>>> However, I get:
>>>>>>> -inputformat : class not found : AvroAsTextInputFormat
>>>>>>>
>>>>>>> I'm probably missing something obvious.
>>>>>>>
>>>>>>> Any ideas?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> --
>>>>>>> Miki
>>>>>>>
>>>>>>> On Fri, Jun 3, 2011 at 1:43 AM, Doug Cutting <[email protected]>
>>>>>>> wrote:
>>>>>>>> Miki,
>>>>>>>>
>>>>>>>> Have you looked at AvroAsTextInputFormat?
>>>>>>>>
>>>>>>>> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/AvroAsTextInputFormat.html
>>>>>>>>
>>>>>>>> Also, release 1.5.2 will include AvroTextOutputFormat:
>>>>>>>>
>>>>>>>> https://issues.apache.org/jira/browse/AVRO-830
>>>>>>>>
>>>>>>>> Are these perhaps what you're looking for?
>>>>>>>>
>>>>>>>> Doug
>>>>>>>>
>>>>>>>> On 06/02/2011 11:30 PM, Miki Tebeka wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> I'd like to use Hadoop streaming with Avro files.
>>>>>>>>> My plan is to write an InputFormat class that emits JSON records, one
>>>>>>>>> per line. This way the streaming application can read one record per
>>>>>>>>> line.
>>>>>>>>>
>>>>>>>>> (http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plugins+for+Jobs)
>>>>>>>>>
>>>>>>>>> I couldn't find any documentation or help about writing InputFormat
>>>>>>>>> classes. Can someone point me in the right direction?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> --
>>>>>>>>> Miki
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Harsh J
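Doug's suggestion and Miki's original plan converge on the same idea: AvroAsTextInputFormat turns each Avro datum into JSON text (as we read its Javadoc, the JSON goes in the key and the value is empty), so the streaming script only has to parse JSON from stdin. The sketch below is a hypothetical avro-mapper.py, not the script used in the thread. It assumes each input line holds one JSON record, optionally followed by a tab and an empty value (how streaming joins key and value is worth checking on your Hadoop version), and it counts records by a hypothetical field named `event_type`.

```python
import json
import sys


def parse_record(line):
    """Parse one line of streaming input into a dict.

    Assumes the JSON record comes first, optionally followed by a tab and an
    (empty) value. A conforming JSON encoder escapes tab characters inside
    strings as \\t, so the first literal tab can only be the separator.
    """
    key = line.rstrip("\n").split("\t", 1)[0]
    return json.loads(key)


def map_lines(lines, field="event_type"):
    """Yield 'value<TAB>1' per record; 'event_type' is a hypothetical field."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines instead of failing on them
        record = parse_record(line)
        yield "%s\t1" % record.get(field, "unknown")


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Streaming entry point: JSON lines in, key<TAB>1 lines out."""
    for out in map_lines(stdin):
        stdout.write(out + "\n")
```

To use it as the -mapper script, you would add a `#!/usr/bin/env python` line and call `main()` under `if __name__ == "__main__":`. A matching reducer would then sum the `1` counts per key, since streaming delivers the mapper output sorted by key.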
