Still didn't work. I'm pretty new to the Hadoop world; I probably need to place the Avro jar somewhere on the nodes' classpath, but I have no idea how to do that.
On Wed, Jun 15, 2011 at 3:33 AM, Harsh J wrote:
> Miki,
>
> You'll need to provide the entire canonical class name
> (org.apache.avro.mapred…).
>
> On Wed, Jun 15, 2011 at 5:31 AM, Miki Tebeka wrote:
>> Greetings,
>>
>> I've tried to run a job with the following command:
>>
>> hadoop jar ./hadoop-streaming-0.20.2-cdh3u0.jar \
>> -input /in/avro \
>> -output $out \
>> -mapper avro-mapper.py \
>> -reducer avro-reducer.py \
>> -file avro-mapper.py \
>> -file avro-reducer.py \
>> -cacheArchive /cache/avro-mapred-1.6.0-SNAPSHOT.jar \
>> -inputformat AvroAsTextInputFormat
>>
>> However I get
>> -inputformat : class not found : AvroAsTextInputFormat
>>
>> I'm probably missing something obvious to do.
>>
>> Any ideas?
>>
>> Thanks!
>> --
>> Miki
>>
>> On Fri, Jun 3, 2011 at 1:43 AM, Doug Cutting wrote:
>>> Miki,
>>>
>>> Have you looked at AvroAsTextInputFormat?
>>>
>>> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/AvroAsTextInputFormat.html
>>>
>>> Also, release 1.5.2 will include AvroTextOutputFormat:
>>>
>>> https://issues.apache.org/jira/browse/AVRO-830
>>>
>>> Are these perhaps what you're looking for?
>>>
>>> Doug
>>>
>>> On 06/02/2011 11:30 PM, Miki Tebeka wrote:
>>>> Greetings,
>>>>
>>>> I'd like to use hadoop streaming with Avro files.
>>>> My plan is to write an inputformat class that emits json records, one
>>>> per line. This way the streaming application can read one record per
>>>> line.
>>>> (http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plugins+for+Jobs)
>>>>
>>>> I couldn't find any documentation/help about writing inputformat
>>>> classes. Can someone point me to the right direction?
>>>>
>>>> Thanks,
>>>> --
>>>> Miki
>>>
>>
>
> --
> Harsh J
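For the mapper side of the plan quoted above, here is a minimal sketch of what avro-mapper.py might look like. It assumes that AvroAsTextInputFormat hands the streaming script one Avro record per line, rendered as JSON, possibly followed by a tab and the (empty) value. The "user" field and the "key<TAB>1" output are hypothetical, chosen only for illustration; the actual script was not shown in the thread.

```python
#!/usr/bin/env python
"""Sketch of a streaming mapper for JSON-rendered Avro records.

Assumption: each input line holds one record as JSON, optionally
followed by a tab and an empty value. The "user" field is hypothetical.
"""
import json
import sys


def parse_record(line):
    """Return the JSON record on one streaming input line, or None if blank."""
    # Drop the trailing newline and anything after the first tab (the empty value).
    text = line.rstrip("\n").split("\t", 1)[0]
    if not text.strip():
        return None
    return json.loads(text)


def map_lines(lines, field="user"):
    """Yield 'key<TAB>1' pairs keyed on a (hypothetical) record field."""
    for line in lines:
        record = parse_record(line)
        if record is None:
            continue  # skip empty lines
        key = record.get(field, "<missing>")
        yield "%s\t1" % key


def main(stream=sys.stdin, out=sys.stdout):
    """Read records from stdin and write key/value pairs to stdout."""
    for pair in map_lines(stream):
        out.write(pair + "\n")


if __name__ == "__main__":
    main()
```

Because the script only reads stdin and writes stdout, it can be checked locally without Hadoop, e.g. `printf '{"user": "miki"}\t\n' | python avro-mapper.py`, before worrying about getting the Avro jar onto the cluster classpath.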
