I tried the command that Miki posted, the only difference being the Avro
version (1.5.1 instead of 1.6.0), but I can't get it to work.

/home/hadoop/hadoop/bin/hadoop jar \
   /home/hadoop/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
   -files avro-1.5.1.jar,avro-mapred-1.5.1.jar \
   -libjars avro-1.5.1.jar,avro-mapred-1.5.1.jar \
   -mapper test-mapper.py \
   -reducer test-reducer.py \
   -jobconf mapred.job.name=AvroTestJob \
   --numReduceTasks 3 \
   -file test-mapper.py \
   -file test-reducer.py \
   -inputformat org.apache.avro.mapred.AvroAsTextInputFormat \
   -input avroevents \
   -output AvroOutput


Error: -inputformat : class not found : org.apache.avro.mapred.AvroAsTextInputFormat
Streaming Job Failed!
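
Since the only stated difference from the working command is the Avro version, one thing worth ruling out is whether the 1.5.1 jars contain the class at all. The sketch below is only a diagnostic aid and is not taken from the thread: it treats a jar as a zip archive and looks for the class file. The helper name jar_has_class is made up for illustration.

```python
import sys
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if the jar (a zip archive) holds the given fully qualified class."""
    # A class a.b.C is stored in the jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


if __name__ == "__main__":
    wanted = "org.apache.avro.mapred.AvroAsTextInputFormat"
    # Usage: python check_jar.py avro-1.5.1.jar avro-mapred-1.5.1.jar
    for path in sys.argv[1:]:
        print(path, jar_has_class(path, wanted))
```

If neither jar reports True, the class is missing from that Avro release rather than from the classpath, and no combination of -files and -libjars would help.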


Thanks for all the help!
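
For context on what a mapper such as test-mapper.py would consume once the input format loads: the plan described further down the thread is one JSON record per line on stdin. The following is a minimal sketch under that assumption, not the actual script. The field name "event_type" is hypothetical, each record is assumed to be a JSON object, and any trailing tab from an empty value column is stripped.

```python
import json
import sys


def map_lines(lines, field="event_type"):
    """Yield 'key<TAB>1' per JSON record; `field` is a hypothetical field name."""
    for line in lines:
        line = line.strip()  # drops the newline and any trailing tab
        if not line:
            continue
        record = json.loads(line)  # assumes each record is a JSON object
        yield "%s\t1" % record.get(field, "unknown")


if __name__ == "__main__":
    for out in map_lines(sys.stdin):
        print(out)
```

Because it only reads stdin, a mapper like this can be exercised locally by piping a few JSON lines into it before submitting the streaming job.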

On Jun 15, 2011, at 10:36 AM, Miki Tebeka wrote:

> Found the magic (-files and -libjars):
> 
> jars=avro-1.6.0-SNAPSHOT.jar,avro-mapred-1.6.0-SNAPSHOT.jar
> 
> hadoop jar hadoop-streaming-0.20.2-cdh3u0.jar \
>    -files $jars \
>    -libjars $jars \
>    -input /in/avro \
>    -output /out/avro \
>    -mapper avro-mapper.py \
>    -reducer avro-reducer.py \
>    -file avro-mapper.py \
>    -file avro-reducer.py \
>    -inputformat org.apache.avro.mapred.AvroAsTextInputFormat
> 
> Thanks for all the help!
> 
> On Wed, Jun 15, 2011 at 9:53 AM, Scott Carey <[email protected]> wrote:
>> Hadoop has an old version of Avro in it.  You must place the 1.6.0 jar
>> (and relevant dependencies, or the avro-tools.jar with all dependencies
>> bundled) in a location that gets picked up first in the task classpath.
>> 
>> Packaging it in the job jar works. I'm not sure if putting it in the
>> distributed cache and loading it as a library that way would.
>> 
>> On 6/15/11 9:30 AM, "Matt Pouttu-Clarke"
>> <[email protected]> wrote:
>> 
>>> You have to package it in the job jar file under a /lib directory.
>>> 
>>> 
>>> On 6/15/11 9:26 AM, "Miki Tebeka" <[email protected]> wrote:
>>> 
>>>> Still didn't work.
>>>> 
>>>> I'm pretty new to the Hadoop world. I probably need to place the Avro
>>>> jar somewhere on the classpath of the nodes, but I have no idea how to
>>>> do that.
>>>> 
>>>> On Wed, Jun 15, 2011 at 3:33 AM, Harsh J <[email protected]> wrote:
>>>>> Miki,
>>>>> 
>>>>> You'll need to provide the entire canonical class name
>>>>> (org.apache.avro.mapred.AvroAsTextInputFormat).
>>>>> 
>>>>> On Wed, Jun 15, 2011 at 5:31 AM, Miki Tebeka <[email protected]>
>>>>> wrote:
>>>>>> Greetings,
>>>>>> 
>>>>>> I've tried to run a job with the following command:
>>>>>> 
>>>>>> hadoop jar ./hadoop-streaming-0.20.2-cdh3u0.jar \
>>>>>>    -input /in/avro \
>>>>>>    -output $out \
>>>>>>    -mapper avro-mapper.py \
>>>>>>    -reducer avro-reducer.py \
>>>>>>    -file avro-mapper.py \
>>>>>>    -file avro-reducer.py \
>>>>>>    -cacheArchive /cache/avro-mapred-1.6.0-SNAPSHOT.jar \
>>>>>>    -inputformat AvroAsTextInputFormat
>>>>>> 
>>>>>> However I get
>>>>>> -inputformat : class not found : AvroAsTextInputFormat
>>>>>> 
>>>>>> I'm probably missing something obvious.
>>>>>> 
>>>>>> Any ideas?
>>>>>> 
>>>>>> Thanks!
>>>>>> --
>>>>>> Miki
>>>>>> 
>>>>>> On Fri, Jun 3, 2011 at 1:43 AM, Doug Cutting <[email protected]>
>>>>>> wrote:
>>>>>>> Miki,
>>>>>>> 
>>>>>>> Have you looked at AvroAsTextInputFormat?
>>>>>>> 
>>>>>>> 
>>>>>>> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/AvroAsTextInputFormat.html
>>>>>>> 
>>>>>>> Also, release 1.5.2 will include AvroTextOutputFormat:
>>>>>>> 
>>>>>>> https://issues.apache.org/jira/browse/AVRO-830
>>>>>>> 
>>>>>>> Are these perhaps what you're looking for?
>>>>>>> 
>>>>>>> Doug
>>>>>>> 
>>>>>>> On 06/02/2011 11:30 PM, Miki Tebeka wrote:
>>>>>>>> Greetings,
>>>>>>>> 
>>>>>>>> I'd like to use hadoop streaming with Avro files.
>>>>>>>> My plan is to write an inputformat class that emits json records, one
>>>>>>>> per line. This way the streaming application can read one record per
>>>>>>>> line.
>>>>>>>> 
>>>>>>>> (http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plugins+for+Jobs)
>>>>>>>> 
>>>>>>>> I couldn't find any documentation/help about writing inputformat
>>>>>>>> classes. Can someone point me in the right direction?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> --
>>>>>>>> Miki
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Harsh J
>>>>> 
>>> 
>>> 
>> 
>> 
