Greetings,
I've tried to run a job with the following command:
hadoop jar ./hadoop-streaming-0.20.2-cdh3u0.jar \
-input /in/avro \
-output $out \
-mapper avro-mapper.py \
-reducer avro-reducer.py \
-file avro-mapper.py \
-file avro-reducer.py \
-cacheArchive /cache/avro-mapred-1.6.0-SNAPSHOT.jar \
-inputformat AvroAsTextInputFormat
However, I get this error:
-inputformat : class not found : AvroAsTextInputFormat
I'm probably missing something obvious.
Any ideas?
Thanks!
--
Miki
On Fri, Jun 3, 2011 at 1:43 AM, Doug Cutting wrote:
> Miki,
>
> Have you looked at AvroAsTextInputFormat?
>
> http://avro.apache.org/docs/current/api/java/org/apache/avro/mapred/AvroAsTextInputFormat.html
>
> Also, release 1.5.2 will include AvroTextOutputFormat:
>
> https://issues.apache.org/jira/browse/AVRO-830
>
> Are these perhaps what you're looking for?
>
> Doug
>
> On 06/02/2011 11:30 PM, Miki Tebeka wrote:
>> Greetings,
>>
>> I'd like to use hadoop streaming with Avro files.
>> My plan is to write an inputformat class that emits json records, one
>> per line. This way the streaming application can read one record per
>> line.
>> (http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plugins+for+Jobs)
>>
>> I couldn't find any documentation/help about writing inputformat
>> classes. Can someone point me in the right direction?
>>
>> Thanks,
>> --
>> Miki
>
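
For reference, the plan quoted above (the input format hands the streaming script one JSON record per line) might look roughly like this on the mapper side. This is only a sketch, not the actual avro-mapper.py: the helper names and the "id" key field are made up for illustration, and it assumes each line carries the JSON text first, possibly followed by a tab and an empty value.

```python
#!/usr/bin/env python
"""Hypothetical streaming mapper: reads one JSON record per line from stdin."""
import json
import sys


def parse_record(line):
    """Parse one input line into a Python object, or return None if blank.

    Assumption: the JSON text comes first on the line, possibly followed
    by a tab and an empty value, so surrounding whitespace is stripped.
    """
    text = line.strip()
    if not text:
        return None
    return json.loads(text)


def map_lines(lines, key_field="id"):
    """Yield tab-separated "key<TAB>json" output lines, one per record.

    `key_field` is a made-up field name used only for illustration;
    records are assumed to be JSON objects (Python dicts).
    """
    for line in lines:
        record = parse_record(line)
        if record is None:
            continue  # skip blank lines
        key = record.get(key_field, "")
        yield "%s\t%s" % (key, json.dumps(record, sort_keys=True))


if __name__ == "__main__":
    for out in map_lines(sys.stdin):
        sys.stdout.write(out + "\n")
```

The parsing is kept in small functions so it can be tried locally by piping a few JSON lines into the script before submitting a job.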