Hi Johannes,

Thanks for your reminder. It's solved after adding the mapper key/value schema 
settings.
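
For the record, the lines I added to the driver were roughly these (the string 
schema matches my AvroKey<CharSequence> map output key, and they replace the 
setMapOutputKeyClass/setMapOutputValueClass calls quoted below):

    AvroJob.setMapOutputKeySchema(job, Schema.create(Schema.Type.STRING));
    AvroJob.setMapOutputValueSchema(job, NetflowRecord.getClassSchema());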

The new mapreduce API is more convenient than mapred's. I love this way.


Best Regards,

Jamin


On 2013-10-6, at 5:30 AM, Johannes Schulte <[email protected]> wrote:

> Hi,
> 
> you should try using the static methods of AvroJob to configure your map 
> output key and value schemas. This takes care of configuring the right 
> KeyComparators for you. So instead of writing
> 
> job.setMapOutputKeyClass(AvroKey.class);
> job.setMapOutputValueClass(AvroValue.class);
> 
> write
> 
> AvroJob.setMapOutputKeySchema(job, Schema.create(Schema.Type.STRING));
> AvroJob.setMapOutputValueSchema(job, NetflowRecord.getClassSchema());
> 
> and the same for the job's output schemas, e.g.
> 
> AvroJob.setOutputKeySchema(job, ...).
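> 
> Applied to your driver below, the configuration would look roughly like this 
> (an untested sketch; the string schema matches your AvroKey<CharSequence> map 
> output key, and the AvroJob calls replace your setMapOutputKeyClass/ValueClass 
> and setOutputKeyClass lines):
> 
>         AvroJob.setInputKeySchema(job, NetflowRecord.getClassSchema());
>         AvroJob.setMapOutputKeySchema(job, Schema.create(Schema.Type.STRING));
>         AvroJob.setMapOutputValueSchema(job, NetflowRecord.getClassSchema());
>         AvroJob.setOutputKeySchema(job, NetflowRecord.getClassSchema());
> 
>         job.setMapperClass(MyAvroMap.class);
>         job.setReducerClass(MyAvroReduce.class);
>         job.setInputFormatClass(AvroKeyInputFormat.class);
>         job.setOutputFormatClass(AvroKeyOutputFormat.class);
>         job.setOutputValueClass(NullWritable.class);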
> 
> 
> Cheers,
> Johannes
> 
> 
> 
> On Wed, Oct 2, 2013 at 3:28 PM, Xiaming Chen <[email protected]> wrote:
> Hi there,
> 
> Can you give me some examples of, or an explanation about, programming with the 
> pure org.apache.avro.mapreduce interfaces?
> 
> -------- Save your time: continue only if you know how --------
> 
> All of my programs are written with Hadoop's new MapReduce interfaces 
> (org.apache.hadoop.mapreduce), so I want to use Avro's new 
> org.apache.avro.mapreduce package too. But it doesn't work for me.
> 
> The program takes Avro data as input and outputs the same.
> The main idea behind my program is subclassing Hadoop's Mapper 
> and Reducer against Avro-wrapped keys/values.
> 
> Here is a block of my job driver:
> 
>         AvroJob.setInputKeySchema(job, NetflowRecord.getClassSchema());
>         AvroJob.setOutputKeySchema(job, NetflowRecord.getClassSchema());
> 
>         job.setMapperClass(MyAvroMap.class);
>         job.setReducerClass(MyAvroReduce.class);
>         
>         job.setInputFormatClass(AvroKeyInputFormat.class);
>         job.setOutputFormatClass(AvroKeyOutputFormat.class);
> 
>         job.setMapOutputKeyClass(AvroKey.class);
>         job.setMapOutputValueClass(AvroValue.class);
>         
>         job.setOutputKeyClass(AvroKey.class);
>         job.setOutputValueClass(NullWritable.class);
> 
> The definitions of the MyAvroMap and MyAvroReduce subclasses, respectively, are
> 
>     public static class MyAvroMap extends Mapper<AvroKey<NetflowRecord>, NullWritable,
>                       AvroKey<CharSequence>, AvroValue<NetflowRecord>> { ... }
> 
>     public static class MyAvroReduce extends Reducer<AvroKey<CharSequence>, AvroValue<NetflowRecord>,
>                       AvroKey<NetflowRecord>, NullWritable> { ... }
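> 
> For completeness, the omitted map body is roughly the sketch below; getSrcAddr() 
> is only a placeholder for whichever NetflowRecord getter the job actually groups by:
> 
>     public static class MyAvroMap extends Mapper<AvroKey<NetflowRecord>, NullWritable,
>                       AvroKey<CharSequence>, AvroValue<NetflowRecord>> {
>         @Override
>         protected void map(AvroKey<NetflowRecord> key, NullWritable value, Context context)
>                 throws IOException, InterruptedException {
>             NetflowRecord record = key.datum();
>             // emit one Avro-wrapped pair per record, keyed by a record field
>             // (getSrcAddr() is a placeholder name, not a real NetflowRecord method)
>             context.write(new AvroKey<CharSequence>(record.getSrcAddr()),
>                           new AvroValue<NetflowRecord>(record));
>         }
>     }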
> 
> The mentioned NetflowRecord is my Avro record class. And I got a runtime 
> exception:
> 
>     java.lang.ClassCastException: class org.apache.avro.hadoop.io.AvroKey
> 
> By reading Hadoop's and Avro's source code,
> I found that the exception is thrown when JobConf checks that
> the map output key is a subclass of WritableComparable, like this (Hadoop 1.2.1, 
> line 759):
> 
>     WritableComparator.get(getMapOutputKeyClass().asSubclass(WritableComparable.class));
> 
> But Avro's source shows that AvroKey and AvroValue are just simple wrappers 
> that do **not** implement Hadoop's Writable* interfaces.
> I believe, even without testing, that I could get around this by using the old 
> mapred interfaces, but that's not what I want.
> 
> 
> Sincerely,
> 
> Jamin
> 
> 
> 
