Hi there,

Can you give me some examples of, or an explanation about, programming with 
the pure org.apache.avro.mapreduce interfaces?

-------- Save your time, continue if you know how --------

All of my programs are written against Hadoop's new MapReduce interfaces 
(org.apache.hadoop.mapreduce), so I want to use Avro's new 
org.apache.avro.mapreduce package as well. But it doesn't work for me.

The program takes Avro data as input and produces Avro data as output.
The main idea is to subclass Hadoop's Mapper and Reducer against the 
Avro-wrapped key/value types.

Here is a block of my job driver:

        AvroJob.setInputKeySchema(job, NetflowRecord.getClassSchema());
        AvroJob.setOutputKeySchema(job, NetflowRecord.getClassSchema());

        job.setMapperClass(MyAvroMap.class);
        job.setReducerClass(MyAvroReduce.class);
        
        job.setInputFormatClass(AvroKeyInputFormat.class);
        job.setOutputFormatClass(AvroKeyOutputFormat.class);

        job.setMapOutputKeyClass(AvroKey.class);
        job.setMapOutputValueClass(AvroValue.class);
        
        job.setOutputKeyClass(AvroKey.class);
        job.setOutputValueClass(NullWritable.class);

The definitions of the MyAvroMap and MyAvroReduce subclasses, respectively, are

    public static class MyAvroMap extends Mapper<AvroKey<NetflowRecord>, NullWritable,
                        AvroKey<CharSequence>, AvroValue<NetflowRecord>> { ... }

    public static class MyAvroReduce extends Reducer<AvroKey<CharSequence>, AvroValue<NetflowRecord>,
                        AvroKey<NetflowRecord>, NullWritable> { ... }
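For context, the body of my map() is roughly the following (getSrcIp is a placeholder for whichever field of my record I actually group by):

```java
// Rough sketch of my map(): unwrap the Avro record, pick a grouping
// field ("getSrcIp" is a placeholder), and emit Avro-wrapped key/value.
@Override
protected void map(AvroKey<NetflowRecord> key, NullWritable value, Context context)
        throws IOException, InterruptedException {
    NetflowRecord record = key.datum();
    context.write(new AvroKey<CharSequence>(record.getSrcIp()),
                  new AvroValue<NetflowRecord>(record));
}
```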

The NetflowRecord mentioned above is my Avro record class. When I run the job I get the exception

    java.lang.ClassCastException: class org.apache.avro.hadoop.io.AvroKey

By reading Hadoop's and Avro's source code, I found that the exception is 
thrown by JobConf when it checks that the map output key is a subclass of 
WritableComparable, like this (hadoop-1.2.1, line 759):

    WritableComparator.get(getMapOutputKeyClass().asSubclass(WritableComparable.class));
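If I read JobConf.getOutputKeyComparator correctly, this cast is only a fallback path: it is skipped when an explicit sort comparator has been configured on the job. So I assume the Avro way is to have AvroJob register its own comparator and serialization, roughly equivalent to this sketch (my guess at what AvroJob does under the hood, not verified):

```java
// Sketch of what (I assume) AvroJob configures, so that the
// WritableComparable fallback in JobConf is never reached:
job.setSortComparatorClass(AvroKeyComparator.class);
AvroSerialization.addToConfiguration(job.getConfiguration());
```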

But Avro's source shows that AvroKey and AvroValue are simple wrappers that do 
**not** implement Hadoop's Writable* interfaces.
I believe, even without testing, that I could get around this using the old 
mapred interfaces, but that's not what I want.


Sincerely,

Jamin

