Hi Dan,

You're stepping off the documented path here, but I think that, although it might be a bit of work, it should be possible.
Things to watch out for: you might not be able to use AvroMapper/AvroReducer so easily, and you may have to mess around with the job conf a bit (Avro-configured jobs use their own shuffle config with AvroKeyComparator, which may not be what you want if you're also trying to use Writables). I'd suggest simply reading the code in org.apache.avro.mapred[uce] -- it's not too complicated. (I've pasted a rough, untested sketch of what such a mixed job conf might look like at the bottom of this message, below your quoted mail.)

Whether Avro files or Writables (i.e. Hadoop sequence files) are better for you depends mostly on which format you'd rather have your data in. If you want to read the data files with something other than Hadoop, Avro is definitely a good option. Also, Avro data files are self-describing (thanks to their embedded schema), which makes them pleasant to use with tools like Pig and Hive.

Martin

On 3 July 2013 10:12, Dan Filimon <[email protected]> wrote:
> Hi!
>
> I'm working on integrating Avro into our data processing pipeline.
> We're using quite a few standard Hadoop and Mahout writables (IntWritable,
> VectorWritable).
>
> I'm first going to replace the custom Writables with Avro, but in terms of
> the other ones, how important would you say it is to use AvroKey<Integer>
> instead of IntWritable for example?
>
> The changes will happen gradually but are they even worth it?
>
> Thanks!
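
P.S. Here's roughly what I had in mind -- a sketch only, not something I've tested, and the class names (MixedJob, LineMapper, SumReducer) plus the choice of Mahout's VectorWritable as the value type are just placeholders for illustration:

import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.VectorWritable;

public class MixedJob {

  // Map output: Avro int key, plain Writable value.
  public static class LineMapper
      extends Mapper<LongWritable, Text, AvroKey<Integer>, VectorWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      // Dummy logic just to show the types: key = line length, value = a vector.
      ctx.write(new AvroKey<Integer>(line.getLength()),
                new VectorWritable(new RandomAccessSparseVector(10)));
    }
  }

  // The reducer sees AvroKey<Integer> keys (sorted/grouped by AvroKeyComparator)
  // alongside ordinary Writable values.
  public static class SumReducer
      extends Reducer<AvroKey<Integer>, VectorWritable, IntWritable, VectorWritable> {
    @Override
    protected void reduce(AvroKey<Integer> key, Iterable<VectorWritable> values, Context ctx)
        throws IOException, InterruptedException {
      for (VectorWritable v : values) {
        ctx.write(new IntWritable(key.datum()), v);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "avro-key-writable-value");
    job.setJarByClass(MixedJob.class);

    // The important bit: declare the map output key as an Avro int.
    // AvroJob registers AvroSerialization and AvroKeyComparator for the
    // shuffle, while the map output value stays an ordinary Writable.
    AvroJob.setMapOutputKeySchema(job, Schema.create(Schema.Type.INT));
    job.setMapOutputValueClass(VectorWritable.class);

    job.setMapperClass(LineMapper.class);
    job.setReducerClass(SumReducer.class);

    // Final output here is a plain sequence file of Writables.
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(VectorWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The job above still writes an ordinary sequence file of Writables at the end; if you wanted Avro data files out the other end instead, you'd swap in AvroKeyOutputFormat and call AvroJob.setOutputKeySchema in the same way.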
