Hi Doug,
I seem to have hit a case not covered by the mapred package documentation:
I'd like to read from a TextInputFormat and produce Avro data in a
map-only job. How do I do that?
In short, the way to do this is to:
- use a
org.apache.hadoop.mapred.Mapper<K,V,AvroWrapper<O>,NullWritable>
- call AvroJob.setOutputSchema(job,schema) with O's schema
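
A minimal sketch of those two steps, assuming the Avro mapred and Hadoop
"old API" (org.apache.hadoop.mapred) jars are on the classpath. The class
names TextToAvroJob and LineMapper are made up for illustration, and O is
taken to be Utf8 with a plain "string" schema; a real job would substitute
its own record type and schema.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroWrapper;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

// Hypothetical example class: copies each input text line into an Avro file.
public class TextToAvroJob {

  // Step 1: a plain Hadoop Mapper whose output key is AvroWrapper<O>
  // (here O = Utf8, an assumption) and whose output value is NullWritable.
  public static class LineMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, AvroWrapper<Utf8>, NullWritable> {
    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<AvroWrapper<Utf8>, NullWritable> out,
                    Reporter reporter) throws IOException {
      out.collect(new AvroWrapper<Utf8>(new Utf8(line.toString())),
                  NullWritable.get());
    }
  }

  public static JobConf configure(Path in, Path out) {
    JobConf job = new JobConf(TextToAvroJob.class);
    job.setJobName("text-to-avro");
    job.setInputFormat(TextInputFormat.class);
    job.setMapperClass(LineMapper.class);
    job.setNumReduceTasks(0); // map-only job

    // Step 2: declare O's schema (assumed here to be a bare string).
    AvroJob.setOutputSchema(job, Schema.create(Schema.Type.STRING));

    FileInputFormat.setInputPaths(job, in);
    FileOutputFormat.setOutputPath(job, out);
    return job;
  }

  public static void main(String[] args) throws IOException {
    JobClient.runJob(configure(new Path(args[0]), new Path(args[1])));
  }
}
```

The job would be launched with an input and an output path as its two
arguments, and each map task should then write its own Avro data file into
the output directory.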
Does that make sense? If that works for you, I can add it to the
javadoc.
Yes, it worked. Incidentally, it also reduced my file size to 33% of what my
previous custom-avro-writable-in-sequence-file approach produced.
Thanks,
Markus