If you process 4 files with schemas A, B, C, and D as the writer schemas, then
I would assume you would want to specify the reader schema using the
setInput*Schema methods, and set the writer schema with the methods you are
already calling. To be clear, all data processed by the job should have one
reader schema that is determined when the data is read, and one writer schema
(possibly different from the reader schema) used when the data is written back
to files. If you need to process the data from each schema independently, you
should probably create one job for each schema.
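For example, if you give each schema its own job, the driver for the schema-A
job might look roughly like the sketch below. This is untested (see my
disclaimer below); it assumes A is a class generated from the Avro schema, and
the input/output paths are placeholders taken from the command line:

    import org.apache.avro.mapreduce.AvroJob;
    import org.apache.avro.mapreduce.AvroKeyInputFormat;
    import org.apache.avro.mapreduce.AvroKeyOutputFormat;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SchemaAJobDriver {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "process-schema-A");
        job.setJarByClass(SchemaAJobDriver.class);
        job.setNumReduceTasks(0); // map-only

        // Reader schema: determines how the input records are interpreted.
        AvroJob.setInputKeySchema(job, A.getClassSchema());
        job.setInputFormatClass(AvroKeyInputFormat.class);

        // Writer schema: determines how the records are serialized on output
        // (it may differ from the reader schema).
        AvroJob.setOutputKeySchema(job, A.getClassSchema());
        job.setOutputFormatClass(AvroKeyOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

You would then run the same driver (or three sibling drivers) against the B,
C, and D data with the corresponding schemas.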
Disclaimer: I have never used the AvroJob interface directly, so this is just
me inferring what I think it should do based on my experience with AvroStorage
and the other language-specific Avro interfaces.
Hope this helps,
Sam
On Thursday, June 25, 2015 12:53 PM, Nishanth S <[email protected]>
wrote:
Hello All,
We are using Avro 1.7.7 and Hadoop 2.5.1 in our project. We need to process a
mixed-mode binary file using MapReduce and produce multiple Avro files as
output, each with a different Avro schema. I looked at the
AvroMultipleOutputs class but did not completely understand what needs to be
done in the driver class. This is a map-only job whose output should be 4
different Avro files (each with a different Avro schema) written to different
HDFS directories.
Do we need to set all of the key and value Avro schemas on AvroJob in the
driver class? For example, for schema A:
    AvroJob.setOutputKeySchema(job, Schema.create(Schema.Type.NULL));
    AvroJob.setOutputValueSchema(job, A.getClassSchema());
Now if I have schemas B, C, and D, how would these be set on AvroJob? I am
guessing the driver and mapper would need something like the following
(untested; the named-output names and paths are just placeholders), but I am
not sure:
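    // In the driver -- one named output per additional schema, using
    // org.apache.avro.mapreduce.AvroMultipleOutputs and AvroKeyOutputFormat?
    AvroMultipleOutputs.addNamedOutput(job, "outputB",
        AvroKeyOutputFormat.class, B.getClassSchema());
    AvroMultipleOutputs.addNamedOutput(job, "outputC",
        AvroKeyOutputFormat.class, C.getClassSchema());
    AvroMultipleOutputs.addNamedOutput(job, "outputD",
        AvroKeyOutputFormat.class, D.getClassSchema());

    // In the mapper -- write each record to its named output, passing a
    // base output path to land the files in different HDFS directories?
    //   amos = new AvroMultipleOutputs(context);  // created in setup()
    amos.write("outputB", new AvroKey<>(bRecord), NullWritable.get(), "dirB/part");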
Thanks for your help,
Nishan