Thanks Eric. Now I have a couple of questions on this:

1) So that means we cannot deserialize any attribute data using only some other schema? We always need to pass the schema that was used for writing, along with any other schema that I want to use for reading? Is that right?
2) Is there any way I can deserialize attribute data using another schema, without passing the actual schema that was used to serialize it?
In my example, if you look, I am already storing a schemaId in the Avro schema, and it maps to some actual schema name. So while serializing any attribute data, we also store the schemaId within the Avro binary-encoded value, and that schemaId indicates which schema was used to serialize it. Then, while deserializing the attributes, we first grab the schemaId (by deserializing with another schema), see which schema was actually used to serialize those attributes, and then deserialize them again using that actual schema...

*Raihan Jamal*

On Wed, Sep 25, 2013 at 5:30 PM, Eric Wasserman <[email protected]> wrote:

> Short answer. Use this constructor instead:
>
>   /** Construct given writer's and reader's schema. */
>   public GenericDatumReader(Schema writer, Schema reader) {
>
> Longer answer:
>
> You have to give the GenericDatumReader the EXACT schema that wrote the
> bytes that you are trying to parse ("writer's schema").
> You can *also* give it another schema you'd like to use ("reader's
> schema") that can be different.
>
> Try changing this line of your code:
>
>   GenericDatumReader<GenericRecord> r1 = new GenericDatumReader<GenericRecord>(schema1);
>
> To this:
>
>   GenericDatumReader<GenericRecord> r1 = new GenericDatumReader<GenericRecord>(schema2, schema1); // writer's schema is "schema2", reader's schema is "schema1"
>
> ------------------------------
> *From:* Raihan Jamal <[email protected]>
> *Sent:* Wednesday, September 25, 2013 5:10 PM
> *To:* [email protected]
> *Subject:* Deserialize the attributes data using another schema give me wrong results
>
> I am trying to serialize one of our Attributes Data using Apache Avro
> Schema. Here the attribute name is `e7` and the schema that I am using to
> serialize it is `schema2.avsc` which is below.
>
> {
>   "namespace": "com.avro.test.AvroExperiment",
>   "type": "record",
>   "name": "DEMOGRAPHIC",
>   "doc": "DEMOGRAPHIC data",
>   "fields": [
>     {"name": "dob", "type": "string"},
>     {"name": "gndr", "type": "string"},
>     {"name": "occupation", "type": "string"},
>     {"name": "mrtlStatus", "type": "string"},
>     {"name": "numChldrn", "type": "int"},
>     {"name": "estInc", "type": "string"},
>     {"name": "schemaId", "type": "int"},
>     {"name": "lmd", "type": "long"}
>   ]
> }
>
> Below is the code that I am using to serialize the attribute `e7` using
> above avro `schema2.avsc`. And I am able to serialize it properly and it
> works fine...
>
> Schema schema = new Parser().parse((AvroExperiment.class.getResourceAsStream("/schema2.avsc")));
> GenericRecord record = new GenericData.Record(schema);
> record.put("dob", "161913600000");
> record.put("gndr", "f");
> record.put("occupation", "doctor");
> record.put("mrtlStatus", "single");
> record.put("numChldrn", 3);
> record.put("estInc", "50000");
> record.put("schemaId", 20001);
> record.put("lmd", 1379814280254L);
>
> GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
> ByteArrayOutputStream os = new ByteArrayOutputStream();
>
> Encoder e = EncoderFactory.get().binaryEncoder(os, null);
>
> writer.write(record, e);
> e.flush();
> byte[] byteData = os.toByteArray();
> os.close();
>
> Now, I tried deserializing the same `e7` attributes data using the same
> above avro schema definition `schema2.avsc` and it also works fine, and I
> am able to deserialize it properly.
>
> GenericDatumReader<GenericRecord> r = new GenericDatumReader<GenericRecord>(schema);
> BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(byteData, null);
> GenericRecord result = r.read(null, decoder);
>
> System.out.println(result);
> System.out.println(result.get("schemaId"));
> System.out.println(result.get("lmd"));
>
> Now I thought, let's deserialize the same attributes data using another
> avro schema that I have which is `schema1.avsc` and just extract only
> `schemaId` and `lmd` from that. Below is the schema-
>
> {
>   "namespace": "com.avro.test.AvroExperiment",
>   "type": "record",
>   "name": "DEMOGRAPHIC",
>   "doc": "DEMOGRAPHIC data",
>   "fields": [
>     {"name": "schemaId", "type": "int"},
>     {"name": "lmd", "type": "long"}
>   ]
> }
>
> /**
>  * Deserialize the same byte data using another Avro Schema
>  */
>
> Schema schema1 = new Parser().parse((AvroExperiment.class.getResourceAsStream("/schema1.avsc")));
>
> GenericDatumReader<GenericRecord> r1 = new GenericDatumReader<GenericRecord>(schema1);
> BinaryDecoder decoder1 = DecoderFactory.get().binaryDecoder(byteData, null);
> GenericRecord result1 = r1.read(null, decoder1);
>
> System.out.println(result1);
> System.out.println(result1.get("schemaId"));
> System.out.println(result1.get("lmd"));
>
> But somehow the above code prints out like this which is wrong... I am
> not sure what wrong I did?
>
> {"schemaId": 12, "lmd": -25}
> 12
> -25
>
> It should be printing out like this....
>
> {"schemaId": 20001, "lmd": 1379814280254L}
> 20001
> 1379814280254L
>
> Can anyone help me what wrong I did?
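
For anyone wondering where 12 and -25 come from: Avro's binary encoding writes a record as its field values back to back, with no field names or tags, so a reader that is only given `schema1` decodes the first bytes of `dob` as if they were `schemaId` and `lmd`. The sketch below is not Avro's own code. It is a standard-library-only re-implementation of the zig-zag variable-length integer and string encoding rules as I understand them from the Avro specification, and the class and method names (`AvroVarintDemo`, `misreadAsSchema1`, and so on) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class AvroVarintDemo {

    /** Appends n as a zig-zag variable-length integer (the wire format of Avro int/long). */
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);            // zig-zag: small magnitudes become small unsigned values
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits with the continuation bit set
            z >>>= 7;
        }
        out.write((int) z);                       // final byte, continuation bit clear
    }

    /** Appends a string the way Avro does: UTF-8 byte length as a long, then the bytes. */
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /** Reads one zig-zag varint starting at pos[0] and advances pos[0] past it. */
    static long readLong(byte[] buf, int[] pos) {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos[0]++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);              // undo the zig-zag mapping
    }

    /**
     * Encodes only the first schema2 field ("dob", a string) and then decodes the
     * bytes as schema1 would: an int (schemaId) followed by a long (lmd).
     */
    static long[] misreadAsSchema1(String dob) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, dob);
        byte[] bytes = out.toByteArray();
        int[] pos = {0};
        long schemaId = readLong(bytes, pos);     // actually the string's length prefix
        long lmd = readLong(bytes, pos);          // actually the first character of the string
        return new long[] {schemaId, lmd};
    }

    public static void main(String[] args) {
        long[] r = misreadAsSchema1("161913600000");
        System.out.println("schemaId=" + r[0] + ", lmd=" + r[1]);
    }
}
```

With the `dob` value from the question, the length prefix of the 12-character string decodes as 12, and the character '1' (byte 0x31) decodes as -25, which matches the wrong output above. This is why the reader needs the writer's schema: given `new GenericDatumReader<GenericRecord>(schema2, schema1)`, it knows it has to skip the six fields that come before `schemaId`.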
