@Erin/Doug/Mika... Any thoughts on my previous question? Thanks for the help....
*Raihan Jamal*

On Wed, Sep 25, 2013 at 5:42 PM, Raihan Jamal <[email protected]> wrote:

> Thanks Eric. Now I have a couple of questions on this:
>
> 1) So that means we cannot deserialize any attribute data using any other
> schema? We always need to pass the schema that we used for writing
> along with any other schema that I want to use for reading? Is that
> right?
> 2) Is there any way I can deserialize attribute data using another
> schema without passing the actual schema that we used to serialize it?
>
> In my example, as you can see, I am already storing schemaId in the Avro
> schema, and it will map to some actual schema name. So while serializing any
> attribute data, we will also store the schemaId within that Avro binary
> encoded value, and that schemaId will identify the schema we
> used to serialize it. Now while deserializing those attributes, first we
> will grab the schemaId (by deserializing it with another schema) to see
> which schema we actually used to serialize the attributes, and then we
> will deserialize the attributes again using the actual schema...
>
> *Raihan Jamal*
>
>
> On Wed, Sep 25, 2013 at 5:30 PM, Eric Wasserman <[email protected]> wrote:
>
>> Short answer. Use this constructor instead:
>>
>>   /** Construct given writer's and reader's schema. */
>>   public GenericDatumReader(Schema writer, Schema reader) {
>>
>> Longer answer:
>>
>> You have to give the GenericDatumReader the EXACT schema that wrote the
>> bytes that you are trying to parse ("writer's schema").
>> You can *also* give it another schema you'd like to use ("reader's
>> schema") that can be different.
>>
>> Try changing this line of your code:
>>
>>   GenericDatumReader<GenericRecord> r1 =
>>       new GenericDatumReader<GenericRecord>(schema1);
>>
>> To this:
>>
>>   // writer's schema is "schema2", reader's schema is "schema1"
>>   GenericDatumReader<GenericRecord> r1 =
>>       new GenericDatumReader<GenericRecord>(schema2, schema1);
>>
>> ------------------------------
>> *From:* Raihan Jamal <[email protected]>
>> *Sent:* Wednesday, September 25, 2013 5:10 PM
>> *To:* [email protected]
>> *Subject:* Deserialize the attributes data using another schema give me
>> wrong results
>>
>> I am trying to serialize one of our attributes' data using an Apache Avro
>> schema. Here the attribute name is `e7`, and the schema that I am using to
>> serialize it is `schema2.avsc`, shown below.
>>
>> {
>>   "namespace": "com.avro.test.AvroExperiment",
>>   "type": "record",
>>   "name": "DEMOGRAPHIC",
>>   "doc": "DEMOGRAPHIC data",
>>   "fields": [
>>     {"name": "dob", "type": "string"},
>>     {"name": "gndr", "type": "string"},
>>     {"name": "occupation", "type": "string"},
>>     {"name": "mrtlStatus", "type": "string"},
>>     {"name": "numChldrn", "type": "int"},
>>     {"name": "estInc", "type": "string"},
>>     {"name": "schemaId", "type": "int"},
>>     {"name": "lmd", "type": "long"}
>>   ]
>> }
>>
>> Below is the code that I am using to serialize the attribute `e7` with
>> the above Avro schema, `schema2.avsc`. I am able to serialize it properly
>> and it works fine...
>>
>>   Schema schema = new Parser().parse(
>>       AvroExperiment.class.getResourceAsStream("/schema2.avsc"));
>>   GenericRecord record = new GenericData.Record(schema);
>>   record.put("dob", "161913600000");
>>   record.put("gndr", "f");
>>   record.put("occupation", "doctor");
>>   record.put("mrtlStatus", "single");
>>   record.put("numChldrn", 3);
>>   record.put("estInc", "50000");
>>   record.put("schemaId", 20001);
>>   record.put("lmd", 1379814280254L);
>>
>>   GenericDatumWriter<GenericRecord> writer =
>>       new GenericDatumWriter<GenericRecord>(schema);
>>   ByteArrayOutputStream os = new ByteArrayOutputStream();
>>
>>   Encoder e = EncoderFactory.get().binaryEncoder(os, null);
>>
>>   writer.write(record, e);
>>   e.flush();
>>   byte[] byteData = os.toByteArray();
>>   os.close();
>>
>> Now, I tried deserializing the same `e7` attributes data using the same
>> above avro schema definition `schema2.avsc` and it also works fine, and I
>> am able to deserialize it properly.
>>
>>   GenericDatumReader<GenericRecord> r =
>>       new GenericDatumReader<GenericRecord>(schema);
>>   BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(byteData, null);
>>   GenericRecord result = r.read(null, decoder);
>>
>>   System.out.println(result);
>>   System.out.println(result.get("schemaId"));
>>   System.out.println(result.get("lmd"));
>>
>> Now I thought, lets deserialize the same attributes data using another
>> avro schema that I have which is `schema1.avsc` and just extract only
>> `schemaId` and `lmd` from that.
>> Below is the schema:
>>
>> {
>>   "namespace": "com.avro.test.AvroExperiment",
>>   "type": "record",
>>   "name": "DEMOGRAPHIC",
>>   "doc": "DEMOGRAPHIC data",
>>   "fields": [
>>     {"name": "schemaId", "type": "int"},
>>     {"name": "lmd", "type": "long"}
>>   ]
>> }
>>
>>   /**
>>    * Deserialize the same byte data using another Avro Schema
>>    */
>>   Schema schema1 = new Parser().parse(
>>       AvroExperiment.class.getResourceAsStream("/schema1.avsc"));
>>
>>   GenericDatumReader<GenericRecord> r1 =
>>       new GenericDatumReader<GenericRecord>(schema1);
>>   BinaryDecoder decoder1 = DecoderFactory.get().binaryDecoder(byteData, null);
>>   GenericRecord result1 = r1.read(null, decoder1);
>>
>>   System.out.println(result1);
>>   System.out.println(result1.get("schemaId"));
>>   System.out.println(result1.get("lmd"));
>>
>> But somehow the above code prints the following, which is wrong, and I am
>> not sure what I did wrong:
>>
>>   {"schemaId": 12, "lmd": -25}
>>   12
>>   -25
>>
>> It should print this:
>>
>>   {"schemaId": 20001, "lmd": 1379814280254L}
>>   20001
>>   1379814280254L
>>
>> Can anyone help me see what I did wrong?
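
A note on where the 12 and -25 in the original question likely come from. Avro's binary encoding carries no field names or tags, so a reader that is given only `schema1` consumes the leading bytes positionally. Per the Avro specification, ints and longs are written as zig-zag variable-length integers, and a string is written as its UTF-8 byte length followed by the bytes. The sketch below is plain Java with no Avro dependency; the class and helper names are made up for illustration, and it hand-encodes only the first two fields (`dob`, `gndr`) instead of calling the Avro library. Under those assumptions, the length prefix of `dob` (12 characters) is read as `schemaId`, and the first character, `'1'`, is read as `lmd`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: re-implements just enough of Avro's binary
// encoding to show a positional mis-read. Not part of the Avro API.
public class AvroPositionalDecodeDemo {

    /** Zig-zag plus variable-length encoding, as used for Avro int and long. */
    static void writeLong(ByteArrayOutputStream out, long value) {
        long zigzag = (value << 1) ^ (value >> 63);
        while ((zigzag & ~0x7FL) != 0) {
            out.write((int) ((zigzag & 0x7F) | 0x80)); // 7 data bits + continuation bit
            zigzag >>>= 7;
        }
        out.write((int) zigzag);
    }

    /** A string is its UTF-8 byte length (encoded as a long) followed by the bytes. */
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /** Reads one zig-zag varint starting at pos[0] and advances pos[0]. */
    static long readLong(byte[] data, int[] pos) {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = data[pos[0]++] & 0xFF;
            raw |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1); // undo zig-zag
    }

    public static void main(String[] args) {
        // The first two fields of schema2.avsc, in declaration order.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, "161913600000"); // dob
        writeString(out, "f");            // gndr
        byte[] data = out.toByteArray();

        // What a reader holding only schema1 does: read an int, then a long.
        int[] pos = {0};
        long schemaId = readLong(data, pos); // actually the length of "dob"
        long lmd = readLong(data, pos);      // actually the character '1' (0x31)
        System.out.println(schemaId + " " + lmd); // prints "12 -25"
    }
}
```

This is why the two-argument `GenericDatumReader(Schema writer, Schema reader)` constructor quoted above matters: with the writer's schema available, the decoder can skip the six fields that precede `schemaId` instead of misreading them.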
