Hi All,


Following the Avro documentation at
https://avro.apache.org/docs/1.9.2/gettingstartedjava.html, I defined a schema
like the one in the sample:

{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}



Then I created two User records as shown below and serialized them using
DataFileWriter:

Schema schema = new Schema.Parser().parse(new File("user.avsc"));

GenericRecord user1 = new GenericData.Record(schema);
user1.put("name", "Alyssa");
user1.put("favorite_number", 256);
// Leave favorite color null

GenericRecord user2 = new GenericData.Record(schema);
user2.put("name", "Ben");
user2.put("favorite_number", 7);
user2.put("favorite_color", "red");

// Serialize user1 and user2 to disk
File file = new File("users.avro");
DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
dataFileWriter.create(schema, file);
dataFileWriter.append(user1);
dataFileWriter.append(user2);
dataFileWriter.close();
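For reference, here is a hand-rolled sketch of what the Avro specification says the binary encoding of one User record looks like; DataFileWriter stores the records inside the container file in this binary encoding. The class and method names (UserBinarySketch, encodeUser) are hypothetical and this is not the Avro library: it only covers the string, int, null and union rules this schema needs, and it leaves out the container-file header and block framing.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical hand-rolled sketch (NOT the Avro library) of the Avro binary
// encoding for the User schema, following the spec rules for string, int,
// null and union.
final class UserBinarySketch {

    // int and long are written as zig-zag encoded variable-length integers.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63); // zig-zag: small magnitudes -> small codes
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // 7 payload bits + continuation bit
            z >>>= 7;
        }
        out.write((int) z);
    }

    // A string is its UTF-8 byte length (as a long) followed by the bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    // A union is the zero-based index of the chosen branch, followed by the
    // value encoded with that branch's schema. No branch *name* is written.
    // Branch order follows the schema: ["int", "null"] and ["string", "null"].
    static byte[] encodeUser(String name, Integer favoriteNumber, String favoriteColor) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        if (favoriteNumber != null) {
            writeLong(out, 0);               // branch 0 = "int"
            writeLong(out, favoriteNumber);
        } else {
            writeLong(out, 1);               // branch 1 = "null", no payload
        }
        if (favoriteColor != null) {
            writeLong(out, 0);               // branch 0 = "string"
            writeString(out, favoriteColor);
        } else {
            writeLong(out, 1);               // branch 1 = "null", no payload
        }
        return out.toByteArray();
    }
}
```

For user2 this yields the 11 bytes 06 42 65 6E 00 0E 00 06 72 65 64. The two 00 bytes are union branch indexes; the type names "int" and "string" do not appear anywhere in the data.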



I noticed that the favorite_number and favorite_color fields are of UNION type,
so I expected the serialized data to look like this:

"favorite_number" : { "int" : 7} and "favorite_color" : { "string" : "red" }



But when I deserialized the file, I got:

{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}

{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
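A matching sketch of the read side, again hand-rolled and hypothetical (UserBinaryReadSketch, readUser and render are illustrative names, not Avro API): the branch index only tells the reader how to decode the value, and what ends up in the record is the plain value. Assuming the lines above were printed with the record's toString(), which renders generic records in this same style, branch names would not be expected in that output either.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical hand-rolled sketch (NOT the Avro library) of decoding one
// User record from Avro binary. The union branch index is consumed only to
// decide how to read the value; the record keeps just the plain value.
final class UserBinaryReadSketch {
    private final byte[] buf;
    private int pos;

    UserBinaryReadSketch(byte[] buf) { this.buf = buf; }

    // Reverse of the zig-zag varint encoding.
    long readLong() {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    String readString() {
        int len = (int) readLong();
        String s = new String(buf, pos, len, StandardCharsets.UTF_8);
        pos += len;
        return s;
    }

    // Field order and branch order follow the User schema.
    Map<String, Object> readUser() {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("name", readString());
        user.put("favorite_number", readLong() == 0 ? (Object) (int) readLong() : null);
        user.put("favorite_color", readLong() == 0 ? readString() : null);
        return user;
    }

    // Render like the output quoted above: strings quoted, numbers and null
    // bare, and no union branch names.
    static String render(Map<String, Object> user) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Object> e : user.entrySet()) {
            if (sb.length() > 1) sb.append(", ");
            Object v = e.getValue();
            sb.append('"').append(e.getKey()).append("\": ");
            sb.append(v instanceof String ? "\"" + v + "\"" : String.valueOf(v));
        }
        return sb.append('}').toString();
    }
}
```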

I did get the expected result when using JsonEncoder and JsonDecoder:



// Encoder to serialize
GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
ByteArrayOutputStream os = new ByteArrayOutputStream();
Encoder e = EncoderFactory.get().jsonEncoder(schema, os);
writer.write(record, e); // record: one of the GenericRecords built above, e.g. user2
e.flush();
byte[] serializedPayload = os.toByteArray();

// Decoder to deserialize
DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
Decoder decoder = DecoderFactory.get().jsonDecoder(schema, new ByteArrayInputStream(serializedPayload));
GenericRecord deserializedRecord = reader.read(null, decoder);
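JsonEncoder follows the JSON encoding in the Avro specification, where a union value is written as plain null when the null branch is chosen and otherwise as a single-key object {"<type name>": value}. The sketch below mirrors that rule by hand for the User schema; UnionJsonSketch and its methods are hypothetical names, string escaping is skipped, and the whitespace may differ from what JsonEncoder actually emits.

```java
// Hypothetical hand-rolled sketch (NOT Avro's JsonEncoder) of the Avro JSON
// encoding rule for unions: the null branch is written as plain null; any
// other branch is wrapped in an object keyed by the branch type name.
final class UnionJsonSketch {

    static String unionValue(String branchType, String jsonValue) {
        if (branchType.equals("null")) {
            return "null";
        }
        return "{\"" + branchType + "\":" + jsonValue + "}";
    }

    // Demo only: no escaping of quotes or control characters.
    static String quote(String s) {
        return "\"" + s + "\"";
    }

    static String encodeUser(String name, Integer favoriteNumber, String favoriteColor) {
        return "{\"name\":" + quote(name)
            + ",\"favorite_number\":" + (favoriteNumber == null
                ? unionValue("null", null)
                : unionValue("int", favoriteNumber.toString()))
            + ",\"favorite_color\":" + (favoriteColor == null
                ? unionValue("null", null)
                : unionValue("string", quote(favoriteColor)))
            + "}";
    }
}
```

For user2, encodeUser("Ben", 7, "red") returns {"name":"Ben","favorite_number":{"int":7},"favorite_color":{"string":"red"}}, the wrapped form quoted earlier.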

If I use the payload below to produce a message to my topic with the schema
from Schema Registry,

{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}



I get the error "Expected start-union. Got VALUE_NUMBER_INT". I think this
error is the correct behavior, because the payload cannot be validated against
the given schema.
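The error fits how a JSON decoder has to choose a union branch: it looks at the next JSON token, takes null as the "null" branch, expects a start-of-object token for a wrapper such as {"int": 7}, and rejects anything else. Below is a toy sketch of that check; UnionReadSketch, tokenOf and readUnionBranch are hypothetical names, the token names are borrowed from Jackson's JsonToken, and the token detection is deliberately crude (it only inspects the start of the value text).

```java
// Hypothetical toy sketch (NOT Avro's JsonDecoder): choose a union branch
// from the JSON text of one field value, per the Avro JSON union encoding.
final class UnionReadSketch {

    // Crude classification of the first JSON token, using Jackson-style names.
    static String tokenOf(String v) {
        char c = v.charAt(0);
        if (c == '{') return "START_OBJECT";
        if (c == '[') return "START_ARRAY";
        if (c == '"') return "VALUE_STRING";
        if (v.equals("null")) return "VALUE_NULL";
        if (v.equals("true")) return "VALUE_TRUE";
        if (v.equals("false")) return "VALUE_FALSE";
        boolean fractional = v.contains(".") || v.contains("e") || v.contains("E");
        return fractional ? "VALUE_NUMBER_FLOAT" : "VALUE_NUMBER_INT";
    }

    // Returns the selected branch name, or throws if the value is not a valid
    // union encoding (e.g. a bare 7 instead of {"int":7}).
    static String readUnionBranch(String jsonValue) {
        String v = jsonValue.trim();
        if (v.isEmpty()) {
            throw new IllegalArgumentException("Empty JSON value");
        }
        String token = tokenOf(v);
        if (token.equals("VALUE_NULL")) {
            return "null";
        }
        if (token.equals("START_OBJECT")) {
            // The wrapper's single key names the branch: {"int":7} -> "int"
            int start = v.indexOf('"') + 1;
            int end = v.indexOf('"', start);
            if (start == 0 || end < 0) {
                throw new IllegalArgumentException("Expected a branch name inside the union object");
            }
            return v.substring(start, end);
        }
        throw new IllegalArgumentException("Expected start-union. Got " + token);
    }
}
```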



Can anyone tell me why there is a difference between DataFileWriter and 
JsonEncoder?



Regards,

Steven


