Hi All,
Following the Avro documentation at https://avro.apache.org/docs/1.9.2/gettingstartedjava.html, I defined a schema as in the sample:

    {"namespace": "example.avro",
     "type": "record",
     "name": "User",
     "fields": [
         {"name": "name", "type": "string"},
         {"name": "favorite_number", "type": ["int", "null"]},
         {"name": "favorite_color", "type": ["string", "null"]}
     ]
    }

Then I created two User records as shown below and serialized them using DataFileWriter:

    Schema schema = new Schema.Parser().parse(new File("user.avsc"));

    GenericRecord user1 = new GenericData.Record(schema);
    user1.put("name", "Alyssa");
    user1.put("favorite_number", 256);
    // Leave favorite color null

    GenericRecord user2 = new GenericData.Record(schema);
    user2.put("name", "Ben");
    user2.put("favorite_number", 7);
    user2.put("favorite_color", "red");

    // Serialize user1 and user2 to disk
    File file = new File("users.avro");
    DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
    DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
    dataFileWriter.create(schema, file);
    dataFileWriter.append(user1);
    dataFileWriter.append(user2);
    dataFileWriter.close();

I noticed that the favorite_number and favorite_color fields are union types.
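To make concrete what I mean by a union value being "wrapped", here is a minimal sketch of the JSON encoding of a union as I read it in the Avro specification: the null branch is written as a plain null, and any other branch is written as a single-key object named after the branch type. This is plain Java and does not call Avro at all; the class name UnionJsonSketch and the helper encodeUnion are hypothetical names of my own, and the sketch only handles numbers and unescaped strings.

```java
// Illustration only: this is NOT Avro's API. It mimics the JSON encoding of
// a union value as I understand it from the Avro specification.
final class UnionJsonSketch {

    // Hypothetical helper: render the value of one union-typed field.
    // branchType is the Avro type name of the selected branch (e.g. "int"),
    // or null when the null branch is selected.
    static String encodeUnion(String branchType, Object value) {
        if (branchType == null || value == null) {
            // The null branch is not wrapped.
            return "null";
        }
        // Quote strings; numbers are printed as-is. No escaping is done,
        // which is enough for this illustration but not for real data.
        String rendered = (value instanceof CharSequence)
                ? "\"" + value + "\""
                : String.valueOf(value);
        // Any non-null branch becomes {"<type name>": <value>}.
        return "{\"" + branchType + "\":" + rendered + "}";
    }

    public static void main(String[] args) {
        System.out.println("\"favorite_number\":" + encodeUnion("int", 7));
        System.out.println("\"favorite_color\":" + encodeUnion("string", "red"));
        System.out.println("\"favorite_color\":" + encodeUnion(null, null));
    }
}
```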
So I expected the serialized data to look like

    "favorite_number" : { "int" : 7 }

and

    "favorite_color" : { "string" : "red" }

But when I deserialized the file, I got

    {"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
    {"name": "Ben", "favorite_number": 7, "favorite_color": "red"}

I also got the expected result when using JsonEncoder and JsonDecoder:

    // Encoder to serialize (record is one of the GenericRecords above)
    GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    Encoder e = EncoderFactory.get().jsonEncoder(schema, os);
    writer.write(record, e);
    e.flush();
    byte[] serializedPayload = os.toByteArray();

    // Decoder to deserialize (input is the byte[] holding the JSON payload)
    DatumReader<GenericData.Record> reader = new GenericDatumReader<GenericData.Record>(schema);
    Decoder decoder = DecoderFactory.get().jsonDecoder(schema, new ByteArrayInputStream(input));
    GenericData.Record deserializedRecord = reader.read(null, decoder);

If I use the payload below to produce a message to my topic, using the schema from Schema Registry,

    {"name": "Ben", "favorite_number": 7, "favorite_color": "red"}

I get the error

    Expected start-union. Got VALUE_NUMBER_INT.

I think this error is the correct behavior, because the payload cannot be validated against the given schema.

Can anyone tell me why there is a difference between DataFileWriter and JsonEncoder?

Regards,
Steven