That's great, Gary. Thanks for the follow-up.

Dave

From: [email protected] [mailto:[email protected]] On Behalf Of Gary Steelman
Sent: Tuesday, February 18, 2014 5:15 PM
To: Gary Steelman
Cc: [email protected]
Subject: Re: General-Purpose Serialization and Deserialization for Avro-Generated SpecificRecords

Hey all,

I've adapted Dave's solution to serialize to/from byte[] rather than JSON. Thanks a lot! The two methods are below:

    @SuppressWarnings("unchecked")
    public static <T> byte[] avroSerialize(Class<T> clazz, Object object) {
        byte[] ret = null;
        try {
            if (object == null || !(object instanceof SpecificRecord)) {
                return null;
            }

            T record = (T) object;
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Encoder e = EncoderFactory.get().directBinaryEncoder(out, null);
            SpecificDatumWriter<T> w = new SpecificDatumWriter<T>(clazz);
            w.write(record, e);
            e.flush();
            ret = out.toByteArray();
        } catch (IOException e) {
            LOG.debug(e);
        }

        return ret;
    }

    public static <T> T avroDeserialize(byte[] avroBytes, Class<T> clazz, Schema schema) {
        T ret = null;
        try {
            ByteArrayInputStream in = new ByteArrayInputStream(avroBytes);
            Decoder d = DecoderFactory.get().directBinaryDecoder(in, null);
            SpecificDatumReader<T> reader = new SpecificDatumReader<T>(clazz);
            ret = reader.read(null, d);
        } catch (IOException e) {
            LOG.debug(e);
        }

        return ret;
    }

And they're called like so:

    MyObject x = new MyObject();
    byte[] avroBytes = avroSerialize(x.getClass(), x);
    MyObject y = avroDeserialize(avroBytes, MyObject.class, MyObject.SCHEMA$);

Thanks,
Gary

On Tue, Feb 18, 2014 at 6:49 PM, Gary Steelman <[email protected]> wrote:

Thank you Dave, I appreciate it. I'll give those a shot and let you know how it goes.

-Gary

On Feb 18, 2014 6:45 PM, "Dave McAlpin" <[email protected]> wrote:

Here are some utility functions we've used for serialization to and from JSON. Something similar should work for binary.

    public <T> String avroEncodeAsJson(Class<T> clazz, Object object) {
        String avroEncodedJson = null;
        try {
            if (object == null || !(object instanceof SpecificRecord)) {
                return null;
            }

            T record = (T) object;
            Schema schema = ((SpecificRecord) record).getSchema();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Encoder e = EncoderFactory.get().jsonEncoder(schema, out);
            SpecificDatumWriter<T> w = new SpecificDatumWriter<T>(clazz);
            w.write(record, e);
            e.flush();
            avroEncodedJson = new String(out.toByteArray());
        } catch (IOException e) {
            e.printStackTrace();
        }

        return avroEncodedJson;
    }

    public <T> T jsonDecodeToAvro(String inputString, Class<T> className, Schema schema) {
        T returnObject = null;
        try {
            JsonDecoder jsonDecoder = DecoderFactory.get().jsonDecoder(schema, inputString);
            SpecificDatumReader<T> reader = new SpecificDatumReader<T>(className);
            returnObject = reader.read(null, jsonDecoder);
        } catch (IOException e) {
            e.printStackTrace();
        }

        return returnObject;
    }

Dave

From: [email protected] [mailto:[email protected]] On Behalf Of Gary Steelman
Sent: Tuesday, February 18, 2014 4:21 PM
To: [email protected]
Subject: General-Purpose Serialization and Deserialization for Avro-Generated SpecificRecords

Hi all,

Here's my use case: I've got a bunch of different Java objects generated from Avro schema files. So the class definition headers look something like this:

    public class MyObject extends org.apache.avro.specific.SpecificRecordBase
            implements org.apache.avro.specific.SpecificRecord

I've got many other types than MyObject too. I need to write a method which can serialize (from MyObject or another class to byte[]) and deserialize (from byte[] to MyObject or another class) in memory (not writing to disk).

I couldn't figure out how to write one method to handle it for SpecificRecord, so I tried serializing/deserializing these things as GenericRecord instead:

    public static byte[] serializeFromAvro(GenericRecord gr) {
        try {
            DatumWriter<GenericRecord> writer2 = new GenericDatumWriter<GenericRecord>(gr.getSchema());
            ByteArrayOutputStream bao2 = new ByteArrayOutputStream();
            BinaryEncoder encoder2 = EncoderFactory.get().directBinaryEncoder(bao2, null);
            writer2.write(gr, encoder2);
            byte[] avroBytes2 = bao2.toByteArray();
            return avroBytes2;
        } catch (IOException e) {
            LOG.debug(e);
            return null;
        }
    }

    // Here I use a DataType enum and the AvroSchemaFactory to quickly
    // retrieve a Schema object for a supported DataType.
    public static GenericRecord deserializeFromAvro(byte[] avroBytes, DataType dataType) {
        try {
            Schema schema = AvroSchemaFactory.getInstance().getSchema(dataType);
            DatumReader<GenericRecord> reader2 = new GenericDatumReader<GenericRecord>(schema);
            ByteArrayInputStream bai2 = new ByteArrayInputStream(avroBytes);
            BinaryDecoder decoder2 = DecoderFactory.get().directBinaryDecoder(bai2, null);
            GenericRecord gr2 = reader2.read(null, decoder2);
            return gr2;
        } catch (Exception e) {
            LOG.debug(e);
            return null;
        }
    }

And I use them like so:

    // Remember MyObject is the SpecificRecord implementing class.
    MyObject x = new MyObject();
    byte[] avroBytes = serializeFromAvro(x);
    MyObject x2 = (MyObject) deserializeFromAvro(avroBytes, DataType.MyObject);

Which results in this:

    java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to datatypes.generated.avro.MyObject

Is there an easier way to achieve my use case, or some way I can fix my methods to allow the sort of behavior I want?

Thanks,
Gary
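A note on the ClassCastException reported in the original question: a Java cast only succeeds when the object's runtime class is the target type or a subclass of it. The exception message shows that the generic reader returned a GenericData$Record, which is not a MyObject, so the cast fails no matter what schema was used. The sketch below reproduces that situation with the standard library only. The names Rec, GenericRec, MyRec, readGeneric and readSpecific are hypothetical stand-ins, not Avro types or Avro calls; the example only illustrates why passing a Class<T> token (as the replies in this thread do with SpecificDatumReader) lets one method return the concrete type.

```java
public class CastDemo {
    // Hypothetical stand-in for a common record interface (not an Avro type).
    public interface Rec {
        String get(String field);
    }

    // Plays the role of GenericData.Record: what a generic reader constructs.
    public static class GenericRec implements Rec {
        public String get(String field) { return "generic:" + field; }
    }

    // Plays the role of a generated class such as MyObject.
    public static class MyRec implements Rec {
        public String get(String field) { return "specific:" + field; }
    }

    // A "generic reader": always builds GenericRec, whatever the caller wants.
    public static Rec readGeneric() {
        return new GenericRec();
    }

    // A "specific reader": uses the class token to build the requested type,
    // so no downcast is needed at the call site.
    public static <T extends Rec> T readSpecific(Class<T> clazz) {
        try {
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + clazz, e);
        }
    }

    public static void main(String[] args) {
        try {
            // Compiles, but fails at runtime: a GenericRec is not a MyRec.
            MyRec bad = (MyRec) readGeneric();
            System.out.println("unexpected: " + bad);
        } catch (ClassCastException e) {
            System.out.println("cast failed");
        }

        // The class token tells the reader which concrete type to build.
        MyRec ok = readSpecific(MyRec.class);
        System.out.println(ok.get("id"));
    }
}
```

In the stand-in code, the first call fails for the same reason the GenericRecord round trip does, while the second returns the requested type directly. Real Avro readers populate fields from the bytes and schema rather than calling a no-argument constructor alone, so treat this purely as an illustration of the type relationship.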
