Hello everyone,

We have a process that reads data from a local file share, serializes it, and
writes it to HDFS in Avro format. I am wondering whether I am building the
Avro objects correctly. For every record read from the binary file we create
an equivalent Avro object as follows:

Parent p = new Parent();
LOGHDR hdr = LOGHDR.newBuilder().build();
MSGHDR msg = MSGHDR.newBuilder().build();
p.setHdr(hdr);
p.setMsg(msg);
// ... set the remaining fields ...
datumFileWriter.write(p);
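For context, here is a minimal sketch of how the writer around that snippet might be set up. Parent is assumed to be the SpecificRecord class generated from the schema, and the output path is a placeholder; this is a sketch, not our exact code:

```java
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.specific.SpecificDatumWriter;
import java.io.File;
import java.io.IOException;

public class AvroWriteSketch {
    public static void main(String[] args) throws IOException {
        // Parent is assumed to be the SpecificRecord class generated
        // from the large schema described in this thread.
        SpecificDatumWriter<Parent> datumWriter =
                new SpecificDatumWriter<>(Parent.class);
        try (DataFileWriter<Parent> datumFileWriter =
                new DataFileWriter<>(datumWriter)) {
            // getClassSchema() is the static accessor Avro generates
            // on specific record classes.
            datumFileWriter.create(Parent.getClassSchema(),
                    new File("/tmp/records.avro")); // placeholder path
            Parent p = new Parent();
            // ... populate fields as in the snippet above ...
            datumFileWriter.write(p);
        }
    }
}
```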

This Avro schema has around 1,800 fields, including 26 nested types. I did
some load testing and found that serializing the same object to disk
repeatedly is about 6x faster than constructing a new object each time
(p.build()). When a new Avro object is constructed every time with
RecordBuilder.build(), much of the time is spent in GenericData.deepCopy().
Has anyone run into a similar problem? We are using Avro 1.8.2.
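One workaround we are considering, sketched below under the assumption that the generated classes permit it: RecordBuilder.build() deep-copies field and default values, so skipping the builders and mutating directly-constructed records avoids GenericData.deepCopy() entirely. The trade-off is that the builder's default-value population and validation are lost. BinaryRecord, records, and the setter names are hypothetical placeholders:

```java
// Builder-free pattern: construct the generated records once with
// "new" rather than newBuilder().build(), then mutate and write.
Parent p = new Parent();
LOGHDR hdr = new LOGHDR();
MSGHDR msg = new MSGHDR();
p.setHdr(hdr);
p.setMsg(msg);

for (BinaryRecord rec : records) {        // hypothetical input iteration
    hdr.setTimestamp(rec.getTimestamp()); // hypothetical field mapping
    // ... overwrite the remaining fields for this record ...
    datumFileWriter.write(p); // DataFileWriter serializes the record's
                              // bytes on write, so reusing the same
                              // instance across iterations is safe
}
```

This removes the per-record allocation and copy cost, but every field must be explicitly overwritten on each iteration, since stale values from the previous record would otherwise be serialized.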

Thanks,
Nishanth
