Hello everyone, We have a process that reads data from a local file share, serializes it, and writes it to HDFS in Avro format. I am wondering whether I am building the Avro objects correctly. For every record read from the binary file, we create an equivalent Avro object in the format below.
Parent p = new Parent();
LOGHDR hdr = LOGHDR.newBuilder().build();
MSGHDR msg = MSGHDR.newBuilder().build();
p.setHdr(hdr);
p.setMsg(msg);
// ... remaining setters ...
datumFileWriter.write(p);

This Avro schema has around 1,800 fields, including 26 nested types. I did some load testing and found that serializing the same object to disk repeatedly is about 6x faster than constructing a new object each time (p.build()). When a new Avro object is constructed each time using RecordBuilder.build(), much of the time is spent in GenericData.deepCopy(). Has anyone run into a similar problem? We are using Avro 1.8.2.

Thanks,
Nishanth
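For reference, the two construction patterns can be sketched with Avro's generic API. This is a minimal sketch with a toy two-field schema and made-up field names (`id`, `payload`), standing in for the real 1800-field schema; the point is only that `GenericRecordBuilder.build()` routes values through `GenericData.deepCopy()`, while mutating one preallocated record does not:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;

public class BuilderVsReuse {
    // Toy two-field schema standing in for the real, much wider one.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Parent\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"long\"},"
      + "{\"name\":\"payload\",\"type\":\"string\"}]}");

    // Pattern 1: a fresh builder per input record. build() deep-copies
    // field values via GenericData.deepCopy(), which is the cost that
    // dominates with wide, deeply nested schemas.
    static GenericRecord viaBuilder(long id, String payload) {
        return new GenericRecordBuilder(SCHEMA)
            .set("id", id)
            .set("payload", payload)
            .build();
    }

    // Pattern 2: allocate one record up front and overwrite its fields
    // for each input row; no deep copy is performed.
    static GenericRecord reuse(GenericRecord scratch, long id, String payload) {
        scratch.put("id", id);
        scratch.put("payload", payload);
        return scratch;
    }

    public static void main(String[] args) {
        GenericRecord scratch = new GenericData.Record(SCHEMA);
        GenericRecord a = viaBuilder(1L, "hello");
        GenericRecord b = reuse(scratch, 1L, "hello");
        System.out.println(a.get("id").equals(b.get("id")));
    }
}
```

The same idea should apply to the generated SpecificRecord classes: calling `new Parent()` plus setters directly, instead of `Parent.newBuilder()...build()`, skips the builder's deep-copy step, and reusing one record across writes is safe because the file writer serializes the record at append time. This is an assumption about the poster's setup, not a tested fix for their workload.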
