We modified our code to create only one reader instance per schema, and that does seem to solve the problem. However, a follow-up concern is thread safety: is GenericDatumReader thread-safe? What about the writer? We also create a new writer for each serialization and haven't seen a performance issue yet. If we change the behavior for the reader, would you suggest we make the same change for the writer?
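For reference, here is a minimal sketch of the reuse pattern described above. It keeps one reader per schema per thread, so it does not depend on the answer to the thread-safety question. The class name PerSchemaCache and its methods are hypothetical and not part of Avro; the sketch uses only the JDK, and the Avro wiring shown in the comment is an assumption about how it would be plugged in. Keys are assumed to implement equals and hashCode.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper (not an Avro class): caches one value per key per thread.
// Assumed Avro wiring, not verified against our code:
//   new PerSchemaCache<Schema, GenericDatumReader<GenericRecord>>(
//       schema -> new GenericDatumReader<GenericRecord>(schema));
final class PerSchemaCache<S, R> {
    private final Function<S, R> factory;

    // Each thread gets its own map, so a cached value is never shared across threads.
    private final ThreadLocal<Map<S, R>> perThread = ThreadLocal.withInitial(HashMap::new);

    PerSchemaCache(Function<S, R> factory) {
        this.factory = factory;
    }

    // Returns the calling thread's cached value for this key, creating it on first use.
    R get(S schema) {
        return perThread.get().computeIfAbsent(schema, factory);
    }

    public static void main(String[] args) {
        // Strings stand in for schemas and plain Objects stand in for readers.
        PerSchemaCache<String, Object> cache = new PerSchemaCache<>(s -> new Object());
        Object first = cache.get("schemaA");
        Object second = cache.get("schemaA");
        System.out.println("same instance reused: " + (first == second));
    }
}
```

The same wrapper could hold writers if we decide to reuse those as well. If the reader turns out to be safe to share, the ThreadLocal could be replaced with a single ConcurrentHashMap.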
Thanks.
Dan

-----Original Message-----
From: Doug Cutting [mailto:[email protected]]
Sent: Friday, June 13, 2014 1:59 PM
To: [email protected]
Subject: Re: 1.7.6 Slow Deserialization

On Wed, Jun 11, 2014 at 1:05 PM, Han, Xiaodan <[email protected]> wrote:
> org.apache.avro.specific.SpecificDatumReader.findStringClass(SpecificDatumReader.java:80)
> org.apache.avro.generic.GenericDatumReader.getStringClass(GenericDatumReader.java:394)

The results of findStringClass are cached by getStringClass, so there should only be one call per schema used with a GenericDatumReader instance. So, with 22k calls to this method in the samples, perhaps you're either creating a new schema or a new GenericDatumReader per instance read. Could that be possible?

Doug
