We modified our code to create only one reader instance per schema, and that 
does seem to solve the problem. A follow-up concern, however, is thread safety. 
Is GenericDatumReader thread-safe? What about the writer? We also create a new 
writer for each serialization and haven't seen a performance issue yet. If we 
change how we create readers, would you suggest we make the same change for 
the writer?
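
A minimal sketch of the one-reader-per-schema pattern described above. It is 
illustrative only: StubReader and ReaderCache are hypothetical stand-ins, not 
Avro classes, and the schema is represented by a plain String key so that the 
example runs without the Avro library.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a reader that is costly to set up per schema
// (for example a GenericDatumReader). It is NOT an Avro class.
final class StubReader {
    // Counts constructions so the effect of caching can be observed.
    static final AtomicInteger CONSTRUCTED = new AtomicInteger();
    final String schemaKey;

    StubReader(String schemaKey) {
        this.schemaKey = schemaKey;
        CONSTRUCTED.incrementAndGet();
    }
}

// Hypothetical cache that hands out one reader instance per schema key.
final class ReaderCache {
    private final ConcurrentMap<String, StubReader> readers =
            new ConcurrentHashMap<>();

    StubReader readerFor(String schemaKey) {
        // computeIfAbsent builds the reader at most once per key,
        // even when several threads ask for the same key concurrently.
        return readers.computeIfAbsent(schemaKey, StubReader::new);
    }
}

final class ReaderCacheDemo {
    public static void main(String[] args) {
        ReaderCache cache = new ReaderCache();
        for (int i = 0; i < 1000; i++) {
            cache.readerFor("schemaA"); // reused after the first call
        }
        cache.readerFor("schemaB");     // a second schema gets its own reader
        System.out.println("readers constructed: " + StubReader.CONSTRUCTED.get());
    }
}
```

The map only guarantees that looking up or creating the cached instance is 
safe across threads. Whether the cached reader itself can be shared between 
threads is a separate matter, which is the question asked above.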

Thanks.
Dan 


-----Original Message-----
From: Doug Cutting [mailto:[email protected]] 
Sent: Friday, June 13, 2014 1:59 PM
To: [email protected]
Subject: Re: 1.7.6 Slow Deserialization

On Wed, Jun 11, 2014 at 1:05 PM, Han, Xiaodan <[email protected]> wrote:
> org.apache.avro.specific.SpecificDatumReader.findStringClass(SpecificDatumReader.java:80)
> org.apache.avro.generic.GenericDatumReader.getStringClass(GenericDatumReader.java:394)

The results of findStringClass are cached by getStringClass, so there should 
be only one call per schema used with a GenericDatumReader instance.  With 22k 
calls to this method in the samples, perhaps you're creating either a new 
schema or a new GenericDatumReader for each instance read.  Could that be 
possible?

Doug

