[jira] [Resolved] (AVRO-2274) Improve resolving performance when schemas don't change

Thiruvalluvan M. G. (JIRA) Tue, 27 Nov 2018 19:43:51 -0800


     [ 
https://issues.apache.org/jira/browse/AVRO-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Thiruvalluvan M. G. resolved AVRO-2274.
---------------------------------------
    Resolution: Fixed

Merged the PR. Thank you [~raymie].

> Improve resolving performance when schemas don't change
> -------------------------------------------------------
>
>                 Key: AVRO-2274
>                 URL: https://issues.apache.org/jira/browse/AVRO-2274
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Raymie Stata
>            Assignee: Raymie Stata
>            Priority: Major
>
> Decoding optimizations based on the observation that schemas don't change 
> very much.  We add special-case paths to optimize the case where a 
> _sub_schema of the reader and the writer are the same.  The specific cases 
> are:
> * In the case of an enumeration, if the reader and writer are the same, then 
> we can simply return the tag written by the writer rather than "adjust" it as 
> if it might have been re-ordered.  In fact, we can do this (directly return 
> the tag written by the writer) as long as the reader-schema is an "extension" 
> of the writer's in that it may have added new symbols but hasn't renumbered 
> any of the writer's symbols.  Enumerations that either don't change at all or 
> are "extended" as defined here are the common ways to extend enumerations.  
> (Our tests show this optimization improves performance by about 3%.)
> * When the reader and writer subschemas are both unions, resolution is 
> expensive: we have an outer union preceded by a "writer-union action", but 
> each branch of this outer union consist of union-adjust actions, which are 
> heavy weight.  We optimize this case when the reader and writer unions are 
> the same: we fall back on the standard grammar used for a union, avoiding all 
> these adjustments.  Since unions are commonly used to encode "nullable" 
> fields in Avro, and nullability rarely changes as a schema evolves, this 
> optimization should help many users.  (Our tests show this optimization 
> improves performance by 25-30%, a significant win.)
> * The "custom code" generated for reading records has to read fields in a 
> loop that uses a switch statement to deal with writers that may have 
> re-ordered fields.  In most cases, however, fields have not been reordered 
> (esp. in more complex records with many record sub-schemas).  So we've added 
> a new method to ResolvingDecoder called readFieldOrderIfDiff, which is a 
> variant of the existing readFieldOrder.  If the field order has indeed 
> changed, then readFieldOrderIfDiff returns the new field order, just like 
> readFieldOrder does.  However, if the field-order hasn't changed, then 
> readFieldOrderIfDiff returns null.  We then modified the generation of 
> custom-decoders for records to add a special-case path that simply reads the 
> record's fields in order, without incurring the overhead of the loop or the 
> switch statement.  (Our tests show this optimization improves performance by 
> 8-9%, on top of the 35-40% produced by the original custom-coder 
> optimization.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Resolved] (AVRO-2274) Improve resolving performance when schemas don't change

Reply via email to