[ https://issues.apache.org/jira/browse/AVRO-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thiruvalluvan M. G. resolved AVRO-2274.
---------------------------------------
    Resolution: Fixed

Merged the PR. Thank you [~raymie].

> Improve resolving performance when schemas don't change
> -------------------------------------------------------
>
>                 Key: AVRO-2274
>                 URL: https://issues.apache.org/jira/browse/AVRO-2274
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Raymie Stata
>            Assignee: Raymie Stata
>            Priority: Major
>
> Decoding optimizations based on the observation that schemas don't change
> very much. We add special-case paths to optimize the case where a
> _sub_schema of the reader and the writer are the same. The specific cases
> are:
> * In the case of an enumeration, if the reader and writer are the same, then
> we can simply return the tag written by the writer rather than "adjust" it as
> if it might have been re-ordered. In fact, we can do this (directly return
> the tag written by the writer) as long as the reader-schema is an "extension"
> of the writer's in that it may have added new symbols but hasn't renumbered
> any of the writer's symbols. Leaving an enumeration unchanged or "extending"
> it as defined here are the common ways enumerations evolve.
> (Our tests show this optimization improves performance by about 3%.)
> * When the reader and writer subschemas are both unions, resolution is
> expensive: we have an outer union preceded by a "writer-union action", but
> each branch of this outer union consists of union-adjust actions, which are
> heavyweight. We optimize this case when the reader and writer unions are
> the same: we fall back on the standard grammar used for a union, avoiding all
> these adjustments. Since unions are commonly used to encode "nullable"
> fields in Avro, and nullability rarely changes as a schema evolves, this
> optimization should help many users. (Our tests show this optimization
> improves performance by 25-30%, a significant win.)
> * The "custom code" generated for reading records has to read fields in a
> loop that uses a switch statement to deal with writers that may have
> re-ordered fields. In most cases, however, fields have not been reordered
> (esp. in more complex records with many record sub-schemas). So we've added
> a new method to ResolvingDecoder called readFieldOrderIfDiff, which is a
> variant of the existing readFieldOrder. If the field order has indeed
> changed, then readFieldOrderIfDiff returns the new field order, just like
> readFieldOrder does. However, if the field order hasn't changed, then
> readFieldOrderIfDiff returns null. We then modified the generation of
> custom-decoders for records to add a special-case path that simply reads the
> record's fields in order, without incurring the overhead of the loop or the
> switch statement. (Our tests show this optimization improves performance by
> 8-9%, on top of the 35-40% produced by the original custom-coder
> optimization.)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
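
For readers who want a concrete picture of the enumeration and field-order cases described in the issue, here is a minimal, self-contained Java sketch. It is a toy model and not Avro's implementation: the class `ResolvingFastPathSketch` and its methods `isEnumExtension`, `fieldOrderIfDiff`, and `readRecord` are hypothetical names invented for illustration, schemas are reduced to plain lists of names, and the field-order helper assumes the reader and writer have the same set of fields. Only the "return null when the order is unchanged" contract is taken from the description of readFieldOrderIfDiff.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of two fast paths from the issue description; not Avro's actual code. */
final class ResolvingFastPathSketch {

    /**
     * True when every writer symbol keeps its index in the reader's symbol list,
     * i.e. the writer's list is a prefix of the reader's. In that situation the
     * tag written by the writer could be returned without adjustment.
     */
    static boolean isEnumExtension(List<String> writerSymbols, List<String> readerSymbols) {
        if (readerSymbols.size() < writerSymbols.size()) {
            return false;
        }
        return readerSymbols.subList(0, writerSymbols.size()).equals(writerSymbols);
    }

    /**
     * Hypothetical analogue of the readFieldOrderIfDiff contract: returns null
     * when the writer's field order equals the reader's; otherwise returns, for
     * each field in writer order, its position in the reader's record.
     * Assumption: both lists contain the same field names.
     */
    static int[] fieldOrderIfDiff(List<String> writerFields, List<String> readerFields) {
        if (writerFields.equals(readerFields)) {
            return null;
        }
        int[] order = new int[writerFields.size()];
        for (int i = 0; i < order.length; i++) {
            order[i] = readerFields.indexOf(writerFields.get(i));
        }
        return order;
    }

    /** Places values that arrive in writer order into reader order. */
    static Object[] readRecord(Object[] valuesInWriterOrder, int[] orderOrNull) {
        if (orderOrNull == null) {
            // Fast path: the order is unchanged, so fields are taken as they come.
            return valuesInWriterOrder.clone();
        }
        // Slow path: each value is routed to its reader position.
        Object[] record = new Object[valuesInWriterOrder.length];
        for (int i = 0; i < orderOrNull.length; i++) {
            record[orderOrNull[i]] = valuesInWriterOrder[i];
        }
        return record;
    }

    public static void main(String[] args) {
        List<String> writerEnum = Arrays.asList("RED", "GREEN");
        System.out.println(isEnumExtension(writerEnum, Arrays.asList("RED", "GREEN", "BLUE")));
        System.out.println(isEnumExtension(writerEnum, Arrays.asList("GREEN", "RED", "BLUE")));

        List<String> readerFields = Arrays.asList("id", "name");
        int[] order = fieldOrderIfDiff(Arrays.asList("name", "id"), readerFields);
        System.out.println(Arrays.toString(readRecord(new Object[] {"x", 7}, order)));
    }
}
```

The null return lets the caller choose the straight-line path with a single check, which is the shape of the special case the description attributes to the generated record decoders.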