[jira] [Created] (AVRO-2275) Refactor schema-resolution code from grammar-generation
Raymie Stata created AVRO-2275: -- Summary: Refactor schema-resolution code from grammar-generation Key: AVRO-2275 URL: https://issues.apache.org/jira/browse/AVRO-2275 Project: Apache Avro Issue Type: Improvement Components: java Reporter: Raymie Stata Assignee: Raymie Stata In my own work to extend AVRO-2090, and also in AVRO-2247 (an alternative approach to optimizing decoders), we were forced to re-implement schema-resolution logic because it is currently embedded deep inside ResolvingGrammarGenerator. However, the Avro community has found in the past that multiple implementations of the schema-resolution code are hard to maintain, because that code is tedious and error-prone. In this JIRA we've refactored the resolution code into a new class called Resolver, and have rewritten ResolvingGrammarGenerator to be a client of this class. This rewrite passes the full regression suite, including bug-for-bug compatibility with a few questionable resolution rules, such as the "soft matching" rule for records in unions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
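To make the refactoring pattern concrete, here is a minimal Java sketch: the resolution rules live in one place and produce a small "action" object, and any number of clients consume that object instead of re-implementing the rules. This is not Avro's actual `Resolver` API. The class `ResolverSketch`, its `Action` subclasses and the `describe` client are hypothetical, and only the primitive promotions from the Avro specification are modelled (no records, enums, or unions).

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch (not Avro's API): schema resolution computed once, in one place. */
final class ResolverSketch {
  private ResolverSketch() {}

  /** Primitive types only; records, enums and unions are left out of this sketch. */
  enum Type { INT, LONG, FLOAT, DOUBLE, STRING, BYTES }

  /** Outcome of resolving one writer type against one reader type. */
  abstract static class Action {
    final Type writer;
    final Type reader;

    Action(Type writer, Type reader) {
      this.writer = writer;
      this.reader = reader;
    }
  }

  /** Same type on both sides: read the value exactly as written. */
  static final class DoNothing extends Action {
    DoNothing(Type t) { super(t, t); }
  }

  /** Writer type is promotable to the reader type, e.g. int to long. */
  static final class Promote extends Action {
    Promote(Type writer, Type reader) { super(writer, reader); }
  }

  /** Types cannot be resolved; each client decides how and when to report it. */
  static final class ErrorAction extends Action {
    ErrorAction(Type writer, Type reader) { super(writer, reader); }
  }

  // Primitive promotions permitted by the Avro specification.
  private static final Map<Type, Set<Type>> PROMOTIONS = new EnumMap<>(Type.class);

  static {
    PROMOTIONS.put(Type.INT, EnumSet.of(Type.LONG, Type.FLOAT, Type.DOUBLE));
    PROMOTIONS.put(Type.LONG, EnumSet.of(Type.FLOAT, Type.DOUBLE));
    PROMOTIONS.put(Type.FLOAT, EnumSet.of(Type.DOUBLE));
    PROMOTIONS.put(Type.STRING, EnumSet.of(Type.BYTES));
    PROMOTIONS.put(Type.BYTES, EnumSet.of(Type.STRING));
  }

  /** The single home of the resolution rules. */
  static Action resolve(Type writer, Type reader) {
    if (writer == reader) {
      return new DoNothing(writer);
    }
    Set<Type> targets = PROMOTIONS.get(writer);
    if (targets != null && targets.contains(reader)) {
      return new Promote(writer, reader);
    }
    return new ErrorAction(writer, reader);
  }

  /** One example client; a grammar generator or a custom decoder would be others. */
  static String describe(Action a) {
    if (a instanceof DoNothing) {
      return "read " + a.writer;
    }
    if (a instanceof Promote) {
      return "read " + a.writer + ", convert to " + a.reader;
    }
    return "error: cannot resolve " + a.writer + " to " + a.reader;
  }
}
```

Because the rules sit behind `resolve`, a second client (such as a specialised decoder) can reuse them rather than duplicating them, which is the maintenance concern raised in the issue.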
[jira] [Created] (AVRO-2274) Improve resolving performance when schemas don't change
Raymie Stata created AVRO-2274: -- Summary: Improve resolving performance when schemas don't change Key: AVRO-2274 URL: https://issues.apache.org/jira/browse/AVRO-2274 Project: Apache Avro Issue Type: Improvement Components: java Reporter: Raymie Stata Assignee: Raymie Stata These are decoding optimizations based on the observation that schemas usually don't change very much. We add special-case paths to optimize the case where a _sub_schema of the reader and the writer are the same. The specific cases are:
* In the case of an enumeration, if the reader and writer are the same, then we can simply return the tag written by the writer rather than "adjust" it as if it might have been re-ordered. In fact, we can do this (directly return the tag written by the writer) as long as the reader's schema is an "extension" of the writer's, meaning that it may have added new symbols but hasn't renumbered any of the writer's symbols. Leaving an enumeration unchanged, or "extending" it as defined here, are the common ways enumerations evolve. (Our tests show this optimization improves performance by about 3%.)
* When the reader and writer subschemas are both unions, resolution is expensive: we have an outer union preceded by a "writer-union action", and each branch of this outer union consists of union-adjust actions, which are heavyweight. We optimize the case where the reader and writer unions are the same by falling back on the standard grammar used for a union, avoiding all these adjustments. Since unions are commonly used to encode "nullable" fields in Avro, and nullability rarely changes as a schema evolves, this optimization should help many users. (Our tests show this optimization improves performance by 25-30%, a significant win.)
* The "custom code" generated for reading records has to read fields in a loop that uses a switch statement to deal with writers that may have re-ordered fields. In most cases, however, fields have not been reordered (esp. in more complex records with many record sub-schemas). So we've added a new method to ResolvingDecoder called readFieldOrderIfDiff, a variant of the existing readFieldOrder. If the field order has changed, readFieldOrderIfDiff returns the new field order, just as readFieldOrder does; if the field order hasn't changed, readFieldOrderIfDiff returns null. We then modified the generation of custom decoders for records to add a special-case path that simply reads the record's fields in order, without incurring the overhead of the loop or the switch statement. (Our tests show this optimization improves performance by 8-9%, on top of the 35-40% produced by the original custom-coder optimization.)
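The three fast paths can be modelled in a few lines of Java. This is an illustrative sketch, not the actual patch: schemas are reduced to lists of symbol, branch, or field names, and `FastPaths` and its methods are hypothetical. `fieldOrderIfDiff` only mimics the contract described for `readFieldOrderIfDiff` (null when nothing changed), and it assumes the writer and reader records have the same set of fields, differing at most in order.

```java
import java.util.List;

/** Hypothetical model of the AVRO-2274 fast paths; schemas are reduced to name lists. */
final class FastPaths {
  private FastPaths() {}

  /**
   * Enum: the reader "extends" the writer if every writer symbol keeps its position,
   * i.e. the reader may only append symbols. Then a written tag needs no adjustment.
   */
  static boolean isEnumExtension(List<String> writerSymbols, List<String> readerSymbols) {
    return readerSymbols.size() >= writerSymbols.size()
        && readerSymbols.subList(0, writerSymbols.size()).equals(writerSymbols);
  }

  /** Enum slow path: writer tag -> reader tag (-1 if the reader lacks the symbol). */
  static int[] enumAdjustments(List<String> writerSymbols, List<String> readerSymbols) {
    int[] adjust = new int[writerSymbols.size()];
    for (int i = 0; i < adjust.length; i++) {
      adjust[i] = readerSymbols.indexOf(writerSymbols.get(i));
    }
    return adjust;
  }

  /** Returns the reader's tag for a written tag, skipping the lookup when possible. */
  static int resolveEnumTag(int writtenTag, boolean isExtension, int[] adjust) {
    return isExtension ? writtenTag : adjust[writtenTag];
  }

  /**
   * Union: identical branch lists mean the written branch index can be used as-is,
   * so no per-branch union-adjust actions are needed.
   */
  static boolean sameUnion(List<String> writerBranches, List<String> readerBranches) {
    return writerBranches.equals(readerBranches);
  }

  /**
   * Record: null when the field order is unchanged (read fields straight through),
   * otherwise the reader position of each field in the order the writer wrote it.
   * Assumes both sides have the same set of fields.
   */
  static int[] fieldOrderIfDiff(List<String> writerFields, List<String> readerFields) {
    if (writerFields.equals(readerFields)) {
      return null;
    }
    int[] order = new int[writerFields.size()];
    for (int i = 0; i < order.length; i++) {
      order[i] = readerFields.indexOf(writerFields.get(i));
    }
    return order;
  }
}
```

In this model, a generated record decoder would call something like `fieldOrderIfDiff` once per record and, on `null`, read the fields in declaration order with no loop-and-switch dispatch.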
[jira] [Commented] (AVRO-2238) Update Docker image from java to openjdk
[ https://issues.apache.org/jira/browse/AVRO-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698297#comment-16698297 ] ASF GitHub Bot commented on AVRO-2238: -- Fokko commented on issue #378: [WIP] AVRO-2238 Update Dockerfile base image from java to openjdk URL: https://github.com/apache/avro/pull/378#issuecomment-441473064 Closing in favor of https://github.com/apache/avro/pull/390 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Update Docker image from java to openjdk > > > Key: AVRO-2238 > URL: https://issues.apache.org/jira/browse/AVRO-2238 > Project: Apache Avro > Issue Type: Improvement > Components: docker >Reporter: Fokko Driesprong >Priority: Major > > Currently the docker image to run the tests is still using java which is > deprecated: https://hub.docker.com/_/java/ > Therefore we should move to openjdk (https://hub.docker.com/_/openjdk/). > Starting with version 8, and also adding 10 and 11 to it to make sure that > Avro is compatible with future versions of Java.
[jira] [Commented] (AVRO-2238) Update Docker image from java to openjdk
[ https://issues.apache.org/jira/browse/AVRO-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698298#comment-16698298 ] ASF GitHub Bot commented on AVRO-2238: -- Fokko closed pull request #378: [WIP] AVRO-2238 Update Dockerfile base image from java to openjdk URL: https://github.com/apache/avro/pull/378 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):
diff --git a/share/docker/Dockerfile b/share/docker/Dockerfile
index ec8ac34f4..49f4f9080 100644
--- a/share/docker/Dockerfile
+++ b/share/docker/Dockerfile
@@ -17,7 +17,7 @@
 # Dockerfile for installing the necessary dependencies for building Avro.
 # See BUILD.txt.
-FROM java:8-jdk
+FROM openjdk:8
 WORKDIR /root
@@ -28,6 +28,10 @@ RUN curl -sL https://deb.nodesource.com/setup_4.x | bash -
 # Install dependencies from packages
 RUN apt-get -qq update && \
+ apt-get -qq install software-properties-common apt-transport-https -y && \
+ curl https://packages.sury.org/php/apt.gpg | apt-key add - && \
+ echo "deb https://packages.sury.org/php/ $(lsb_release -sc) main" > /etc/apt/sources.list.d/php.list && \
+ apt-get -qq update && \
 apt-get -qq install --no-install-recommends -y \
 ant \
 asciidoc \
@@ -46,15 +50,14 @@ RUN apt-get -qq update && \
 libglib2.0-dev \
 libjansson-dev \
 libsnappy-dev \
-libsnappy1 \
 make \
 maven \
 mono-devel \
 nodejs \
 nunit \
 perl \
-php5 \
-php5-gmp \
+php5.6 \
+php5.6-gmp \
 phpunit \
 python \
 python-setuptools \
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
[jira] [Commented] (AVRO-2238) Update Docker image from java to openjdk
[ https://issues.apache.org/jira/browse/AVRO-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698296#comment-16698296 ] ASF GitHub Bot commented on AVRO-2238: -- Fokko commented on issue #390: AVRO-2238 Move to up to OpenJDK image URL: https://github.com/apache/avro/pull/390#issuecomment-441473031 @tjwp I think we've updated the base image, which moved it from Debian 8 (jessie), the obsolete stable release, to Debian 9 (stretch), the current stable release. This also changed the version of Ruby. Do you see any chance to fix the Ruby errors below? My Ruby is a bit rusty at the moment :-)
```
Bundle complete! 5 Gemfile dependencies, 11 gems now installed.
Use `bundle info [gemname]` to see where a bundled gem is installed.
/usr/bin/ruby2.3 -w -I"lib:ext:bin:test" -I"/testptch/unknown/lang/ruby/.gem/gems/rake-12.3.1/lib" "/testptch/unknown/lang/ruby/.gem/gems/rake-12.3.1/lib/rake/rake_test_loader.rb" "test/test_logical_types.rb" "test/test_help.rb" "test/test_protocol.rb" "test/test_datafile.rb" "test/test_io.rb" "test/test_socket_transport.rb" "test/test_schema_normalization.rb" "test/test_schema_compatibility.rb" "test/test_schema_validator.rb" "test/test_fingerprints.rb" "test/test_schema.rb"
/testptch/unknown/lang/ruby/test/test_logical_types.rb:37: warning: ambiguous first argument; put parentheses or a space even after `-' operator
/testptch/unknown/lang/ruby/lib/avro/io.rb:178: warning: assigned but unused variable - foo
/testptch/unknown/lang/ruby/lib/avro/io.rb:302: warning: assigned but unused variable - block_size
/testptch/unknown/lang/ruby/lib/avro/io.rb:321: warning: assigned but unused variable - block_size
/testptch/unknown/lang/ruby/lib/avro/io.rb:486: warning: `&' interpreted as argument prefix
/testptch/unknown/lang/ruby/lib/avro/protocol.rb:72: warning: assigned but unused variable - type_objects
/testptch/unknown/lang/ruby/lib/avro/ipc.rb:198: warning: assigned but unused variable - response_metadata
/testptch/unknown/lang/ruby/lib/avro/ipc.rb:78: warning: possibly useless use of a literal in void context
/testptch/unknown/lang/ruby/lib/avro/ipc.rb:260: warning: assigned but unused variable - request_metadata
/testptch/unknown/lang/ruby/lib/avro/ipc.rb:90: warning: method redefined; discarding old remote_protocol=
/testptch/unknown/lang/ruby/lib/avro/ipc.rb:95: warning: method redefined; discarding old remote_hash=
/testptch/unknown/lang/ruby/test/test_io.rb:185: warning: assigned but unused variable - schema
/testptch/unknown/lang/ruby/test/test_io.rb:292: warning: assigned but unused variable - hex_encoding
/testptch/unknown/lang/ruby/test/test_io.rb:317: warning: assigned but unused variable - hex_encoding
/testptch/unknown/lang/ruby/test/test_io.rb:367: warning: assigned but unused variable - enc
/testptch/unknown/lang/ruby/test/test_io.rb:367: warning: assigned but unused variable - dw
/testptch/unknown/lang/ruby/test/test_schema.rb:280: warning: mismatched indentations at 'end' with 'def' at 277
Loaded suite /testptch/unknown/lang/ruby/.gem/gems/rake-12.3.1/lib/rake/rake_test_loader
Started
/testptch/unknown/lang/ruby/lib/avro/schema.rb:318: warning: too many arguments for format string
.../testptch/unknown/lang/ruby/lib/avro/schema.rb:189: warning: instance variable @name not initialized
/testptch/unknown/lang/ruby/lib/avro/schema.rb:193: warning: instance variable @name not initialized
/testptch/unknown/lang/ruby/lib/avro/schema.rb:195: warning: instance variable @doc not initialized
/testptch/unknown/lang/ruby/lib/avro/schema.rb:189: warning: instance variable @name not initialized
/testptch/unknown/lang/ruby/lib/avro/schema.rb:193: warning: instance variable @name not initialized
/testptch/unknown/lang/ruby/lib/avro/schema.rb:195: warning: instance variable @doc not initialized
/testptch/unknown/lang/ruby/lib/avro/schema.rb:189: warning: instance variable @name not initialized
/testptch/unknown/lang/ruby/lib/avro/schema.rb:193: warning: instance variable @name not initialized
/testptch/unknown/lang/ruby/lib/avro/schema.rb:195: warning: instance variable @doc not initialized
/testptch/unknown/lang/ruby/lib/avro/schema.rb:189: warning: instance variable @name not initialized
/testptch/unknown/lang/ruby/lib/avro/schema.rb:193: warning: instance variable @name not initialized
/testptch/unknown/lang/ruby/lib/avro/schema.rb:195: warning: instance variable @doc not initialized
/testptch/unknown/lang/ruby/lib/avro/schema.rb:189: warning: instance variable @name not initialized
/testptch/unknown/lang/ruby/lib/avro/schema.rb:193: warning: instance variable @name not initialized
/testptch/unknown/lang/ruby/lib/avro/schema.rb:195: warning: instance variable @doc not initialized
```
[jira] [Commented] (AVRO-2238) Update Docker image from java to openjdk
[ https://issues.apache.org/jira/browse/AVRO-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698291#comment-16698291 ] ASF GitHub Bot commented on AVRO-2238: -- Fokko opened a new pull request #390: AVRO-2238 Move to up to OpenJDK image URL: https://github.com/apache/avro/pull/390
[jira] [Commented] (AVRO-2238) Update Docker image from java to openjdk
[ https://issues.apache.org/jira/browse/AVRO-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698290#comment-16698290 ] ASF subversion and git services commented on AVRO-2238: --- Commit 638794bd7ee06ea7c28d7c06358e5d9db67da8cd in avro's branch refs/heads/AVRO-2238 from [~Fokko] [ https://gitbox.apache.org/repos/asf?p=avro.git;h=638794b ] AVRO-2238 Update Dockerfile base image from java to openjdk
[jira] [Commented] (AVRO-2250) Release 1.9.0
[ https://issues.apache.org/jira/browse/AVRO-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698183#comment-16698183 ] Daniel Kulp commented on AVRO-2250: --- Fokko: in 1.8.x, Guava was used in several places, not just in tests. For 1.9, I removed the Guava uses in non-test code, and you beat me to the tests. :) > Release 1.9.0 > - > > Key: AVRO-2250 > URL: https://issues.apache.org/jira/browse/AVRO-2250 > Project: Apache Avro > Issue Type: Task >Reporter: Nandor Kollar >Priority: Major >
[jira] [Commented] (AVRO-2250) Release 1.9.0
[ https://issues.apache.org/jira/browse/AVRO-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698177#comment-16698177 ] Fokko Driesprong commented on AVRO-2250: Thanks, Evan. Please keep in mind that Guava was only used in the tests, so there is no real security risk. Please refer to the Pull Request for more details: https://github.com/apache/avro/pull/373. Hopefully this won't block you from going to production :-)
[jira] [Commented] (AVRO-2272) SchemaParseException: Can't redefine: list in AvroIndexedRecordConverter
[ https://issues.apache.org/jira/browse/AVRO-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698169#comment-16698169 ] Thiruvalluvan M. G. commented on AVRO-2272: --- Looked at it and [commented|https://issues.apache.org/jira/browse/PARQUET-1441?focusedCommentId=16698168=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16698168] on PARQUET-1441. > SchemaParseException: Can't redefine: list in AvroIndexedRecordConverter > > > Key: AVRO-2272 > URL: https://issues.apache.org/jira/browse/AVRO-2272 > Project: Apache Avro > Issue Type: Bug > Components: java >Affects Versions: 1.8.2 >Reporter: Michael Heuer >Priority: Major > > Companion issue to https://issues.apache.org/jira/browse/PARQUET-1441, and > https://issues.apache.org/jira/browse/SPARK-25588, since those issues in > downstream projects don't seem to be getting any notice. > I've been able to create unit tests that reproduce the issue downstream in > Spark and Parquet; I would appreciate any help reproducing the issue in the > Avro codebase directly.
[jira] [Commented] (AVRO-2269) Improve usability of Perf.java
[ https://issues.apache.org/jira/browse/AVRO-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698103#comment-16698103 ] ASF GitHub Bot commented on AVRO-2269: -- rstata opened a new pull request #389: AVRO-2269 Make Perf.java more usable URL: https://github.com/apache/avro/pull/389 The class `org.apache.avro.ipc.io.Perf` is Avro's performance test suite. This JIRA aims to make it easier to use (see [JIRA ticket](https://issues.apache.org/jira/browse/AVRO-2269) for more info). > Improve usability of Perf.java > -- > > Key: AVRO-2269 > URL: https://issues.apache.org/jira/browse/AVRO-2269 > Project: Apache Avro > Issue Type: Test > Components: java >Reporter: Raymie Stata >Assignee: Raymie Stata >Priority: Major > > The class {{org.apache.avro.ipc.io.Perf}} is Avro's performance test suite. > This JIRA aims to make it easier to use. Specifically: > * Added a file {{performance-testing.html}} with guidance on how to use the > suite > * Added script {{run-script.sh}} that uses {{Perf}} to run structured > experiments. > * Added tests for performance of resolution of unchanged unions and > enumerations, which will be subject to future optimizations. > * Tweaks to {{Perf}} for better experimentation (e.g., support for minimum as > well as average aggregation).
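Reporting a minimum as well as an average per test case, while keeping setup work out of the timed region, can be illustrated with a small harness. This is a hypothetical sketch, not code from `org.apache.avro.ipc.io.Perf`; `MiniPerf`, `Case` and `Result` are made-up names, and `setUp()` stands in for work such as constructing an encoder or decoder.

```java
/** Hypothetical mini-harness; not part of Avro's Perf.java. */
final class MiniPerf {
  private MiniPerf() {}

  /** One test case: setUp() is excluded from timing, run() is measured. */
  interface Case {
    void setUp();

    void run();
  }

  /** Minimum and average time per cycle, in nanoseconds. */
  static final class Result {
    final long minNanos;
    final double avgNanos;

    Result(long minNanos, double avgNanos) {
      this.minNanos = minNanos;
      this.avgNanos = avgNanos;
    }
  }

  static Result measure(Case c, int cycles) {
    if (cycles <= 0) {
      throw new IllegalArgumentException("cycles must be positive");
    }
    long min = Long.MAX_VALUE;
    long total = 0;
    for (int i = 0; i < cycles; i++) {
      c.setUp(); // e.g. construct the encoder/decoder; not timed
      long start = System.nanoTime(); // start the timer only after setup
      c.run();
      long elapsed = System.nanoTime() - start;
      total += elapsed;
      min = Math.min(min, elapsed);
    }
    return new Result(min, (double) total / cycles);
  }
}
```

Reporting both numbers is cheap: the minimum is less sensitive to one-off pauses (for example, garbage collection), while the average reflects the typical cost across cycles.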
[jira] [Updated] (AVRO-2269) Improve usability of Perf.java
[ https://issues.apache.org/jira/browse/AVRO-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymie Stata updated AVRO-2269: --- Description: The class {{org.apache.avro.ipc.io.Perf}} is Avro's performance test suite. This JIRA aims to make it easier to use. Specifically:
* Added a file {{performance-testing.html}} with guidance on how to use the suite.
* Added script {{run-script.sh}} that uses {{Perf}} to run structured experiments.
* Added tests for performance of resolution of unchanged unions and enumerations, which will be subject to future optimizations.
* Tweaks to {{Perf}} for better experimentation (e.g., support for minimum as well as average aggregation).
was: In attempting to use Perf.java to show that proposed performance changes actually improved performance, different runs of Perf.java using the exact same code base resulted in variances of 5% or greater (and often 10% or greater) for about half the test cases. With variance this high within a code base, it's impossible to tell whether a proposed "improved" code base actually improves performance. I will post some documents and scripts I developed to reduce this variance to the wiki and elsewhere. This JIRA is for changes to Perf.java that reduce the variance. Specifically:
* Accessed the {{reader}} and {{writer}} instance variables directly in the inner loop of {{SpecificTest}}, and switched to a "reuse" object for reading records rather than constructing a fresh object for each read. Both helped significantly reduce variance for {{FooBarSpecificRecordTestWrite}}, a major target of recent performance-improvement efforts.
* Switched to {{DirectBinaryEncoder}} instead of {{BufferedBinaryEncoder}} for write tests. Although this slowed the write tests a bit, it reduced variance a lot, especially for performance tests of primitives like booleans, making it a better choice for measuring the performance impact of code changes.
* Started the timer of a test after the test's encoder/decoder is constructed, rather than before. This helps a little.
* Added the ability to output the _minimum_ runtime of a test case across multiple cycles (vs. the total runtime across all cycles). This was inspired by JVMSpec, which used to use a minimum. I was able to reduce the variance of total runtime enough to obviate the need for this metric, but since it's helpful diagnostically, I left it in.
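One simple way to act on the variance problem described in the old description is to measure the relative spread of repeated runs of the same code base, and to trust a claimed improvement only if it exceeds that spread. The metric used here, (max - min) / mean, and the `RunSpread` class are assumptions for illustration; the JIRA does not say how variance was computed.

```java
/** Hypothetical helper for judging run-to-run noise in benchmark totals. */
final class RunSpread {
  private RunSpread() {}

  static double mean(double[] runs) {
    if (runs.length == 0) {
      throw new IllegalArgumentException("need at least one run");
    }
    double sum = 0;
    for (double r : runs) {
      sum += r;
    }
    return sum / runs.length;
  }

  /** (max - min) / mean of repeated runs of the same code base. */
  static double relativeSpread(double[] runs) {
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    for (double r : runs) {
      min = Math.min(min, r);
      max = Math.max(max, r);
    }
    return (max - min) / mean(runs); // mean() rejects an empty array
  }

  /** True if the candidate's relative speed-up exceeds the noise of both code bases. */
  static boolean distinguishable(double[] baselineRuns, double[] candidateRuns) {
    double baseline = mean(baselineRuns);
    double gain = (baseline - mean(candidateRuns)) / baseline;
    double noise = Math.max(relativeSpread(baselineRuns), relativeSpread(candidateRuns));
    return gain > noise;
  }
}
```

For example, with baseline totals of 100, 105 and 110 (a spread of about 9.5%), a candidate whose runs average 101 would not count as a measurable improvement, while one averaging 91 would.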