[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader
[ https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701425#comment-16701425 ]

ASF GitHub Bot commented on AVRO-2247:
--
rstata commented on issue #391: AVRO-2247 - improved java reading performance with new reader
URL: https://github.com/apache/avro/pull/391#issuecomment-442330623

I've run your code against `Perf.java` and uploaded the [results here](https://github.com/apache/avro/files/2623075/AVRO-2247-Perf-results-11-27.pdf). This report contains two sets of results:

* The "avro-2247 (calibration)" column presents the results of running the 2247 branch against itself three different times. These results are useful for understanding where the Perf.java benchmark tends to have a lot of internal variability. As an example, BooleanRead/Write shows a lot of natural variability, which is something I've noticed in a lot of my previous performance testing.
* The "avro-2274 (w/ custom coders) vs" column presents the results of running three different treatments against my avro-2274 branch. The three sub-columns here are as follows: "master" is the Apache Avro master branch (just prior to avro-2274 being merged into it); "2247 (off)" is the 2247 branch with fast-coder turned off; "2247 (on)" is the 2247 branch with coders turned on.

The last sub-column of the "avro-2274 (...) vs" results is the most relevant. What we see here is a large number of record-related cases showing speedups of 20-30% and even more. This is very promising.

I am currently running the JMH-based benchmarks. These do _not_ have an (obvious) mechanism for comparing the "before/after" performance of your proposed changes, but I will be interested in seeing if they do better in reducing the variance between runs.

I haven't inspected your code yet. I'll do that as well, and offer some opinions.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Improve Java reading performance with a new reader > -- > > Key: AVRO-2247 > URL: https://issues.apache.org/jira/browse/AVRO-2247 > Project: Apache Avro > Issue Type: Improvement > Components: java >Reporter: Martin Jubelgas >Priority: Major > Fix For: 1.9.0 > > Attachments: Perf-Comparison.md > > > Complementary to AVRO-2090, I have been working on decoding of Avro objects > in Java and am suggesting a new implementation of a DatumReader that improves > read performance for both generic and specific records by approximately 20% > (and even more in cases of nested objects with defaults, a case I encounter a > lot in practical use). > Key concept is to create a detailed execution plan once at DatumReader. This > execution plan contains all required defaulting/lookup values so they need > not be looked up during object traversal while reading. > The reader implementation can be enabled and disabled per GenericData > instance. The system default is set via the system variable > "org.apache.avro.fastread" (defaults to "false"). > Attached a performance comparison of the existing implementation with the > proposed one. Will open a pull request with respective code in a bit (not > including interoperability with the optimizations of AVRO-2090 yet). Please > let me know your opinion of whether this is worth pursuing further. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
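The "execution plan" idea in the AVRO-2247 description can be illustrated with a small standalone sketch. This is not the code from the pull request: the class `ReadPlanSketch`, its `Step` type, and the methods `buildPlan`, `execute`, and `fastReadRequested` are hypothetical names, and records are modelled as plain maps rather than Avro types. Only the property name `org.apache.avro.fastread` and its default of "false" come from the issue text. The point is just that name lookups and default lookups happen once, when the plan is built, and not again for every record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: none of these names come from Avro or from the AVRO-2247 patch. */
final class ReadPlanSketch {

  enum Kind { READ, SKIP, DEFAULT }

  /** One precomputed step of the plan. */
  static final class Step {
    final Kind kind;
    final String field;
    final Object value; // only used for DEFAULT steps

    Step(Kind kind, String field, Object value) {
      this.kind = kind;
      this.field = field;
      this.value = value;
    }
  }

  /** Built once per (writer, reader) pair; all lookups happen here. */
  static List<Step> buildPlan(List<String> writerFields, Map<String, Object> readerDefaults) {
    List<Step> plan = new ArrayList<>();
    Set<String> written = new HashSet<>(writerFields);
    for (String w : writerFields) {
      // Fields the reader does not know are skipped on the wire.
      plan.add(new Step(readerDefaults.containsKey(w) ? Kind.READ : Kind.SKIP, w, null));
    }
    for (Map.Entry<String, Object> r : readerDefaults.entrySet()) {
      // Fields the writer never wrote are filled from the reader's default.
      if (!written.contains(r.getKey())) {
        plan.add(new Step(Kind.DEFAULT, r.getKey(), r.getValue()));
      }
    }
    return plan;
  }

  /** Executed per record; no name or default lookups remain. */
  static Map<String, Object> execute(List<Step> plan, Iterator<Object> wire) {
    Map<String, Object> record = new LinkedHashMap<>();
    for (Step s : plan) {
      switch (s.kind) {
        case READ: record.put(s.field, wire.next()); break;
        case SKIP: wire.next(); break;
        case DEFAULT: record.put(s.field, s.value); break;
      }
    }
    return record;
  }

  /** Reads the system property named in the issue; it defaults to "false". */
  static boolean fastReadRequested() {
    return Boolean.parseBoolean(System.getProperty("org.apache.avro.fastread", "false"));
  }

  public static void main(String[] args) {
    Map<String, Object> reader = new LinkedHashMap<>();
    reader.put("id", null);
    reader.put("name", "unknown");
    List<Step> plan = buildPlan(Arrays.asList("id", "legacy"), reader);
    System.out.println(execute(plan, Arrays.<Object>asList(7, "x").iterator()));
    System.out.println("fast read requested: " + fastReadRequested());
  }
}
```

In this toy model the plan for a writer with fields `id, legacy` and a reader with fields `id, name` has three steps: read `id`, skip `legacy`, and fill `name` from its default.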
[jira] [Commented] (AVRO-2275) Refactor schema-resolution code from grammar-generation
[ https://issues.apache.org/jira/browse/AVRO-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701386#comment-16701386 ] ASF GitHub Bot commented on AVRO-2275: -- rstata opened a new pull request #395: AVRO-2275 Refactor schema-resolution code from grammar-generation URL: https://github.com/apache/avro/pull/395 Efforts to improve performance by code-generation and other means have been hampered by the fact that our schema-resolution logic is embedded in resolving-grammar-generation logic (see [AVRO-2275](https://issues.apache.org/jira/browse/AVRO-2275)). This patch factors the resolution logic out from the grammar-generation logic, so the resolution logic can be more easily reused. See the design document included in this patch for more information. This patch consists of the following pieces: * A design/user-guide document (`refactoring-resolution.md`). * Core changes: a new file, `Resolver.java`, containing the extracted resolution logic, and a rewrite of `ResolvingGrammarGenerator.java` based on the new `Resolver.java`. * Changes to resolution-related tests. These changes do not change the tests themselves, but rather output more diagnostic information upon failure to help developers resolve bugs. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Refactor schema-resolution code from grammar-generation
> ---
>
> Key: AVRO-2275
> URL: https://issues.apache.org/jira/browse/AVRO-2275
> Project: Apache Avro
> Issue Type: Improvement
> Components: java
> Reporter: Raymie Stata
> Assignee: Raymie Stata
> Priority: Major
>
> In my own work to extend AVRO-2090, and also in AVRO-2247 (an alternative approach to optimizing decoders), we were forced to re-implement schema-resolution logic because it's currently embedded deeply in ResolvingGrammarGenerator. However, in the past the Avro community found it hard to maintain multiple implementations of the schema-resolution code, as it is tedious and error-prone code.
> In this JIRA we've refactored the resolution code into a new class called Resolver, and have rewritten ResolvingGrammarGenerator to be a client of this class. This rewrite passes the full regression suite, including bug-for-bug compatibility with a few questionable resolution rules, such as the "soft matching" rule for records in unions.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AVRO-2274) Improve resolving performance when schemas don't change
[ https://issues.apache.org/jira/browse/AVRO-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701362#comment-16701362 ] ASF subversion and git services commented on AVRO-2274: --- Commit 6eb25603b96169bf8d77269176218c63c181e9f4 in avro's branch refs/heads/master from [~raymie] [ https://gitbox.apache.org/repos/asf?p=avro.git;h=6eb2560 ] AVRO-2274 Improve resolving performance when schemas don't change. (#393) * AVRO-2274 Improve resolving performance when schemas don't change. * AVRO-2274 Break out of field-no-reorder loop as early as possible. > Improve resolving performance when schemas don't change > --- > > Key: AVRO-2274 > URL: https://issues.apache.org/jira/browse/AVRO-2274 > Project: Apache Avro > Issue Type: Improvement > Components: java >Reporter: Raymie Stata >Assignee: Raymie Stata >Priority: Major > > Decoding optimizations based on the observation that schemas don't change > very much. We add special-case paths to optimize the case where a > _sub_schema of the reader and the writer are the same. The specific cases > are: > * In the case of an enumeration, if the reader and writer are the same, then > we can simply return the tag written by the writer rather than "adjust" it as > if it might have been re-ordered. In fact, we can do this (directly return > the tag written by the writer) as long as the reader-schema is an "extension" > of the writer's in that it may have added new symbols but hasn't renumbered > any of the writer's symbols. Enumerations that either don't change at all or > are "extended" as defined here are the common ways to extend enumerations. > (Our tests show this optimization improves performance by about 3%.) > * When the reader and writer subschemas are both unions, resolution is > expensive: we have an outer union preceded by a "writer-union action", but > each branch of this outer union consist of union-adjust actions, which are > heavy weight. 
We optimize this case when the reader and writer unions are > the same: we fall back on the standard grammar used for a union, avoiding all > these adjustments. Since unions are commonly used to encode "nullable" > fields in Avro, and nullability rarely changes as a schema evolves, this > optimization should help many users. (Our tests show this optimization > improves performance by 25-30%, a significant win.) > * The "custom code" generated for reading records has to read fields in a > loop that uses a switch statement to deal with writers that may have > re-ordered fields. In most cases, however, fields have not been reordered > (esp. in more complex records with many record sub-schemas). So we've added > a new method to ResolvingDecoder called readFieldOrderIfDiff, which is a > variant of the existing readFieldOrder. If the field order has indeed > changed, then readFieldOrderIfDiff returns the new field order, just like > readFieldOrder does. However, if the field-order hasn't changed, then > readFieldOrderIfDiff returns null. We then modified the generation of > custom-decoders for records to add a special-case path that simply reads the > record's fields in order, without incurring the overhead of the loop or the > switch statement. (Our tests show this optimization improves performance by > 8-9%, on top of the 35-40% produced by the original custom-coder > optimization.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
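The third case in the description above (a null return from readFieldOrderIfDiff selecting a straight-line read path) can be sketched without any Avro dependency. This is a simplified model, not the generated decoder code: `FieldOrderSketch`, `fieldOrderIfDiff`, and `decode` are hypothetical names, the field order is modelled as an `int[]` of reader positions instead of `Schema.Field[]`, and "reading" a field just records its name.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a toy model of the fast path described in AVRO-2274. */
final class FieldOrderSketch {

  /**
   * Stand-in for the decoder call: returns null when the writer's order
   * matches the reader's, otherwise the order to read in.
   */
  static int[] fieldOrderIfDiff(int[] writerOrder) {
    for (int i = 0; i < writerOrder.length; i++) {
      if (writerOrder[i] != i) {
        return writerOrder; // leave the check as soon as a difference is seen
      }
    }
    return null;
  }

  /** Decodes a three-field record, recording the order in which fields were read. */
  static List<String> decode(int[] writerOrder) {
    List<String> trace = new ArrayList<>();
    int[] order = fieldOrderIfDiff(writerOrder);
    if (order == null) {
      // Fast path: read the fields in the reader's own order, no loop, no switch.
      trace.add("id");
      trace.add("name");
      trace.add("email");
    } else {
      // General path: loop plus switch to follow the writer's order.
      for (int pos : order) {
        switch (pos) {
          case 0: trace.add("id"); break;
          case 1: trace.add("name"); break;
          case 2: trace.add("email"); break;
          default: throw new IllegalStateException("Unknown field position " + pos);
        }
      }
    }
    return trace;
  }

  public static void main(String[] args) {
    System.out.println(decode(new int[] {0, 1, 2}));
    System.out.println(decode(new int[] {2, 0, 1}));
  }
}
```

Both paths produce the same record contents; the fast path only avoids the per-field dispatch when nothing was reordered.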
[jira] [Resolved] (AVRO-2274) Improve resolving performance when schemas don't change
[ https://issues.apache.org/jira/browse/AVRO-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvalluvan M. G. resolved AVRO-2274. --- Resolution: Fixed Merged the PR. Thank you [~raymie]. > Improve resolving performance when schemas don't change > --- > > Key: AVRO-2274 > URL: https://issues.apache.org/jira/browse/AVRO-2274 > Project: Apache Avro > Issue Type: Improvement > Components: java >Reporter: Raymie Stata >Assignee: Raymie Stata >Priority: Major > > Decoding optimizations based on the observation that schemas don't change > very much. We add special-case paths to optimize the case where a > _sub_schema of the reader and the writer are the same. The specific cases > are: > * In the case of an enumeration, if the reader and writer are the same, then > we can simply return the tag written by the writer rather than "adjust" it as > if it might have been re-ordered. In fact, we can do this (directly return > the tag written by the writer) as long as the reader-schema is an "extension" > of the writer's in that it may have added new symbols but hasn't renumbered > any of the writer's symbols. Enumerations that either don't change at all or > are "extended" as defined here are the common ways to extend enumerations. > (Our tests show this optimization improves performance by about 3%.) > * When the reader and writer subschemas are both unions, resolution is > expensive: we have an outer union preceded by a "writer-union action", but > each branch of this outer union consist of union-adjust actions, which are > heavy weight. We optimize this case when the reader and writer unions are > the same: we fall back on the standard grammar used for a union, avoiding all > these adjustments. Since unions are commonly used to encode "nullable" > fields in Avro, and nullability rarely changes as a schema evolves, this > optimization should help many users. (Our tests show this optimization > improves performance by 25-30%, a significant win.) 
> * The "custom code" generated for reading records has to read fields in a > loop that uses a switch statement to deal with writers that may have > re-ordered fields. In most cases, however, fields have not been reordered > (esp. in more complex records with many record sub-schemas). So we've added > a new method to ResolvingDecoder called readFieldOrderIfDiff, which is a > variant of the existing readFieldOrder. If the field order has indeed > changed, then readFieldOrderIfDiff returns the new field order, just like > readFieldOrder does. However, if the field-order hasn't changed, then > readFieldOrderIfDiff returns null. We then modified the generation of > custom-decoders for records to add a special-case path that simply reads the > record's fields in order, without incurring the overhead of the loop or the > switch statement. (Our tests show this optimization improves performance by > 8-9%, on top of the 35-40% produced by the original custom-coder > optimization.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AVRO-2274) Improve resolving performance when schemas don't change
[ https://issues.apache.org/jira/browse/AVRO-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701360#comment-16701360 ]

ASF GitHub Bot commented on AVRO-2274:
--
thiru-apache closed pull request #393: AVRO-2274 Improve resolving performance when schemas don't change.
URL: https://github.com/apache/avro/pull/393

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java b/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java
index 8f1f6a95b..45ff922fd 100644
--- a/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java
+++ b/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java
@@ -129,6 +129,19 @@ public static Object resolve(Schema writer, Schema reader)
       fields;
   }
 
+  /**
+   * Same as {@link readFieldOrder} except that it returns
+   * null if there was no reordering of fields, i.e., if the
+   * correct thing for the reader to do is to read (all) of its fields
+   * in the order specified by its own schema (useful for
+   * optimizations).
+   */
+  public final Schema.Field[] readFieldOrderIfDiff() throws IOException {
+    Symbol.FieldOrderAction top
+      = (Symbol.FieldOrderAction) parser.advance(Symbol.FIELD_ACTION);
+    return (top.noReorder ? null : top.fields);
+  }
+
   /**
    * Consume any more data that has been written by the writer but not
    * needed by the reader so that the the underlying decoder is in proper
@@ -252,6 +265,7 @@ public int readEnum() throws IOException {
     parser.advance(Symbol.ENUM);
     Symbol.EnumAdjustAction top = (Symbol.EnumAdjustAction) parser.popSymbol();
     int n = in.readEnum();
+    if (top.noAdjustments) return n;
     Object o = top.adjustments[n];
     if (o instanceof Integer) {
       return ((Integer) o).intValue();
@@ -263,9 +277,17 @@ public int readEnum() throws IOException {
   @Override
   public int readIndex() throws IOException {
     parser.advance(Symbol.UNION);
-    Symbol.UnionAdjustAction top = (Symbol.UnionAdjustAction) parser.popSymbol();
-    parser.pushSymbol(top.symToParse);
-    return top.rindex;
+    Symbol top = parser.popSymbol();
+    int result;
+    if (top instanceof Symbol.UnionAdjustAction) {
+      result = ((Symbol.UnionAdjustAction) top).rindex;
+      top = ((Symbol.UnionAdjustAction) top).symToParse;
+    } else {
+      result = in.readIndex();
+      top = ((Symbol.Alternative) top).getSymbol(result);
+    }
+    parser.pushSymbol(top);
+    return result;
   }
 
   @Override
diff --git a/lang/java/avro/src/main/java/org/apache/avro/io/parsing/ResolvingGrammarGenerator.java b/lang/java/avro/src/main/java/org/apache/avro/io/parsing/ResolvingGrammarGenerator.java
index 71978824b..61073dce8 100644
--- a/lang/java/avro/src/main/java/org/apache/avro/io/parsing/ResolvingGrammarGenerator.java
+++ b/lang/java/avro/src/main/java/org/apache/avro/io/parsing/ResolvingGrammarGenerator.java
@@ -76,8 +76,8 @@ public final Symbol generate(Schema writer, Schema reader)
    * @return The start symbol for the resolving grammar
    * @throws IOException
    */
-  public Symbol generate(Schema writer, Schema reader,
-    Map seen) throws IOException
+  private Symbol generate(Schema writer, Schema reader, Map seen)
+    throws IOException
   {
     final Schema.Type writerType = writer.getType();
     final Schema.Type readerType = reader.getType();
@@ -204,6 +204,9 @@ public Symbol generate(Schema writer, Schema reader,
   private Symbol resolveUnion(Schema writer, Schema reader,
       Map seen) throws IOException {
+    boolean needsAdj = ! unionEquiv(writer, reader, new HashMap<>());
+    List alts2 = (!needsAdj ? reader.getTypes() : null);
+
     List alts = writer.getTypes();
     final int size = alts.size();
     Symbol[] symbols = new Symbol[size];
@@ -215,12 +218,72 @@ private Symbol resolveUnion(Schema writer, Schema reader,
      */
     int i = 0;
     for (Schema w : alts) {
-      symbols[i] = generate(w, reader, seen);
+      symbols[i] = generate(w, (needsAdj ? reader : alts2.get(i)), seen);
       labels[i] = w.getFullName();
       i++;
     }
+    if (! needsAdj)
+      return Symbol.seq(Symbol.alt(symbols, labels), Symbol.UNION);
     return Symbol.seq(Symbol.alt(symbols, labels),
-        Symbol.writerUnionAction());
+        Symbol.WRITER_UNION_ACTION);
+  }
+
+  private static boolean unionEquiv(Schema w, Schema r, Map seen) {
+    Schema.Type wt = w.getType();
+    if (wt != r.getType()) return false;
+    if ((wt == Schema.Type.RECORD || wt == Schema.Type.FIXED || wt == Schema.Type.ENUM
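The `noAdjustments` shortcut added to `readEnum` in the diff above relies on knowing, ahead of time, that the reader's enum is an "extension" of the writer's as the issue description defines it. The portion of the diff shown here does not include how that flag is computed, so the following is only a guess at the kind of check involved, written as a standalone sketch. `EnumExtensionSketch`, `isExtension`, and this `readEnum` helper are hypothetical names, and the adjustments table is simplified to an `int[]`.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a toy version of the enum "extension" test from AVRO-2274. */
final class EnumExtensionSketch {

  /**
   * True when every writer symbol keeps its position in the reader's list,
   * i.e. the reader may have appended symbols but has not renumbered any.
   */
  static boolean isExtension(List<String> writerSymbols, List<String> readerSymbols) {
    if (readerSymbols.size() < writerSymbols.size()) {
      return false;
    }
    for (int i = 0; i < writerSymbols.size(); i++) {
      if (!writerSymbols.get(i).equals(readerSymbols.get(i))) {
        return false;
      }
    }
    return true;
  }

  /** Mirrors the shape of the patched readEnum: return the written tag directly when possible. */
  static int readEnum(int writtenTag, boolean noAdjustments, int[] adjustments) {
    if (noAdjustments) {
      return writtenTag;
    }
    return adjustments[writtenTag];
  }

  public static void main(String[] args) {
    List<String> writer = Arrays.asList("RED", "GREEN");
    System.out.println(isExtension(writer, Arrays.asList("RED", "GREEN", "BLUE")));
    System.out.println(isExtension(writer, Arrays.asList("GREEN", "RED")));
  }
}
```

When the reader reorders symbols, the test fails and the per-tag adjustment table is still needed, which is the slower path the original code always took.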
[jira] [Commented] (AVRO-2276) GenericData.toString does not always generate valid JSON for Map datum
[ https://issues.apache.org/jira/browse/AVRO-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701271#comment-16701271 ]

ASF GitHub Bot commented on AVRO-2276:
--
thiru-apache commented on a change in pull request #394: AVRO-2276: Escape Map keys in GenericData.toString to generate valid JSON
URL: https://github.com/apache/avro/pull/394#discussion_r236911408

## File path: lang/java/grpc/pom.xml
## @@ -21,8 +21,8 @@ 4.0.0 -org.apache.avro avro-parent +org.apache.avro 1.9.0-SNAPSHOT

Review comment: The convention is to have `<groupId>` before `<artifactId>`. Has some tool changed the order here accidentally?

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> GenericData.toString does not always generate valid JSON for Map datum
> --
>
> Key: AVRO-2276
> URL: https://issues.apache.org/jira/browse/AVRO-2276
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.9.0
> Reporter: Ismaël Mejía
> Assignee: Ismaël Mejía
> Priority: Minor
>
> Avro represents data as json internally so it requires to escape the keys of the objects (Maps) as mandated by https://tools.ietf.org/html/rfc8259
> I discover this while running a build on windows because of '\' characters. But it can be easily reproduced on linux creating a file/dir with backspaces.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AVRO-2276) GenericData.toString does not always generate valid JSON for Map datum
[ https://issues.apache.org/jira/browse/AVRO-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701272#comment-16701272 ] ASF GitHub Bot commented on AVRO-2276: -- thiru-apache commented on a change in pull request #394: AVRO-2276: Escape Map keys in GenericData.toString to generate valid JSON URL: https://github.com/apache/avro/pull/394#discussion_r236912579 ## File path: lang/java/grpc/pom.xml ## @@ -87,6 +92,12 @@ test + + io.netty + netty-codec-http2 + ${netty-codec-http2.version} + test + Review comment: The change does not seem to do anything with grpc. Did you patch this pom and the main pom by mistake? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > GenericData.toString does not always generate valid JSON for Map datum > -- > > Key: AVRO-2276 > URL: https://issues.apache.org/jira/browse/AVRO-2276 > Project: Apache Avro > Issue Type: Bug > Components: java >Affects Versions: 1.9.0 >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > > Avro represents data as json internally so it requires to escape the keys of > the objects (Maps) as mandated by https://tools.ietf.org/html/rfc8259 > I discover this while running a build on windows because of '\' characters. > But it can be easily reproduced on linux creating a file/dir with backspaces. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2278) GenericData.Record field getter not correct
[ https://issues.apache.org/jira/browse/AVRO-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltan Farkas updated AVRO-2278:

Summary: GenericData.Record field getter not correct (was: GenericData.Record field getter no correct)

> GenericData.Record field getter not correct
> ---
>
> Key: AVRO-2278
> URL: https://issues.apache.org/jira/browse/AVRO-2278
> Project: Apache Avro
> Issue Type: Bug
> Affects Versions: 1.8.2
> Reporter: Zoltan Farkas
> Priority: Major
>
> Currently the get field implementation is not correct in GenericData.Record, at:
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L209
> {code}
> @Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) return null;
>   return values[field.pos()];
> }
> {code}
> The method returns null when a field is not present, making it impossible to distinguish between "field value = null" and "field does not exist".
> A more "correct" implementation would be:
> {code}
> @Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) {
>     throw new IllegalArgumentException("Invalid field " + key);
>   }
>   return values[field.pos()];
> }
> {code}
> This will make the behavior consistent with put, which will throw an exception when setting a non-existent field.
> When I made this change in my fork, some bugs in unit tests showed up.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AVRO-2278) GenericData.Record field getter no correct
Zoltan Farkas created AVRO-2278:
---
Summary: GenericData.Record field getter no correct
Key: AVRO-2278
URL: https://issues.apache.org/jira/browse/AVRO-2278
Project: Apache Avro
Issue Type: Bug
Affects Versions: 1.8.2
Reporter: Zoltan Farkas

Currently the get field implementation is not correct in GenericData.Record, at:
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L209
{code}
@Override public Object get(String key) {
  Field field = schema.getField(key);
  if (field == null) return null;
  return values[field.pos()];
}
{code}
The method returns null when a field is not present, making it impossible to distinguish between "field value = null" and "field does not exist".
A more "correct" implementation would be:
{code}
@Override public Object get(String key) {
  Field field = schema.getField(key);
  if (field == null) {
    throw new IllegalArgumentException("Invalid field " + key);
  }
  return values[field.pos()];
}
{code}
This will make the behavior consistent with put, which will throw an exception when setting a non-existent field.
When I made this change in my fork, some bugs in unit tests showed up.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
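The ambiguity reported in AVRO-2278 can be reproduced with a toy record that has no Avro dependency. `RecordGetSketch`, `getLenient`, and `getStrict` are hypothetical names used only for this illustration, and the exception type thrown by `put` here is an assumption that follows the proposal in the issue, not a statement about what Avro throws.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy stand-in for a record with a fixed set of fields; not Avro's GenericData.Record. */
final class RecordGetSketch {
  private final Map<String, Object> values = new LinkedHashMap<>();

  RecordGetSketch(String... fieldNames) {
    for (String name : fieldNames) {
      values.put(name, null); // every declared field starts out as null
    }
  }

  /** Rejects unknown fields, as the issue says put does. */
  void put(String key, Object value) {
    if (!values.containsKey(key)) {
      throw new IllegalArgumentException("Invalid field " + key);
    }
    values.put(key, value);
  }

  /** Current behaviour per the issue: an unknown field and a null value both yield null. */
  Object getLenient(String key) {
    return values.get(key);
  }

  /** Proposed behaviour: an unknown field is an error, so null can only mean "value is null". */
  Object getStrict(String key) {
    if (!values.containsKey(key)) {
      throw new IllegalArgumentException("Invalid field " + key);
    }
    return values.get(key);
  }

  public static void main(String[] args) {
    RecordGetSketch r = new RecordGetSketch("name");
    System.out.println(r.getLenient("name")); // declared field, value not set
    System.out.println(r.getLenient("nmae")); // misspelled field, same result
  }
}
```

With the lenient getter a misspelled field name is indistinguishable from an unset value; the strict getter surfaces the misspelling immediately, which is presumably how the unit-test bugs mentioned in the issue were uncovered.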
[jira] [Updated] (AVRO-2276) GenericData.toString does not always generate valid JSON for Map datum
[ https://issues.apache.org/jira/browse/AVRO-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated AVRO-2276: --- Affects Version/s: (was: 1.8.2) 1.9.0 > GenericData.toString does not always generate valid JSON for Map datum > -- > > Key: AVRO-2276 > URL: https://issues.apache.org/jira/browse/AVRO-2276 > Project: Apache Avro > Issue Type: Bug > Components: java >Affects Versions: 1.9.0 >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > > Avro represents data as json internally so it requires to escape the keys of > the objects (Maps) as mandated by https://tools.ietf.org/html/rfc8259 > I discover this while running a build on windows because of '\' characters. > But it can be easily reproduced on linux creating a file/dir with backspaces. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
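For readers unfamiliar with the escaping that RFC 8259 requires for strings (and therefore for object keys), here is a minimal standalone sketch. It is not the change made in pull request #394; `JsonKeyEscapeSketch`, `escape`, and `toJson` are hypothetical names, and the sketch handles only string-to-string maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: escapes strings for use as JSON keys or values per RFC 8259. */
final class JsonKeyEscapeSketch {

  /** Escapes quotation mark, reverse solidus, and control characters below U+0020. */
  static String escape(String s) {
    StringBuilder sb = new StringBuilder(s.length() + 2);
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '"':  sb.append("\\\""); break;
        case '\\': sb.append("\\\\"); break;
        case '\b': sb.append("\\b"); break;
        case '\f': sb.append("\\f"); break;
        case '\n': sb.append("\\n"); break;
        case '\r': sb.append("\\r"); break;
        case '\t': sb.append("\\t"); break;
        default:
          if (c < 0x20) {
            // Remaining control characters use the six-character hex form.
            sb.append(String.format("\\u%04x", (int) c));
          } else {
            sb.append(c);
          }
      }
    }
    return sb.toString();
  }

  /** Renders a string map as a JSON object, escaping keys as well as values. */
  static String toJson(Map<String, String> map) {
    StringBuilder sb = new StringBuilder("{");
    boolean first = true;
    for (Map.Entry<String, String> e : map.entrySet()) {
      if (!first) {
        sb.append(',');
      }
      first = false;
      sb.append('"').append(escape(e.getKey())).append("\":\"")
        .append(escape(e.getValue())).append('"');
    }
    return sb.append('}').toString();
  }

  public static void main(String[] args) {
    Map<String, String> m = new LinkedHashMap<>();
    m.put("C:\\tmp\\data", "1");
    System.out.println(toJson(m));
  }
}
```

A Windows path used as a map key is the case the issue mentions: without escaping, each backslash in the key would start an escape sequence and the output would not parse as JSON.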
[jira] [Created] (AVRO-2277) clean up Ruby warnings
Tim Perkins created AVRO-2277: - Summary: clean up Ruby warnings Key: AVRO-2277 URL: https://issues.apache.org/jira/browse/AVRO-2277 Project: Apache Avro Issue Type: Improvement Components: ruby Reporter: Tim Perkins Assignee: Tim Perkins Fix For: 1.9.0 Running tests for the Ruby implementation generates a lot of warnings and makes it unclear that the Ruby tests are passing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2277) clean up Ruby warnings
[ https://issues.apache.org/jira/browse/AVRO-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Perkins updated AVRO-2277: -- Status: Patch Available (was: Open) https://github.com/apache/avro/pull/392 > clean up Ruby warnings > -- > > Key: AVRO-2277 > URL: https://issues.apache.org/jira/browse/AVRO-2277 > Project: Apache Avro > Issue Type: Improvement > Components: ruby >Reporter: Tim Perkins >Assignee: Tim Perkins >Priority: Minor > Fix For: 1.9.0 > > > Running tests for the Ruby implementation generates a lot of warnings and > makes it unclear that the Ruby tests are passing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AVRO-2276) GenericData.toString does not always generate valid JSON for Map datum
[ https://issues.apache.org/jira/browse/AVRO-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700734#comment-16700734 ]

ASF GitHub Bot commented on AVRO-2276:
--

iemejia opened a new pull request #394: AVRO-2276: Escape Map keys in GenericData.toString to generate valid JSON
URL: https://github.com/apache/avro/pull/394

The extra changes in the pom file fix the Maven RAT run on Windows.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> GenericData.toString does not always generate valid JSON for Map datum
> --
>
> Key: AVRO-2276
> URL: https://issues.apache.org/jira/browse/AVRO-2276
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.8.2
> Reporter: Ismaël Mejía
> Assignee: Ismaël Mejía
> Priority: Minor
>
> Avro represents data as JSON internally, so it needs to escape the keys of
> the objects (Maps) as mandated by https://tools.ietf.org/html/rfc8259
> I discovered this while running a build on Windows, because of '\' characters.
> It can also be reproduced easily on Linux by creating a file/dir whose name contains backslashes.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AVRO-2276) GenericData.toString does not always generate valid JSON for Map datum
Ismaël Mejía created AVRO-2276:
--

Summary: GenericData.toString does not always generate valid JSON for Map datum
Key: AVRO-2276
URL: https://issues.apache.org/jira/browse/AVRO-2276
Project: Apache Avro
Issue Type: Bug
Components: java
Affects Versions: 1.8.2
Reporter: Ismaël Mejía
Assignee: Ismaël Mejía

Avro represents data as JSON internally, so it needs to escape the keys of the objects (Maps) as mandated by https://tools.ietf.org/html/rfc8259

I discovered this while running a build on Windows, because of '\' characters. It can also be reproduced easily on Linux by creating a file/dir whose name contains backslashes.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
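To illustrate why unescaped map keys break the JSON output, here is a minimal sketch. `JsonKeyDemo`, `escape`, and `toJson` are hypothetical names and this is not Avro's implementation; the escaping covers only what RFC 8259 requires for strings (quotation mark, reverse solidus, and control characters below U+0020).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not Avro's GenericData.toString.
final class JsonKeyDemo {
  // Escapes the characters that RFC 8259 forbids inside a JSON string.
  static String escape(String s) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '"':  sb.append("\\\""); break;
        case '\\': sb.append("\\\\"); break;
        case '\n': sb.append("\\n"); break;
        case '\r': sb.append("\\r"); break;
        case '\t': sb.append("\\t"); break;
        default:
          if (c < 0x20) {
            // Remaining control characters use the six-character hex form.
            sb.append(String.format("\\u%04x", (int) c));
          } else {
            sb.append(c);
          }
      }
    }
    return sb.toString();
  }

  // Renders a string-to-string map as a JSON object, escaping keys and values alike.
  static String toJson(Map<String, String> map) {
    StringBuilder sb = new StringBuilder("{");
    boolean first = true;
    for (Map.Entry<String, String> e : map.entrySet()) {
      if (!first) {
        sb.append(", ");
      }
      first = false;
      sb.append('"').append(escape(e.getKey())).append("\": \"")
        .append(escape(e.getValue())).append('"');
    }
    return sb.append('}').toString();
  }

  public static void main(String[] args) {
    Map<String, String> paths = new LinkedHashMap<>();
    paths.put("C:\\tmp", "dir"); // the key holds a single backslash
    System.out.println(toJson(paths));
  }
}
```

If the key were appended without `escape`, the lone backslash in a Windows path would start an invalid escape sequence and a strict JSON parser would reject the output.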
[jira] [Commented] (AVRO-2273) Release 1.8.3
[ https://issues.apache.org/jira/browse/AVRO-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700244#comment-16700244 ]

Ismaël Mejía commented on AVRO-2273:

What are the goals here, apart from minor fixes? The security issues have not been backported yet. Maybe we should just encourage people to move to 1.9.0 instead, no? Otherwise we should probably draw up the list of issues/PRs to backport, but that seems like a lot of extra work compared with jumping straight to 1.9.x.

> Release 1.8.3
> -
>
> Key: AVRO-2273
> URL: https://issues.apache.org/jira/browse/AVRO-2273
> Project: Apache Avro
> Issue Type: Task
> Reporter: Thiruvalluvan M. G.
> Priority: Major
> Fix For: 1.8.3
>
>
> This ticket is for releasing Avro 1.8.3 and discussing any topics related to
> it.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2142) SchemaBuilder Java documentation code snippet is not valid
[ https://issues.apache.org/jira/browse/AVRO-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated AVRO-2142: --- Resolution: Fixed Fix Version/s: 1.9.0 Status: Resolved (was: Patch Available) > SchemaBuilder Java documentation code snippet is not valid > -- > > Key: AVRO-2142 > URL: https://issues.apache.org/jira/browse/AVRO-2142 > Project: Apache Avro > Issue Type: Improvement > Components: doc, java >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 1.9.0 > > > The code snippet in SchemaBuilder is invalid, it has invalid quotes and > misses one call in the builder chain. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AVRO-2142) SchemaBuilder Java documentation code snippet is not valid
[ https://issues.apache.org/jira/browse/AVRO-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700224#comment-16700224 ]

ASF GitHub Bot commented on AVRO-2142:
--

iemejia closed pull request #282: AVRO-2142: Fix SchemaBuilder javadoc code snippet
URL: https://github.com/apache/avro/pull/282

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/lang/java/avro/src/main/java/org/apache/avro/SchemaBuilder.java b/lang/java/avro/src/main/java/org/apache/avro/SchemaBuilder.java
index cdc43e032..8ebe45baf 100644
--- a/lang/java/avro/src/main/java/org/apache/avro/SchemaBuilder.java
+++ b/lang/java/avro/src/main/java/org/apache/avro/SchemaBuilder.java
@@ -61,11 +61,11 @@
  *
  *
  * Schema schema = SchemaBuilder
- * .record("HandshakeRequest").namespace("org.apache.avro.ipc)
+ * .record("HandshakeRequest").namespace("org.apache.avro.ipc")
  * .fields()
  * .name("clientHash").type().fixed("MD5").size(16).noDefault()
  * .name("clientProtocol").type().nullable().stringType().noDefault()
- * .name("serverHash").type("MD5")
+ * .name("serverHash").type("MD5").noDefault()
  * .name("meta").type().nullable().map().values().bytesType().noDefault()
  * .endRecord();
  *

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> SchemaBuilder Java documentation code snippet is not valid
> --
>
> Key: AVRO-2142
> URL: https://issues.apache.org/jira/browse/AVRO-2142
> Project: Apache Avro
> Issue Type: Improvement
> Components: doc, java
> Reporter: Ismaël Mejía
> Assignee: Ismaël Mejía
> Priority: Trivial
>
> The code snippet in SchemaBuilder is invalid, it has invalid quotes and
> misses one call in the builder chain.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AVRO-2142) SchemaBuilder Java documentation code snippet is not valid
[ https://issues.apache.org/jira/browse/AVRO-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700226#comment-16700226 ] ASF subversion and git services commented on AVRO-2142: --- Commit 39ec1a3f0addfce06869f705f7a17c03d538fe16 in avro's branch refs/heads/master from [~iemejia] [ https://gitbox.apache.org/repos/asf?p=avro.git;h=39ec1a3 ] AVRO-2142: Fix SchemaBuilder javadoc code snippet > SchemaBuilder Java documentation code snippet is not valid > -- > > Key: AVRO-2142 > URL: https://issues.apache.org/jira/browse/AVRO-2142 > Project: Apache Avro > Issue Type: Improvement > Components: doc, java >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > > The code snippet in SchemaBuilder is invalid, it has invalid quotes and > misses one call in the builder chain. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AVRO-2181) Missing escape character breaks TestIdl.java in windows
[ https://issues.apache.org/jira/browse/AVRO-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía resolved AVRO-2181. Resolution: Fixed > Missing escape character breaks TestIdl.java in windows > --- > > Key: AVRO-2181 > URL: https://issues.apache.org/jira/browse/AVRO-2181 > Project: Apache Avro > Issue Type: Bug > Components: build, java >Affects Versions: 1.8.2 > Environment: Windows >Reporter: Hans-Peter Werner >Priority: Major > Fix For: 1.9.0 > > > In a call to String.replace() a backslash is missing before "\r", so CRs are > not correctly removed in windows environments. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2181) Missing escape character breaks TestIdl.java in windows
[ https://issues.apache.org/jira/browse/AVRO-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated AVRO-2181: --- Summary: Missing escape character breaks TestIdl.java in windows (was: Missing escape charater in TestIdl.java) > Missing escape character breaks TestIdl.java in windows > --- > > Key: AVRO-2181 > URL: https://issues.apache.org/jira/browse/AVRO-2181 > Project: Apache Avro > Issue Type: Bug > Components: build, java >Affects Versions: 1.8.2 > Environment: Windows >Reporter: Hans-Peter Werner >Priority: Major > Fix For: 1.9.0 > > > In a call to String.replace() a backslash is missing before "\r", so CRs are > not correctly removed in windows environments. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AVRO-2181) Missing escape charater in TestIdl.java
[ https://issues.apache.org/jira/browse/AVRO-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700153#comment-16700153 ] ASF GitHub Bot commented on AVRO-2181: -- iemejia commented on issue #312: AVRO-2181: missing escape character added URL: https://github.com/apache/avro/pull/312#issuecomment-441999198 Oups forgot to thank you for your contribution. Thanks :) ! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Missing escape charater in TestIdl.java > --- > > Key: AVRO-2181 > URL: https://issues.apache.org/jira/browse/AVRO-2181 > Project: Apache Avro > Issue Type: Bug > Components: build, java >Affects Versions: 1.8.2 > Environment: Windows >Reporter: Hans-Peter Werner >Priority: Major > Fix For: 1.9.0 > > > In a call to String.replace() a backslash is missing before "\r", so CRs are > not correctly removed in windows environments. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AVRO-2181) Missing escape charater in TestIdl.java
[ https://issues.apache.org/jira/browse/AVRO-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700151#comment-16700151 ]

ASF GitHub Bot commented on AVRO-2181:
--

iemejia closed pull request #312: AVRO-2181: missing escape character added
URL: https://github.com/apache/avro/pull/312

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/lang/java/compiler/src/test/java/org/apache/avro/compiler/idl/TestIdl.java b/lang/java/compiler/src/test/java/org/apache/avro/compiler/idl/TestIdl.java
index b38714410..26e502c1a 100644
--- a/lang/java/compiler/src/test/java/org/apache/avro/compiler/idl/TestIdl.java
+++ b/lang/java/compiler/src/test/java/org/apache/avro/compiler/idl/TestIdl.java
@@ -152,7 +152,7 @@ public String testName() {
     public void run() throws Exception {
       String output = generate();
       String slurped = slurp(expectedOut);
-      assertEquals(slurped.trim(), output.replace("\r", "").trim());
+      assertEquals(slurped.trim(), output.replace("\\r", "").trim());
     }

     public void write() throws Exception {

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Missing escape charater in TestIdl.java
> ---
>
> Key: AVRO-2181
> URL: https://issues.apache.org/jira/browse/AVRO-2181
> Project: Apache Avro
> Issue Type: Bug
> Components: build, java
> Affects Versions: 1.8.2
> Environment: Windows
> Reporter: Hans-Peter Werner
> Priority: Major
> Fix For: 1.9.0
>
>
> In a call to String.replace() a backslash is missing before "\r", so CRs are
> not correctly removed in windows environments.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
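The difference between the two string literals in the AVRO-2181 diff is easy to miss. In Java source, `"\r"` is a single carriage-return character, while `"\\r"` is the two-character sequence of a backslash followed by `r`; `String.replace` treats both as literal text rather than as a regular expression. The sketch below (class and method names are illustrative only, not taken from TestIdl.java) shows which input each call affects.

```java
// Illustrative demo; names are not from the Avro code base.
final class CarriageReturnDemo {
  // Removes actual carriage-return characters (U+000D).
  static String stripRawCr(String s) {
    return s.replace("\r", "");
  }

  // Removes the two-character escape sequence: a backslash followed by 'r'.
  static String stripEscapedCr(String s) {
    return s.replace("\\r", "");
  }

  public static void main(String[] args) {
    String raw = "line1\r\nline2";       // contains a real CR and LF
    String escaped = "line1\\r\\nline2"; // contains the text backslash-r, backslash-n

    // Each call only touches the form it was written for.
    System.out.println(stripRawCr(raw).equals("line1\nline2"));
    System.out.println(stripEscapedCr(escaped).equals("line1\\nline2"));
    System.out.println(stripEscapedCr(raw).equals(raw));
    System.out.println(stripRawCr(escaped).equals(escaped));
  }
}
```

Which form is appropriate depends on whether the compared text holds raw CRs or CRs that have already been escaped, for example inside generated JSON; the patch chooses the escaped form.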
[jira] [Commented] (AVRO-2181) Missing escape charater in TestIdl.java
[ https://issues.apache.org/jira/browse/AVRO-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700152#comment-16700152 ] ASF subversion and git services commented on AVRO-2181: --- Commit d3c726fce8d5dd9632960939858af134895ff3ea in avro's branch refs/heads/master from [~hp9000] [ https://gitbox.apache.org/repos/asf?p=avro.git;h=d3c726f ] AVRO-2181: missing escape character added > Missing escape charater in TestIdl.java > --- > > Key: AVRO-2181 > URL: https://issues.apache.org/jira/browse/AVRO-2181 > Project: Apache Avro > Issue Type: Bug > Components: build, java >Affects Versions: 1.8.2 > Environment: Windows >Reporter: Hans-Peter Werner >Priority: Major > Fix For: 1.9.0 > > > In a call to String.replace() a backslash is missing before "\r", so CRs are > not correctly removed in windows environments. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader
[ https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700045#comment-16700045 ]

ASF GitHub Bot commented on AVRO-2247:
--

unchuckable commented on issue #391: AVRO-2247 - improved java reading performance with new reader
URL: https://github.com/apache/avro/pull/391#issuecomment-441965011

Hi, @rstata. First of all, thanks for looking into it. It means a lot. I'm sorry about the license files; I totally forgot about them this time 😞

I pulled your change from your repo and pushed it into mine. I have no clue what's up with GitHub and the pull request there; if anybody has a pointer on what I would need to set in my repo, any advice is welcome.

Invoking the benchmark:

`cd lang/java/benchmark`
`mvn clean package`
`java -jar target/benchmarks.jar` (not the `benchmark-1.9.0-SNAPSHOT`)

By default, it will use 5 warmup iterations and 5 measurement iterations of 10 seconds each, and do all of that 5 times, which totals up to almost 3 hours. It can easily be reduced to more reasonable limits (20 minutes), for example:

`java -jar target/benchmarks.jar -wi 3 -i 3 -f 1` (3 iterations each for warmup and measurement, and only 1 repetition)

Adding `-e Building` will exclude the building of the DatumReaders from the benchmark, which currently halves the total evaluation time.

The current benchmark classes cover only a small excerpt of the cases in Perf.java (but try to replicate them as closely as possible). I can gladly add more if it helps the project; it might make sense to move that to a different ticket though, I guess.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Improve Java reading performance with a new reader
> --
>
> Key: AVRO-2247
> URL: https://issues.apache.org/jira/browse/AVRO-2247
> Project: Apache Avro
> Issue Type: Improvement
> Components: java
> Reporter: Martin Jubelgas
> Priority: Major
> Fix For: 1.9.0
>
> Attachments: Perf-Comparison.md
>
>
> Complementary to AVRO-2090, I have been working on decoding of Avro objects
> in Java and am suggesting a new implementation of a DatumReader that improves
> read performance for both generic and specific records by approximately 20%
> (and even more in cases of nested objects with defaults, a case I encounter a
> lot in practical use).
> Key concept is to create a detailed execution plan once at DatumReader. This
> execution plan contains all required defaulting/lookup values so they need
> not be looked up during object traversal while reading.
> The reader implementation can be enabled and disabled per GenericData
> instance. The system default is set via the system variable
> "org.apache.avro.fastread" (defaults to "false").
> Attached a performance comparison of the existing implementation with the
> proposed one. Will open a pull request with respective code in a bit (not
> including interoperability with the optimizations of AVRO-2090 yet). Please
> let me know your opinion of whether this is worth pursuing further.
>

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
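The AVRO-2247 description says the reader can be switched per GenericData instance, with a system-wide default taken from `org.apache.avro.fastread`. A minimal sketch of that pattern follows. `FastReadToggle` and its methods are hypothetical and are not the code in the pull request; only the property name and its default of "false" come from the ticket.

```java
// Hypothetical illustration of a per-instance flag with a system-property default.
final class FastReadToggle {
  static final String PROPERTY = "org.apache.avro.fastread";

  private boolean enabled;

  FastReadToggle() {
    // The second argument is the fallback used when the property is unset.
    this.enabled = Boolean.parseBoolean(System.getProperty(PROPERTY, "false"));
  }

  boolean isEnabled() {
    return enabled;
  }

  // Overrides the system default for this instance only.
  void setEnabled(boolean enabled) {
    this.enabled = enabled;
  }

  public static void main(String[] args) {
    System.out.println(new FastReadToggle().isEnabled());
  }
}
```

Under this sketch, starting the JVM with `-Dorg.apache.avro.fastread=true` would change the default for every new instance, while `setEnabled` affects a single one.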