[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701425#comment-16701425
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

rstata commented on issue #391: AVRO-2247 - improved java reading performance 
with new reader
URL: https://github.com/apache/avro/pull/391#issuecomment-442330623
 
 
   I've run your code against `Perf.java` and uploaded the 
   [results 
here](https://github.com/apache/avro/files/2623075/AVRO-2247-Perf-results-11-27.pdf).
  This report contains two sets of results:
   
   * The "avro-2247 (calibration)" column presents the results of running the 
2247 branch against itself three different times.  These results are useful for 
understanding where the Perf.java benchmark tends to have a lot of internal 
variability.  As an example, the BooleanRead/Write shows a lot of natural 
variability, which is something I've noticed in a lot of my previous performance 
testing.
   
   * The "avro-2274 (w/ custom coders) vs" column presents the results of 
running three different treatments against my avro-2274 branch.  The three 
sub-columns here are as follows: "master" is the Apache Avro master branch 
(just prior to avro-2274 being merged into it); "2247 (off)" is the 2247 
branch with the fast coder turned off; "2247 (on)" is the 2247 branch with the 
fast coder turned on.
   
   The last sub-column of "avro-2274 (...) vs" results is the most relevant.  
What we see here are a large number of record-related cases showing speedups of 
20-30% and even more.  This is very promising.
   
   I am currently running the JMH-based benchmarks.  These do _not_ have an 
(obvious) mechanism for comparing the "before/after" performance of your 
proposed changes, but I will be interested in seeing if they do better in 
reducing the variance between runs.
   
   I haven't inspected your code yet.  I'll do that as well, and offer some 
opinions.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Improve Java reading performance with a new reader
> --
>
> Key: AVRO-2247
> URL: https://issues.apache.org/jira/browse/AVRO-2247
> Project: Apache Avro
> Issue Type: Improvement
> Components: java
> Reporter: Martin Jubelgas
> Priority: Major
> Fix For: 1.9.0
>
> Attachments: Perf-Comparison.md
>
>
> Complementary to AVRO-2090, I have been working on decoding of Avro objects 
> in Java and am suggesting a new implementation of a DatumReader that improves 
> read performance for both generic and specific records by approximately 20% 
> (and even more in cases of nested objects with defaults, a case I encounter a 
> lot in practical use).
> The key concept is to create a detailed execution plan once, in the 
> DatumReader. This execution plan contains all required defaulting/lookup 
> values so they need not be looked up during object traversal while reading.
> The reader implementation can be enabled and disabled per GenericData 
> instance. The system default is set via the system variable 
> "org.apache.avro.fastread" (defaults to "false").
> I have attached a performance comparison of the existing implementation with 
> the proposed one. I will open a pull request with the respective code in a 
> bit (not including interoperability with the optimizations of AVRO-2090 yet). 
> Please let me know whether you think this is worth pursuing further.
>  
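
The "plan once, execute per datum" idea in the description above can be sketched as follows. This is only an illustration under stated assumptions: the class, the method names (buildPlan, read) and the flat field model are hypothetical and are not taken from the AVRO-2247 code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from the Avro code base.
final class ReadPlanSketch {

  /** One precomputed step: either copy a written value or fill in a default. */
  static final class Step {
    final String field;
    final int writerIndex;     // position in the written data, or -1 if the writer lacks the field
    final Object defaultValue; // used only when writerIndex == -1

    Step(String field, int writerIndex, Object defaultValue) {
      this.field = field;
      this.writerIndex = writerIndex;
      this.defaultValue = defaultValue;
    }
  }

  /** Built once per reader/writer pair: every name lookup and default lookup happens here. */
  static List<Step> buildPlan(List<String> writerFields, Map<String, Object> readerFieldsWithDefaults) {
    List<Step> steps = new ArrayList<>();
    for (Map.Entry<String, Object> e : readerFieldsWithDefaults.entrySet()) {
      steps.add(new Step(e.getKey(), writerFields.indexOf(e.getKey()), e.getValue()));
    }
    return steps;
  }

  /** Executed for every datum: no lookups, the reader just follows the plan. */
  static Map<String, Object> read(List<Step> plan, Object[] written) {
    Map<String, Object> out = new LinkedHashMap<>();
    for (Step s : plan) {
      out.put(s.field, s.writerIndex >= 0 ? written[s.writerIndex] : s.defaultValue);
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, Object> readerFields = new LinkedHashMap<>();
    readerFields.put("id", 0);
    readerFields.put("name", "unknown"); // not written by the writer, so the default applies
    List<Step> plan = buildPlan(Arrays.asList("id"), readerFields);
    System.out.println(read(plan, new Object[] {42}));
  }
}
```

The design point is that buildPlan pays for the lookups once, while read, which runs for every record, only indexes into arrays.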



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (AVRO-2274) Improve resolving performance when schemas don't change

2018-11-27 Thread Thiruvalluvan M. G. (JIRA)


 [ 
https://issues.apache.org/jira/browse/AVRO-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvalluvan M. G. resolved AVRO-2274.
---
Resolution: Fixed

Merged the PR. Thank you [~raymie].

> Improve resolving performance when schemas don't change
> ---
>
> Key: AVRO-2274
> URL: https://issues.apache.org/jira/browse/AVRO-2274
> Project: Apache Avro
> Issue Type: Improvement
> Components: java
> Reporter: Raymie Stata
> Assignee: Raymie Stata
> Priority: Major
>
> Decoding optimizations based on the observation that schemas don't change 
> very much.  We add special-case paths to optimize the case where a 
> _sub_schema of the reader and the writer are the same.  The specific cases 
> are:
> * In the case of an enumeration, if the reader and writer are the same, then 
> we can simply return the tag written by the writer rather than "adjust" it as 
> if it might have been re-ordered.  In fact, we can do this (directly return 
> the tag written by the writer) as long as the reader-schema is an "extension" 
> of the writer's in that it may have added new symbols but hasn't renumbered 
> any of the writer's symbols.  Enumerations that either don't change at all or 
> are "extended" as defined here are the common ways to extend enumerations.  
> (Our tests show this optimization improves performance by about 3%.)
> * When the reader and writer subschemas are both unions, resolution is 
> expensive: we have an outer union preceded by a "writer-union action", but 
> each branch of this outer union consists of union-adjust actions, which are 
> heavyweight.  We optimize this case when the reader and writer unions are 
> the same: we fall back on the standard grammar used for a union, avoiding all 
> these adjustments.  Since unions are commonly used to encode "nullable" 
> fields in Avro, and nullability rarely changes as a schema evolves, this 
> optimization should help many users.  (Our tests show this optimization 
> improves performance by 25-30%, a significant win.)
> * The "custom code" generated for reading records has to read fields in a 
> loop that uses a switch statement to deal with writers that may have 
> re-ordered fields.  In most cases, however, fields have not been reordered 
> (esp. in more complex records with many record sub-schemas).  So we've added 
> a new method to ResolvingDecoder called readFieldOrderIfDiff, which is a 
> variant of the existing readFieldOrder.  If the field order has indeed 
> changed, then readFieldOrderIfDiff returns the new field order, just like 
> readFieldOrder does.  However, if the field-order hasn't changed, then 
> readFieldOrderIfDiff returns null.  We then modified the generation of 
> custom-decoders for records to add a special-case path that simply reads the 
> record's fields in order, without incurring the overhead of the loop or the 
> switch statement.  (Our tests show this optimization improves performance by 
> 8-9%, on top of the 35-40% produced by the original custom-coder 
> optimization.)
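
Two of the checks described above can be sketched in isolation. This is a minimal illustration, not the code from the pull request: the class and method names are hypothetical, symbols and fields are modelled as plain strings, and fieldOrderIfDiff only imitates the "null means unchanged" contract that the description gives for readFieldOrderIfDiff.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; the real logic lives in Avro's resolving grammar.
final class NoReorderSketch {

  /**
   * True if the reader's symbol list starts with the writer's symbols in the
   * same order, i.e. the reader only appended symbols. In that case the tag
   * written by the writer can be returned without adjustment.
   */
  static boolean isExtension(List<String> writerSymbols, List<String> readerSymbols) {
    if (readerSymbols.size() < writerSymbols.size()) {
      return false;
    }
    return readerSymbols.subList(0, writerSymbols.size()).equals(writerSymbols);
  }

  /** Returns null when the field order is unchanged, otherwise the writer's order. */
  static String[] fieldOrderIfDiff(String[] writerOrder, String[] readerOrder) {
    return Arrays.equals(writerOrder, readerOrder) ? null : writerOrder;
  }

  public static void main(String[] args) {
    System.out.println(isExtension(Arrays.asList("A", "B"), Arrays.asList("A", "B", "C")));
    String[] order = fieldOrderIfDiff(new String[] {"x", "y"}, new String[] {"x", "y"});
    if (order == null) {
      System.out.println("fast path: read fields in the reader's own order");
    } else {
      System.out.println("slow path: loop over " + Arrays.toString(order));
    }
  }
}
```

A caller would test the result for null once per record and skip the loop and switch entirely on the fast path.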





[jira] [Commented] (AVRO-2274) Improve resolving performance when schemas don't change

2018-11-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701361#comment-16701361
 ] 

ASF subversion and git services commented on AVRO-2274:
---

Commit 6eb25603b96169bf8d77269176218c63c181e9f4 in avro's branch 
refs/heads/master from [~raymie]
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=6eb2560 ]

AVRO-2274 Improve resolving performance when schemas don't change. (#393)

* AVRO-2274 Improve resolving performance when schemas don't change.

* AVRO-2274 Break out of field-no-reorder loop as early as possible.




[jira] [Commented] (AVRO-2274) Improve resolving performance when schemas don't change

2018-11-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701360#comment-16701360
 ] 

ASF GitHub Bot commented on AVRO-2274:
--

thiru-apache closed pull request #393: AVRO-2274 Improve resolving performance 
when schemas don't change.
URL: https://github.com/apache/avro/pull/393
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java b/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java
index 8f1f6a95b..45ff922fd 100644
--- a/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java
+++ b/lang/java/avro/src/main/java/org/apache/avro/io/ResolvingDecoder.java
@@ -129,6 +129,19 @@ public static Object resolve(Schema writer, Schema reader)
   fields;
   }
 
+  /**
+   * Same as {@link readFieldOrder} except that it returns
+   * null if there was no reordering of fields, i.e., if the
+   * correct thing for the reader to do is to read (all) of its fields
+   * in the order specified by its own schema (useful for
+   * optimizations).
+   */
+  public final Schema.Field[] readFieldOrderIfDiff() throws IOException {
+Symbol.FieldOrderAction top
+  = (Symbol.FieldOrderAction) parser.advance(Symbol.FIELD_ACTION);
+return (top.noReorder ? null : top.fields);
+  }
+
   /**
* Consume any more data that has been written by the writer but not
* needed by the reader so that the the underlying decoder is in proper
@@ -252,6 +265,7 @@ public int readEnum() throws IOException {
 parser.advance(Symbol.ENUM);
 Symbol.EnumAdjustAction top = (Symbol.EnumAdjustAction) parser.popSymbol();
 int n = in.readEnum();
+if (top.noAdjustments) return n;
 Object o = top.adjustments[n];
 if (o instanceof Integer) {
   return ((Integer) o).intValue();
@@ -263,9 +277,17 @@ public int readEnum() throws IOException {
   @Override
   public int readIndex() throws IOException {
 parser.advance(Symbol.UNION);
-Symbol.UnionAdjustAction top = (Symbol.UnionAdjustAction) parser.popSymbol();
-parser.pushSymbol(top.symToParse);
-return top.rindex;
+Symbol top = parser.popSymbol();
+int result;
+if (top instanceof Symbol.UnionAdjustAction) {
+  result = ((Symbol.UnionAdjustAction) top).rindex;
+  top = ((Symbol.UnionAdjustAction) top).symToParse;
+} else {
+  result = in.readIndex();
+  top = ((Symbol.Alternative) top).getSymbol(result);
+}
+parser.pushSymbol(top);
+return result;
   }
 
   @Override
diff --git a/lang/java/avro/src/main/java/org/apache/avro/io/parsing/ResolvingGrammarGenerator.java b/lang/java/avro/src/main/java/org/apache/avro/io/parsing/ResolvingGrammarGenerator.java
index 71978824b..61073dce8 100644
--- a/lang/java/avro/src/main/java/org/apache/avro/io/parsing/ResolvingGrammarGenerator.java
+++ b/lang/java/avro/src/main/java/org/apache/avro/io/parsing/ResolvingGrammarGenerator.java
@@ -76,8 +76,8 @@ public final Symbol generate(Schema writer, Schema reader)
* @return  The start symbol for the resolving grammar
* @throws IOException
*/
-  public Symbol generate(Schema writer, Schema reader,
-Map seen) throws IOException
+  private Symbol generate(Schema writer, Schema reader, Map seen)
+throws IOException
   {
 final Schema.Type writerType = writer.getType();
 final Schema.Type readerType = reader.getType();
@@ -204,6 +204,9 @@ public Symbol generate(Schema writer, Schema reader,
 
   private Symbol resolveUnion(Schema writer, Schema reader,
   Map seen) throws IOException {
+boolean needsAdj = ! unionEquiv(writer, reader, new HashMap<>());
+List alts2 = (!needsAdj ? reader.getTypes() : null);
+
 List alts = writer.getTypes();
 final int size = alts.size();
 Symbol[] symbols = new Symbol[size];
@@ -215,12 +218,72 @@ private Symbol resolveUnion(Schema writer, Schema reader,
  */
 int i = 0;
 for (Schema w : alts) {
-  symbols[i] = generate(w, reader, seen);
+  symbols[i] = generate(w, (needsAdj ? reader : alts2.get(i)), seen);
   labels[i] = w.getFullName();
   i++;
 }
+if (! needsAdj)
+  return Symbol.seq(Symbol.alt(symbols, labels), Symbol.UNION);
 return Symbol.seq(Symbol.alt(symbols, labels),
-  Symbol.writerUnionAction());
+  Symbol.WRITER_UNION_ACTION);
+  }
+
+  private static boolean unionEquiv(Schema w, Schema r, Map seen) {
+Schema.Type wt = w.getType();
+if (wt != r.getType()) return false;
+if ((wt == Schema.Type.RECORD || wt == Schema.Type.FIXED || wt == Schema.Type.ENUM)
+&& ! 

[jira] [Commented] (AVRO-2276) GenericData.toString does not always generate valid JSON for Map datum

2018-11-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701271#comment-16701271
 ] 

ASF GitHub Bot commented on AVRO-2276:
--

thiru-apache commented on a change in pull request #394: AVRO-2276: Escape Map 
keys in GenericData.toString to generate valid JSON
URL: https://github.com/apache/avro/pull/394#discussion_r236911408
 
 

 ##
 File path: lang/java/grpc/pom.xml
 ##
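 @@ -21,8 +21,8 @@
    <modelVersion>4.0.0</modelVersion>
 
    <parent>
-    <groupId>org.apache.avro</groupId>
     <artifactId>avro-parent</artifactId>
+    <groupId>org.apache.avro</groupId>
     <version>1.9.0-SNAPSHOT</version>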
 
 Review comment:
   The convention is to have `<groupId>` before `<artifactId>`. Has some tool 
changed the order here accidentally?




> GenericData.toString does not always generate valid JSON for Map datum
> --
>
> Key: AVRO-2276
> URL: https://issues.apache.org/jira/browse/AVRO-2276
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.9.0
> Reporter: Ismaël Mejía
> Assignee: Ismaël Mejía
> Priority: Minor
>
> Avro represents data as JSON internally, so it needs to escape the keys of 
> objects (Maps) as mandated by https://tools.ietf.org/html/rfc8259
> I discovered this while running a build on Windows because of '\' characters, 
> but it can easily be reproduced on Linux by creating a file/dir with backslashes.





[jira] [Commented] (AVRO-2276) GenericData.toString does not always generate valid JSON for Map datum

2018-11-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701272#comment-16701272
 ] 

ASF GitHub Bot commented on AVRO-2276:
--

thiru-apache commented on a change in pull request #394: AVRO-2276: Escape Map 
keys in GenericData.toString to generate valid JSON
URL: https://github.com/apache/avro/pull/394#discussion_r236912579
 
 

 ##
 File path: lang/java/grpc/pom.xml
 ##
 @@ -87,6 +92,12 @@
   
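       <scope>test</scope>
     </dependency>
+    <dependency>
+      <groupId>io.netty</groupId>
+      <artifactId>netty-codec-http2</artifactId>
+      <version>${netty-codec-http2.version}</version>
+      <scope>test</scope>
+    </dependency>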
 
 Review comment:
   The change does not seem to have anything to do with grpc. Did you patch 
this pom and the main pom by mistake?




[jira] [Updated] (AVRO-2278) GenericData.Record field getter not correct

2018-11-27 Thread Zoltan Farkas (JIRA)


 [ 
https://issues.apache.org/jira/browse/AVRO-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Farkas updated AVRO-2278:

Summary: GenericData.Record field getter not correct  (was: 
GenericData.Record field getter no correct)



[jira] [Created] (AVRO-2278) GenericData.Record field getter no correct

2018-11-27 Thread Zoltan Farkas (JIRA)
Zoltan Farkas created AVRO-2278:
---

 Summary: GenericData.Record field getter no correct
 Key: AVRO-2278
 URL: https://issues.apache.org/jira/browse/AVRO-2278
 Project: Apache Avro
  Issue Type: Bug
Affects Versions: 1.8.2
Reporter: Zoltan Farkas


Currently the get field implementation is not correct in GenericData.Record:

at: 
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L209

{code}
@Override public Object get(String key) {
  Field field = schema.getField(key);
  if (field == null) return null;
  return values[field.pos()];
}
{code}

The method returns null when a field is not present, making it impossible to 
distinguish between:

field value = null

and

field does not exist.

A more "correct" implementation would be:

{code}
@Override public Object get(String key) {
  Field field = schema.getField(key);
  if (field == null) {
    throw new IllegalArgumentException("Invalid field " + key);
  }
  return values[field.pos()];
}
{code}

This will make the behavior consistent with put, which throws an exception 
when setting a nonexistent field.

When I made this change in my fork, some bugs in unit tests showed up.
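
The ambiguity described above can be reproduced without Avro. The following is a minimal sketch under stated assumptions: StrictRecordSketch, getLenient, getStrict and hasField are hypothetical names invented for this illustration, and the "schema" is just a list of field names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a record with a fixed set of fields; not Avro's GenericData.Record.
final class StrictRecordSketch {
  private final List<String> fieldNames;
  private final Object[] values;

  StrictRecordSketch(String... fieldNames) {
    this.fieldNames = Arrays.asList(fieldNames);
    this.values = new Object[fieldNames.length];
  }

  /** Throws for unknown fields, mirroring the put behavior described in the issue. */
  void put(String key, Object value) {
    values[position(key)] = value;
  }

  /** Current behavior: an unknown field and a null value look identical to the caller. */
  Object getLenient(String key) {
    int pos = fieldNames.indexOf(key);
    return pos < 0 ? null : values[pos];
  }

  /** Proposed behavior: an unknown field is reported instead of being hidden behind null. */
  Object getStrict(String key) {
    return values[position(key)];
  }

  /** Lets callers that expect missing fields check first instead of catching an exception. */
  boolean hasField(String key) {
    return fieldNames.contains(key);
  }

  private int position(String key) {
    int pos = fieldNames.indexOf(key);
    if (pos < 0) {
      throw new IllegalArgumentException("Invalid field " + key);
    }
    return pos;
  }

  public static void main(String[] args) {
    StrictRecordSketch r = new StrictRecordSketch("a");
    System.out.println(r.getLenient("a"));       // field exists, value is null
    System.out.println(r.getLenient("missing")); // field does not exist, also null
    System.out.println(r.hasField("missing"));
  }
}
```

A check such as hasField is one possible way to keep a strict getter usable for code that legitimately probes for optional fields; whether Avro should offer one is outside the scope of the report.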






[jira] [Updated] (AVRO-2276) GenericData.toString does not always generate valid JSON for Map datum

2018-11-27 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/AVRO-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated AVRO-2276:
---
Affects Version/s: (was: 1.8.2)
   1.9.0



[jira] [Created] (AVRO-2277) clean up Ruby warnings

2018-11-27 Thread Tim Perkins (JIRA)
Tim Perkins created AVRO-2277:
-

 Summary: clean up Ruby warnings
 Key: AVRO-2277
 URL: https://issues.apache.org/jira/browse/AVRO-2277
 Project: Apache Avro
  Issue Type: Improvement
  Components: ruby
Reporter: Tim Perkins
Assignee: Tim Perkins
 Fix For: 1.9.0


Running tests for the Ruby implementation generates a lot of warnings, which 
makes it unclear whether the Ruby tests are passing.





[jira] [Updated] (AVRO-2277) clean up Ruby warnings

2018-11-27 Thread Tim Perkins (JIRA)


 [ 
https://issues.apache.org/jira/browse/AVRO-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Perkins updated AVRO-2277:
--
Status: Patch Available  (was: Open)

https://github.com/apache/avro/pull/392

> clean up Ruby warnings
> --
>
> Key: AVRO-2277
> URL: https://issues.apache.org/jira/browse/AVRO-2277
> Project: Apache Avro
> Issue Type: Improvement
> Components: ruby
> Reporter: Tim Perkins
> Assignee: Tim Perkins
> Priority: Minor
> Fix For: 1.9.0
>
>
> Running tests for the Ruby implementation generates a lot of warnings, which 
> makes it unclear whether the Ruby tests are passing.





[jira] [Commented] (AVRO-2276) GenericData.toString does not always generate valid JSON for Map datum

2018-11-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700734#comment-16700734
 ] 

ASF GitHub Bot commented on AVRO-2276:
--

iemejia opened a new pull request #394: AVRO-2276: Escape Map keys in 
GenericData.toString to generate valid JSON
URL: https://github.com/apache/avro/pull/394
 
 
   The extra changes in the pom file are a fix for a Maven RAT run on Windows.




> GenericData.toString does not always generate valid JSON for Map datum
> --
>
> Key: AVRO-2276
> URL: https://issues.apache.org/jira/browse/AVRO-2276
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.8.2
> Reporter: Ismaël Mejía
> Assignee: Ismaël Mejía
> Priority: Minor
>
> Avro represents data as JSON internally, so it needs to escape the keys of 
> objects (Maps) as mandated by https://tools.ietf.org/html/rfc8259
> I discovered this while running a build on Windows because of '\' characters, 
> but it can easily be reproduced on Linux by creating a file/dir with backslashes.





[jira] [Created] (AVRO-2276) GenericData.toString does not always generate valid JSON for Map datum

2018-11-27 Thread JIRA
Ismaël Mejía created AVRO-2276:
--

 Summary: GenericData.toString does not always generate valid JSON 
for Map datum
 Key: AVRO-2276
 URL: https://issues.apache.org/jira/browse/AVRO-2276
 Project: Apache Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.8.2
Reporter: Ismaël Mejía
Assignee: Ismaël Mejía


Avro represents data as JSON internally, so it needs to escape the keys of 
objects (Maps) as mandated by https://tools.ietf.org/html/rfc8259

I discovered this while running a build on Windows because of '\' characters, 
but it can easily be reproduced on Linux by creating a file/dir with backslashes.
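
The kind of escaping RFC 8259 requires for object keys can be sketched as below. This is an illustration only, not the patch from the pull request: JsonKeyEscapeSketch and its methods are hypothetical, and the map is limited to string values to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of RFC 8259 string escaping; not the code used by GenericData.toString.
final class JsonKeyEscapeSketch {

  /** Escapes quotation marks, backslashes and control characters (U+0000 to U+001F). */
  static String escape(String s) {
    StringBuilder sb = new StringBuilder(s.length() + 2);
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '"':  sb.append("\\\""); break;
        case '\\': sb.append("\\\\"); break;
        case '\b': sb.append("\\b"); break;
        case '\f': sb.append("\\f"); break;
        case '\n': sb.append("\\n"); break;
        case '\r': sb.append("\\r"); break;
        case '\t': sb.append("\\t"); break;
        default:
          if (c < 0x20) {
            sb.append(String.format("\\u%04x", (int) c));
          } else {
            sb.append(c);
          }
      }
    }
    return sb.toString();
  }

  /** Renders a string-to-string map as a JSON object, escaping keys as well as values. */
  static String toJson(Map<String, String> map) {
    StringBuilder sb = new StringBuilder("{");
    boolean first = true;
    for (Map.Entry<String, String> e : map.entrySet()) {
      if (!first) {
        sb.append(", ");
      }
      first = false;
      sb.append('"').append(escape(e.getKey())).append("\": \"")
        .append(escape(e.getValue())).append('"');
    }
    return sb.append('}').toString();
  }

  public static void main(String[] args) {
    Map<String, String> m = new LinkedHashMap<>();
    m.put("C:\\tmp\\data", "windows path used as a key");
    System.out.println(toJson(m));
  }
}
```

Without the escape call on the key, a Windows path used as a map key would produce a string containing bare backslashes, which a JSON parser may reject or misread.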






[jira] [Commented] (AVRO-2273) Release 1.8.3

2018-11-27 Thread JIRA


[ 
https://issues.apache.org/jira/browse/AVRO-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700244#comment-16700244
 ] 

Ismaël Mejía commented on AVRO-2273:


What are the goals here, apart from minor fixes? The security issues have not 
been backported yet. Maybe we should just encourage people to move to 1.9.0 
instead, no?
Otherwise we should probably draw up the list of issues/PRs to backport, but 
that seems like a lot of extra work compared with jumping straight to 1.9.x.

> Release 1.8.3
> -
>
> Key: AVRO-2273
> URL: https://issues.apache.org/jira/browse/AVRO-2273
> Project: Apache Avro
> Issue Type: Task
> Reporter: Thiruvalluvan M. G.
> Priority: Major
> Fix For: 1.8.3
>
>
> This ticket is for releasing Avro 1.8.3 and discussing any topics related to 
> it.





[jira] [Updated] (AVRO-2142) SchemaBuilder Java documentation code snippet is not valid

2018-11-27 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/AVRO-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated AVRO-2142:
---
   Resolution: Fixed
Fix Version/s: 1.9.0
   Status: Resolved  (was: Patch Available)

> SchemaBuilder Java documentation code snippet is not valid
> --
>
> Key: AVRO-2142
> URL: https://issues.apache.org/jira/browse/AVRO-2142
> Project: Apache Avro
> Issue Type: Improvement
> Components: doc, java
> Reporter: Ismaël Mejía
> Assignee: Ismaël Mejía
> Priority: Trivial
> Fix For: 1.9.0
>
>
> The code snippet in SchemaBuilder is invalid: it has an unbalanced quote and 
> is missing one call in the builder chain.





[jira] [Commented] (AVRO-2142) SchemaBuilder Java documentation code snippet is not valid

2018-11-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700224#comment-16700224
 ] 

ASF GitHub Bot commented on AVRO-2142:
--

iemejia closed pull request #282: AVRO-2142: Fix SchemaBuilder javadoc code 
snippet
URL: https://github.com/apache/avro/pull/282
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/lang/java/avro/src/main/java/org/apache/avro/SchemaBuilder.java 
b/lang/java/avro/src/main/java/org/apache/avro/SchemaBuilder.java
index cdc43e032..8ebe45baf 100644
--- a/lang/java/avro/src/main/java/org/apache/avro/SchemaBuilder.java
+++ b/lang/java/avro/src/main/java/org/apache/avro/SchemaBuilder.java
@@ -61,11 +61,11 @@
  *
  * 
  *   Schema schema = SchemaBuilder
- *   .record("HandshakeRequest").namespace("org.apache.avro.ipc)
+ *   .record("HandshakeRequest").namespace("org.apache.avro.ipc")
  *   .fields()
  * .name("clientHash").type().fixed("MD5").size(16).noDefault()
  * .name("clientProtocol").type().nullable().stringType().noDefault()
- * .name("serverHash").type("MD5")
+ * .name("serverHash").type("MD5").noDefault()
  * .name("meta").type().nullable().map().values().bytesType().noDefault()
  *   .endRecord();
  * 


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> SchemaBuilder Java documentation code snippet is not valid
> --
>
> Key: AVRO-2142
> URL: https://issues.apache.org/jira/browse/AVRO-2142
> Project: Apache Avro
>  Issue Type: Improvement
>  Components: doc, java
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
>
> The code snippet in SchemaBuilder is invalid, it has invalid quotes and 
> misses one call in the builder chain.





[jira] [Commented] (AVRO-2142) SchemaBuilder Java documentation code snippet is not valid

2018-11-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700226#comment-16700226
 ] 

ASF subversion and git services commented on AVRO-2142:
---

Commit 39ec1a3f0addfce06869f705f7a17c03d538fe16 in avro's branch 
refs/heads/master from [~iemejia]
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=39ec1a3 ]

AVRO-2142: Fix SchemaBuilder javadoc code snippet







[jira] [Resolved] (AVRO-2181) Missing escape character breaks TestIdl.java in windows

2018-11-27 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/AVRO-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved AVRO-2181.

Resolution: Fixed

> Missing escape character breaks TestIdl.java in windows
> ---
>
> Key: AVRO-2181
> URL: https://issues.apache.org/jira/browse/AVRO-2181
> Project: Apache Avro
>  Issue Type: Bug
>  Components: build, java
>Affects Versions: 1.8.2
> Environment: Windows
>Reporter: Hans-Peter Werner
>Priority: Major
> Fix For: 1.9.0
>
>
> In a call to String.replace() a backslash is missing before "\r", so CRs are 
> not correctly removed in windows environments.





[jira] [Updated] (AVRO-2181) Missing escape character breaks TestIdl.java in windows

2018-11-27 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/AVRO-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated AVRO-2181:
---
Summary: Missing escape character breaks TestIdl.java in windows  (was: 
Missing escape charater in TestIdl.java)






[jira] [Commented] (AVRO-2181) Missing escape charater in TestIdl.java

2018-11-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700153#comment-16700153
 ] 

ASF GitHub Bot commented on AVRO-2181:
--

iemejia commented on issue #312: AVRO-2181: missing escape character added
URL: https://github.com/apache/avro/pull/312#issuecomment-441999198
 
 
   Oops, I forgot to thank you for your contribution. Thanks :) !




> Missing escape charater in TestIdl.java
> ---
>
> Key: AVRO-2181
> URL: https://issues.apache.org/jira/browse/AVRO-2181
> Project: Apache Avro
>  Issue Type: Bug
>  Components: build, java
>Affects Versions: 1.8.2
> Environment: Windows
>Reporter: Hans-Peter Werner
>Priority: Major
> Fix For: 1.9.0
>
>
> In a call to String.replace() a backslash is missing before "\r", so CRs are 
> not correctly removed in windows environments.





[jira] [Commented] (AVRO-2181) Missing escape charater in TestIdl.java

2018-11-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700151#comment-16700151
 ] 

ASF GitHub Bot commented on AVRO-2181:
--

iemejia closed pull request #312: AVRO-2181: missing escape character added
URL: https://github.com/apache/avro/pull/312
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/lang/java/compiler/src/test/java/org/apache/avro/compiler/idl/TestIdl.java 
b/lang/java/compiler/src/test/java/org/apache/avro/compiler/idl/TestIdl.java
index b38714410..26e502c1a 100644
--- a/lang/java/compiler/src/test/java/org/apache/avro/compiler/idl/TestIdl.java
+++ b/lang/java/compiler/src/test/java/org/apache/avro/compiler/idl/TestIdl.java
@@ -152,7 +152,7 @@ public String testName() {
 public void run() throws Exception {
   String output = generate();
   String slurped = slurp(expectedOut);
-  assertEquals(slurped.trim(), output.replace("\r", "").trim());
+  assertEquals(slurped.trim(), output.replace("\\r", "").trim());
 }
 
 public void write() throws Exception {
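
For readers following this fix: in Java source, `"\r"` is a one-character string holding a carriage return (U+000D), while `"\\r"` is the two-character sequence backslash followed by `r`. The sketch below shows how `String.replace` behaves with each literal. The class and method names are made up for illustration and are not part of TestIdl.java. Which literal suits a given comparison depends on whether the text contains real CR characters or an escaped `\r` sequence; that is an assumption to verify against the actual generated output.

```java
// Illustrative only: hypothetical helper names, not code from the Avro repository.
final class CarriageReturnDemo {

    // Removes real carriage-return characters, e.g. from Windows CRLF line endings.
    static String stripCarriageReturnChars(String s) {
        return s.replace("\r", "");
    }

    // Removes the two-character sequence backslash + 'r', e.g. in escaped (JSON-like) text.
    static String stripEscapedCarriageReturns(String s) {
        return s.replace("\\r", "");
    }

    public static void main(String[] args) {
        String raw = "a\r\nb";        // 4 chars: a, CR, LF, b
        String escaped = "a\\r\\nb";  // 6 chars: a, \, r, \, n, b

        System.out.println(stripCarriageReturnChars(raw).length());        // prints 3
        System.out.println(stripEscapedCarriageReturns(escaped).length()); // prints 4

        // Each literal leaves the other kind of input untouched.
        System.out.println(stripEscapedCarriageReturns(raw).length());     // prints 4
        System.out.println(stripCarriageReturnChars(escaped).length());    // prints 6
    }
}
```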


 









[jira] [Commented] (AVRO-2181) Missing escape charater in TestIdl.java

2018-11-27 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700152#comment-16700152
 ] 

ASF subversion and git services commented on AVRO-2181:
---

Commit d3c726fce8d5dd9632960939858af134895ff3ea in avro's branch 
refs/heads/master from [~hp9000]
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=d3c726f ]

AVRO-2181: missing escape character added







[jira] [Commented] (AVRO-2247) Improve Java reading performance with a new reader

2018-11-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700045#comment-16700045
 ] 

ASF GitHub Bot commented on AVRO-2247:
--

unchuckable commented on issue #391: AVRO-2247 - improved java reading 
performance with new reader
URL: https://github.com/apache/avro/pull/391#issuecomment-441965011
 
 
   Hi, @rstata.
   
   First of all, thanks for looking into it. It means a lot. I'm sorry about 
the license files; I totally forgot about them this time.
   
   I pulled your change from your repo and pushed it into mine. I have no clue what's 
up with GitHub and the pull request there. If anybody has a pointer on what I 
would need to set in my repo, any advice is welcome.
   
   Invoking the benchmark:
   `cd lang/java/benchmark`
   `mvn clean package`
   `java -jar target/benchmarks.jar` (not the `benchmark-1.9.0-SNAPSHOT`)
   
   By default, it will use 5 warmup iterations and 5 measurement iterations 
of 10 seconds each, and do all of that 5 times, which adds up to almost 3 
hours. It can easily be reduced to a more reasonable duration (about 20 minutes), 
like:
   `java -jar target/benchmarks.jar -wi 3 -i 3 -f 1` (3 iterations each for warmup 
and measurement, and only 1 repetition)
   Adding `-e Building` will exclude the building of the DatumReaders from the 
benchmark, which currently cuts the total evaluation time in half.
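
   As a rough check of the run times quoted above, the arithmetic can be sketched as follows. The benchmark count of 21 is an assumption inferred from the "almost 3 hours" figure; it is not stated in this thread, and the class and method names are hypothetical.

```java
// Back-of-the-envelope estimate of total benchmark wall time; not part of the benchmark module.
final class BenchmarkTimeEstimate {

    // (warmup + measurement iterations) * seconds per iteration, repeated per fork and per benchmark.
    static long totalSeconds(int warmupIterations, int measurementIterations,
                             int secondsPerIteration, int forks, int benchmarkCount) {
        long perFork = (long) (warmupIterations + measurementIterations) * secondsPerIteration;
        return perFork * forks * benchmarkCount;
    }

    public static void main(String[] args) {
        int assumedBenchmarks = 21; // assumption, see the note above

        // Defaults: 5 warmup + 5 measurement iterations, 10 s each, repeated 5 times.
        System.out.println(totalSeconds(5, 5, 10, 5, assumedBenchmarks) / 60 + " min"); // prints 175 min

        // Reduced run: -wi 3 -i 3 -f 1
        System.out.println(totalSeconds(3, 3, 10, 1, assumedBenchmarks) / 60 + " min"); // prints 21 min
    }
}
```

   Under that assumption, one benchmark costs 500 seconds with the defaults and 60 seconds with the reduced settings, which matches the "almost 3 hours" and "20 minutes" figures.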
   
   The current benchmark classes cover only a small excerpt of the cases in Perf.java 
(but try to replicate them as closely as possible). I can gladly add more if it 
helps the project; it might make sense to move that to a different ticket, 
though.
   




> Improve Java reading performance with a new reader
> --
>
> Key: AVRO-2247
> URL: https://issues.apache.org/jira/browse/AVRO-2247
> Project: Apache Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: Martin Jubelgas
>Priority: Major
> Fix For: 1.9.0
>
> Attachments: Perf-Comparison.md
>
>
> Complementary to AVRO-2090, I have been working on decoding of Avro objects 
> in Java and am suggesting a new implementation of a DatumReader that improves 
> read performance for both generic and specific records by approximately 20% 
> (and even more in cases of nested objects with defaults, a case I encounter a 
> lot in practical use).
> Key concept is to create a detailed execution plan once at DatumReader. This 
> execution plan contains all required defaulting/lookup values so they need 
> not be looked up during object traversal while reading.
> The reader implementation can be enabled and disabled per GenericData 
> instance. The system default is set via the system variable 
> "org.apache.avro.fastread" (defaults to "false").
> Attached a performance comparison of the existing implementation with the 
> proposed one. Will open a pull request with respective code in a bit (not 
> including interoperability with the optimizations of AVRO-2090 yet). Please 
> let me know your opinion of whether this is worth pursuing further.
>  
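
The "build the plan once, reuse it for every record" idea in the description above can be illustrated with a toy sketch in plain Java. This is not the code from the pull request and does not use Avro classes: the class, its methods, and the array-based "records" are all hypothetical. Reading the `org.apache.avro.fastread` switch with `System.getProperty` is likewise an assumption about what "system variable" means here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy illustration of precomputing field lookups and defaults; hypothetical, not Avro code.
final class PlanOnceSketch {

    // Built once: one step per reader field, with its position or default already resolved.
    static List<Function<Object[], Object>> buildPlan(
            List<String> writerFields, Map<String, Object> readerFieldsWithDefaults) {
        List<Function<Object[], Object>> plan = new ArrayList<>();
        for (Map.Entry<String, Object> field : readerFieldsWithDefaults.entrySet()) {
            int position = writerFields.indexOf(field.getKey()); // name lookup happens here, once
            if (position >= 0) {
                plan.add(row -> row[position]);
            } else {
                Object defaultValue = field.getValue(); // default resolved here, once
                plan.add(row -> defaultValue);
            }
        }
        return plan;
    }

    // Run per record: no name lookups and no default resolution during traversal.
    static Object[] execute(List<Function<Object[], Object>> plan, Object[] row) {
        Object[] out = new Object[plan.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = plan.get(i).apply(row);
        }
        return out;
    }

    // Assumed way of reading the switch named in the description; defaults to false.
    static boolean fastReadEnabled() {
        return Boolean.parseBoolean(System.getProperty("org.apache.avro.fastread", "false"));
    }

    public static void main(String[] args) {
        Map<String, Object> readerFields = new LinkedHashMap<>();
        readerFields.put("name", "n/a");
        readerFields.put("country", "unknown"); // absent from the written data, so defaulted

        List<Function<Object[], Object>> plan =
                buildPlan(Arrays.asList("id", "name"), readerFields);

        System.out.println(Arrays.toString(execute(plan, new Object[] {7, "Ada"}))); // prints [Ada, unknown]
    }
}
```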


