[jira] [Commented] (AVRO-1835) Running tests using JDK 1.8 complains about MaxPermSize

2016-04-28 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262386#comment-15262386
 ] 

Ryan Blue commented on AVRO-1835:
-

+1. The patch works for me to get rid of those warnings.

> Running tests using JDK 1.8 complains about MaxPermSize
> ---
>
> Key: AVRO-1835
> URL: https://issues.apache.org/jira/browse/AVRO-1835
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Niels Basjes
>Assignee: Niels Basjes
> Fix For: 1.8.1
>
> Attachments: AVRO-1835-2016-04-25.patch, AVRO-1835-2016-04-27.patch
>
>
> When building AVRO under JDK 1.8 (as I assume most of us do) the output 
> contains the line {code}OpenJDK 64-Bit Server VM warning: ignoring option 
> MaxPermSize=200m; support was removed in 8.0{code} for every test class that 
> is run.
> The output becomes cluttered like this:
> {code}
> ---
>  T E S T S
> ---
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=200m; support 
> was removed in 8.0
> Running org.apache.avro.io.TestEncoders
> Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.467 sec - 
> in org.apache.avro.io.TestEncoders
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=200m; support 
> was removed in 8.0
> Running org.apache.avro.io.TestBlockingIO2
> Tests run: 84, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.157 sec - 
> in org.apache.avro.io.TestBlockingIO2
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=200m; support 
> was removed in 8.0
> Running org.apache.avro.io.TestBlockingIO
> Tests run: 376, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.347 sec - 
> in org.apache.avro.io.TestBlockingIO
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=200m; support 
> was removed in 8.0
> Running org.apache.avro.io.parsing.TestResolvingGrammarGenerator
> Tests run: 32, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.431 sec - 
> in org.apache.avro.io.parsing.TestResolvingGrammarGenerator
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=200m; support 
> was removed in 8.0
> Running org.apache.avro.io.parsing.TestResolvingGrammarGenerator2
> Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.341 sec - 
> in org.apache.avro.io.parsing.TestResolvingGrammarGenerator2
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=200m; support 
> was removed in 8.0
> Running org.apache.avro.io.TestResolvingIOResolving
> Tests run: 192, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.575 sec - 
> in org.apache.avro.io.TestResolvingIOResolving
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1814) 1.8 IDL generator broken when containing a field called 'org'

2016-04-28 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262375#comment-15262375
 ] 

Ryan Blue commented on AVRO-1814:
-

+1. Thanks for fixing this!

> 1.8 IDL generator broken when containing a field called 'org'
> -
>
> Key: AVRO-1814
> URL: https://issues.apache.org/jira/browse/AVRO-1814
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Dustin Spicuzza
>Assignee: Niels Basjes
> Attachments: AVRO-1814-20160410.patch, AVRO-1814-20160428.patch
>
>
> The problem is in the generated 'readExternal' and 'writeExternal' functions, 
> because they do something like:
> WRITER$.write(this, org.apache.avro.specific.SpecificData.getEncoder(out));
> When a member variable called 'org' exists, the compilation fails because 
> the compiler thinks that 'org' is a member variable and that 'apache cannot 
> be resolved or is not a field'. 
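
As a hedged, minimal illustration of the Java name-resolution rule behind this failure (the class and field here are invented, not Avro's generated code): a field named 'org' hides the top-level 'org' package inside the class body, so qualified names such as org.apache.avro.* stop resolving there.

```java
public class OrgShadowDemo {
    // This field hides the top-level 'org' package inside the class body.
    // A qualified reference like org.apache.avro.Schema written in this
    // class would fail with "apache cannot be resolved or is not a field".
    static String org = "shadowed";

    public static void main(String[] args) {
        // 'org' resolves to the String field, not the package:
        System.out.println(org.length()); // prints 8
    }
}
```

One common way out for generated code is to avoid emitting fully qualified references that can collide with user-chosen field names.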





[jira] [Commented] (AVRO-1213) Dependency on Jetty Servlet API in IPC

2016-04-27 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260631#comment-15260631
 ] 

Ryan Blue commented on AVRO-1213:
-

[~b...@benmccann.com], that sounds fine to me. What is the benefit of having 
both? Should we update the Netty version?

> Dependency on Jetty Servlet API in IPC
> --
>
> Key: AVRO-1213
> URL: https://issues.apache.org/jira/browse/AVRO-1213
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Affects Versions: 1.7.2
>Reporter: Sharmarke Aden
>Priority: Minor
>
> The compile-scoped dependency on jetty servlet-api in the IPC pom file can be 
> problematic when using Avro in a webapp environment. Would it be possible to 
> make this dependency either optional or provided? Or maybe Avro could be 
> modularized into sub-modules so that desired features can be assembled 
> piecemeal?





[jira] [Commented] (AVRO-1814) 1.8 IDL generator broken when containing a field called 'org'

2016-04-25 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256537#comment-15256537
 ] 

Ryan Blue commented on AVRO-1814:
-

Looks mostly good to me. But why change the test that was using the namespace 
"org.apache..."?

> 1.8 IDL generator broken when containing a field called 'org'
> -
>
> Key: AVRO-1814
> URL: https://issues.apache.org/jira/browse/AVRO-1814
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Dustin Spicuzza
>Assignee: Niels Basjes
> Attachments: AVRO-1814-20160410.patch
>
>
> The problem is in the generated 'readExternal' and 'writeExternal' functions, 
> because they do something like:
> WRITER$.write(this, org.apache.avro.specific.SpecificData.getEncoder(out));
> When a member variable called 'org' exists, the compilation fails because 
> the compiler thinks that 'org' is a member variable and that 'apache cannot 
> be resolved or is not a field'. 





[jira] [Commented] (AVRO-1835) Running tests using JDK 1.8 complains about MaxPermSize

2016-04-25 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256529#comment-15256529
 ] 

Ryan Blue commented on AVRO-1835:
-

We still support Java 7, so it makes sense to keep the setting since it only 
causes a warning. We can probably avoid it by adding a Java 7 profile and 
adding the option only when the profile is active.
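
A hedged sketch of what such a profile could look like (the profile id and property name are made up; this is not the committed patch):

```xml
<!-- Illustrative pom.xml fragment: pass -XX:MaxPermSize to test JVMs only
     when the build runs on a Java 7 JDK, so Java 8 builds see no warning. -->
<profiles>
  <profile>
    <id>java7-permgen</id>
    <activation>
      <jdk>1.7</jdk>  <!-- active only when the build JDK is 1.7 -->
    </activation>
    <properties>
      <!-- assumed to be appended to the surefire argLine elsewhere -->
      <permgen.argline>-XX:MaxPermSize=200m</permgen.argline>
    </properties>
  </profile>
</profiles>
```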

> Running tests using JDK 1.8 complains about MaxPermSize
> ---
>
> Key: AVRO-1835
> URL: https://issues.apache.org/jira/browse/AVRO-1835
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Niels Basjes
>Assignee: Niels Basjes
> Fix For: 1.8.1
>
> Attachments: AVRO-1835-2016-04-25.patch
>
>
> When building AVRO under JDK 1.8 (as I assume most of us do) the output 
> contains the line {code}OpenJDK 64-Bit Server VM warning: ignoring option 
> MaxPermSize=200m; support was removed in 8.0{code} for every test class that 
> is run.
> The output becomes cluttered like this:
> {code}
> ---
>  T E S T S
> ---
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=200m; support 
> was removed in 8.0
> Running org.apache.avro.io.TestEncoders
> Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.467 sec - 
> in org.apache.avro.io.TestEncoders
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=200m; support 
> was removed in 8.0
> Running org.apache.avro.io.TestBlockingIO2
> Tests run: 84, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.157 sec - 
> in org.apache.avro.io.TestBlockingIO2
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=200m; support 
> was removed in 8.0
> Running org.apache.avro.io.TestBlockingIO
> Tests run: 376, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.347 sec - 
> in org.apache.avro.io.TestBlockingIO
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=200m; support 
> was removed in 8.0
> Running org.apache.avro.io.parsing.TestResolvingGrammarGenerator
> Tests run: 32, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.431 sec - 
> in org.apache.avro.io.parsing.TestResolvingGrammarGenerator
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=200m; support 
> was removed in 8.0
> Running org.apache.avro.io.parsing.TestResolvingGrammarGenerator2
> Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.341 sec - 
> in org.apache.avro.io.parsing.TestResolvingGrammarGenerator2
> OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=200m; support 
> was removed in 8.0
> Running org.apache.avro.io.TestResolvingIOResolving
> Tests run: 192, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.575 sec - 
> in org.apache.avro.io.TestResolvingIOResolving
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1834) Lower the Javadoc warnings on the generated code.

2016-04-25 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256523#comment-15256523
 ] 

Ryan Blue commented on AVRO-1834:
-

+1

> Lower the Javadoc warnings on the generated code.
> -
>
> Key: AVRO-1834
> URL: https://issues.apache.org/jira/browse/AVRO-1834
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Niels Basjes
>Assignee: Niels Basjes
> Fix For: 1.8.1
>
> Attachments: AVRO-1834-2016-04-25.patch
>
>
> I see a LOT of JavaDoc-related warnings on the generated code in Java.
> They are all about things like {{warning: no @param for}} and {{missing: 
> @return}}.
> In my work project this results in hundreds of warnings, so they obscure the 
> things that do need attention.
> As these files are generated, I expect the required changes to be minimal.





[jira] [Commented] (AVRO-1705) Set up Jenkins job to test all languages using Docker

2016-04-22 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254146#comment-15254146
 ] 

Ryan Blue commented on AVRO-1705:
-

Sounds reasonable to me.

> Set up Jenkins job to test all languages using Docker
> -
>
> Key: AVRO-1705
> URL: https://issues.apache.org/jira/browse/AVRO-1705
> Project: Avro
>  Issue Type: Task
>  Components: build
>Affects Versions: 1.7.7
>Reporter: Tom White
>Priority: Critical
>  Labels: starter
>
> The ASF Jenkins instance now supports Docker (BUILDS-25), so we could run all 
> the tests (for all languages that Avro supports) using the Avro Dockerfile. 
> We might also do a nightly build of the whole distribution.





[jira] [Commented] (AVRO-1705) Set up Jenkins job to test all languages using Docker

2016-04-22 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254119#comment-15254119
 ] 

Ryan Blue commented on AVRO-1705:
-

Is this the right approach to CI?

I think we should consider per-implementation runs. While it's great to be able 
to do checks on all of the languages easily and at one time, we don't need to 
waste time and resources building and testing all languages when there's a 
change to just one.

At the same time, we should be doing more thorough testing for some 
implementations, like Ruby. We ran into issues last release where some Ruby 
versions had test failures, but Ruby has tooling for testing packages against 
multiple versions that we could (and arguably should) be using.

What do you guys think about having multiple profiles to do better testing for 
each implementation?

> Set up Jenkins job to test all languages using Docker
> -
>
> Key: AVRO-1705
> URL: https://issues.apache.org/jira/browse/AVRO-1705
> Project: Avro
>  Issue Type: Task
>  Components: build
>Affects Versions: 1.7.7
>Reporter: Tom White
>Priority: Critical
>  Labels: starter
>
> The ASF Jenkins instance now supports Docker (BUILDS-25), so we could run all 
> the tests (for all languages that Avro supports) using the Avro Dockerfile. 
> We might also do a nightly build of the whole distribution.





[jira] [Updated] (AVRO-1807) NullPointerException from Json.ObjectWriter

2016-04-21 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue updated AVRO-1807:

Description: 
/* I already posted this bug to the dev mailing list [4]. Reporting it here 
again only to make sure it doesn't get lost and because this is the right 
place. */

The complete, slightly more involved code is at [1], especially [2]; the JSON 
schema is at [3]; the relevant parts of the code follow below.

{code}
// parsing the schema
Schema.Parser parser = new Schema.Parser();
Schema schema = parser.parse(new File("schema.avsc")) ;
// setting up the encoder and driver
Json.ObjectWriter jsonDatumWriter = new Json.ObjectWriter();
OutputStream output = new FileOutputStream(new File("output.json"));
Encoder encoder = EncoderFactory.get().jsonEncoder(schema, output, pretty);
// writing
System.out.println(payload); // reassuring test the payload is intact
jsonDatumWriter.write(payload, encoder);
{code}

The console prints a nice JSON string (the payload), followed by this 
exception:
{code}
Exception in thread "main" java.lang.NullPointerException
at org.apache.avro.data.Json.write(Json.java:183)
at org.apache.avro.data.Json.writeObject(Json.java:272)
at org.apache.avro.data.Json.access$000(Json.java:48)
at org.apache.avro.data.Json$ObjectWriter.write(Json.java:122)
at converTor.WriterObject.append(WriterObject.java:59)
at converTor.ConverTor.main(ConverTor.java:251)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
{code}

My beginner's guess is that the source of the problem is a call to asToken() in 
org.codehaus.jackson.JsonNode, which is abstract.

[0] 
https://issues.apache.org/jira/browse/avro/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel
[1] https://github.com/tomlurge/converTor
[2] 
https://github.com/tomlurge/converTor/blob/master/src/converTor/WriterObject.java
[3] 
https://github.com/tomlurge/converTor/blob/master/src/converTor/avro/schemata/Torperf.avsc
[4] 
https://mail-archives.apache.org/mod_mbox/avro-dev/201603.mbox/ajax/%3C828828B1-8A58-4050-81B4-C3EF0F26041B%40rat.io%3E

  was:
/* I already posted this bug to the dev mailing list [4]. Reporting it here 
again only to make sure it doesn't get lost and because this is the right 
place. */

The complete, slightly more involved code is at [1], especially [2]; the JSON 
schema is at [3]; the relevant parts of the code follow below.


// parsing the schema
Schema.Parser parser = new Schema.Parser();
Schema schema = parser.parse(new File("schema.avsc")) ;
// setting up the encoder and driver
Json.ObjectWriter jsonDatumWriter = new Json.ObjectWriter();
OutputStream output = new FileOutputStream(new File("output.json"));
Encoder encoder = EncoderFactory.get().jsonEncoder(schema, output, pretty);
// writing
System.out.println(payload); // reassuring test the payload is intact
jsonDatumWriter.write(payload, encoder);


The console prints a nice JSON string (the payload), followed by this 
exception:
Exception in thread "main" java.lang.NullPointerException
at org.apache.avro.data.Json.write(Json.java:183)
at org.apache.avro.data.Json.writeObject(Json.java:272)
at org.apache.avro.data.Json.access$000(Json.java:48)
at org.apache.avro.data.Json$ObjectWriter.write(Json.java:122)
at converTor.WriterObject.append(WriterObject.java:59)
at converTor.ConverTor.main(ConverTor.java:251)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

My beginner's guess is that the source of the problem is a call to asToken() in 
org.codehaus.jackson.JsonNode, which is abstract.


[0] 
https://issues.apache.org/jira/browse/avro/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel
[1] https://github.com/tomlurge/converTor
[2] 
https://github.com/tomlurge/converTor/blob/master/src/converTor/WriterObject.java
[3] 
https://github.com/tomlurge/converTor/blob/master/src/converTor/avro/schemata/Torperf.avsc
[4] 
https://mail-archives.apache.org/mod_mbox/avro-dev/201603.mbox/ajax/%3C828828B1-8A58-4050-81B4-C3EF0F26041B%40rat.io%3E


> NullPointerException from Json.ObjectWriter
> ---
>
> Key: AVRO-1807
> URL: https://issues.apache.org/jira/browse/AVRO-1807
> Project: Avro
>  Issue Type: Bug
>  Components: java
>   

[jira] [Commented] (AVRO-1807) NullPointerException from Json.ObjectWriter

2016-04-21 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252217#comment-15252217
 ] 

Ryan Blue commented on AVRO-1807:
-

Linking is under the "More" drop-down menu. Thanks for pointing this out and 
filing a bug for it.

> NullPointerException from Json.ObjectWriter
> ---
>
> Key: AVRO-1807
> URL: https://issues.apache.org/jira/browse/AVRO-1807
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.1
> Environment: avro 1.8.0  
> jackson-core-asl 1.9.13  
> jackson-mapper-asl 1.9.13  
> Java 7  
> Mac OS X 10.11.3
>Reporter: Thomas Lörtsch
>Priority: Blocker
>
> /* I already posted this bug to the dev mailing list [4]. Reporting it here 
> again only to make sure it doesn't get lost and because this is the right 
> place. */
> The complete, slightly more involved code is at [1], especially [2]; the JSON 
> schema is at [3]; the relevant parts of the code follow below.
> // parsing the schema
> Schema.Parser parser = new Schema.Parser();
> Schema schema = parser.parse(new File("schema.avsc")) ;
> // setting up the encoder and driver
> Json.ObjectWriter jsonDatumWriter = new Json.ObjectWriter();
> OutputStream output = new FileOutputStream(new File("output.json"));
> Encoder encoder = EncoderFactory.get().jsonEncoder(schema, output, pretty);
> // writing
> System.out.println(payload); // reassuring test the payload is intact
> jsonDatumWriter.write(payload, encoder);
> The console prints a nice JSON string (the payload), followed by this 
> exception:
> Exception in thread "main" java.lang.NullPointerException
> at org.apache.avro.data.Json.write(Json.java:183)
> at org.apache.avro.data.Json.writeObject(Json.java:272)
> at org.apache.avro.data.Json.access$000(Json.java:48)
> at org.apache.avro.data.Json$ObjectWriter.write(Json.java:122)
> at converTor.WriterObject.append(WriterObject.java:59)
> at converTor.ConverTor.main(ConverTor.java:251)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> My beginner's guess is that the source of the problem is a call to asToken() 
> in org.codehaus.jackson.JsonNode, which is abstract.
> [0] 
> https://issues.apache.org/jira/browse/avro/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel
> [1] https://github.com/tomlurge/converTor
> [2] 
> https://github.com/tomlurge/converTor/blob/master/src/converTor/WriterObject.java
> [3] 
> https://github.com/tomlurge/converTor/blob/master/src/converTor/avro/schemata/Torperf.avsc
> [4] 
> https://mail-archives.apache.org/mod_mbox/avro-dev/201603.mbox/ajax/%3C828828B1-8A58-4050-81B4-C3EF0F26041B%40rat.io%3E





[jira] [Commented] (AVRO-1833) Release 1.8.1

2016-04-21 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252181#comment-15252181
 ] 

Ryan Blue commented on AVRO-1833:
-

I'm adding AVRO-1684. I've seen quite a few requests for it.

> Release 1.8.1
> -
>
> Key: AVRO-1833
> URL: https://issues.apache.org/jira/browse/AVRO-1833
> Project: Avro
>  Issue Type: New Feature
>Affects Versions: 1.8.1
>Reporter: Ryan Blue
> Fix For: 1.8.1
>
>
> Please link issues that should be included in the 1.8.1 release as blockers 
> of this issue.





[jira] [Commented] (AVRO-1832) Invoking toString() method unexpectedly modified the avro record.

2016-04-21 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252166#comment-15252166
 ] 

Ryan Blue commented on AVRO-1832:
-

We're discussing the 1.8.1 release on the mailing list and want to release in 
the next couple of weeks.

> Invoking toString() method unexpectedly modified the avro record.
> -
>
> Key: AVRO-1832
> URL: https://issues.apache.org/jira/browse/AVRO-1832
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Oleksandr Didukh
> Fix For: 1.8.1
>
>
> We use Apache Avro in our project and wanted to update it to version 1.8.0.
> In our case avro records contain fields with "type": "bytes".
> We need to convert the generated avro record to a byte array. This usually 
> works fine; however, if we log the record (or apply any other operation that 
> invokes the toString() method on org.apache.avro.specific.SpecificRecordBase) 
> this functionality breaks. The root cause of the issue looks to be in this 
> line:
> https://github.com/apache/avro/pull/88/files#diff-5a41450f3008ee0da59dec14ada2356aL551
> Please review the fix and corresponding test in 
> [this|https://github.com/apache/avro/pull/88/] pull request.





[jira] [Commented] (AVRO-1118) Specifying null as default of a union only works if null is specified as first type in the union

2016-04-21 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252163#comment-15252163
 ] 

Ryan Blue commented on AVRO-1118:
-

Ivan, the default is the one without quotes. {{"null"}} is a type name, while 
{{null}} is the JSON null literal that gets interpreted as the default value.
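
A minimal schema sketch of the distinction (the field name is invented):

```json
{
  "name": "opt_field",
  "type": ["null", "string"],
  "default": null
}
```

The quoted "null" inside the union names the Avro null type, while the unquoted null after "default" is the JSON null literal. Since a union's default must match the first branch of the union, the null type has to come first for a null default to validate.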

> Specifying null as default of a union only works if null is specified as 
> first type in the union
> 
>
> Key: AVRO-1118
> URL: https://issues.apache.org/jira/browse/AVRO-1118
> Project: Avro
>  Issue Type: Bug
>Affects Versions: 1.6.3
>Reporter: Mike Percy
>
> There is some unexpected behavior I am coming across where if I specify a 
> union as such:
> "type": ["string", "null"],
> "default": null
> I get an Exception:
> Exception in thread "main" org.apache.avro.AvroTypeException: Non-string 
> default value for string: null
>   at 
> org.apache.avro.io.parsing.ResolvingGrammarGenerator.encode(ResolvingGrammarGenerator.java:363)
>   at 
> org.apache.avro.io.parsing.ResolvingGrammarGenerator.encode(ResolvingGrammarGenerator.java:350)
>   at 
> org.apache.avro.io.parsing.ResolvingGrammarGenerator.getBinary(ResolvingGrammarGenerator.java:293)
>   at 
> org.apache.avro.io.parsing.ResolvingGrammarGenerator.resolveRecords(ResolvingGrammarGenerator.java:271)
>   at 
> org.apache.avro.io.parsing.ResolvingGrammarGenerator.generate(ResolvingGrammarGenerator.java:118)
>   at 
> org.apache.avro.io.parsing.ResolvingGrammarGenerator.generate(ResolvingGrammarGenerator.java:50)
>   at org.apache.avro.io.ResolvingDecoder.resolve(ResolvingDecoder.java:82)
>   at org.apache.avro.io.ResolvingDecoder.(ResolvingDecoder.java:46)
>   at 
> org.apache.avro.io.DecoderFactory.resolvingDecoder(DecoderFactory.java:307)
>   at 
> org.apache.avro.generic.GenericDatumReader.getResolver(GenericDatumReader.java:118)
>   at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:133)
>   at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
>   at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
> ...
> Whereas if I specify the schema as:
> "type": ["null", "string"],
> "default": null
> It works as expected.





[jira] [Created] (AVRO-1833) Release 1.8.1

2016-04-20 Thread Ryan Blue (JIRA)
Ryan Blue created AVRO-1833:
---

 Summary: Release 1.8.1
 Key: AVRO-1833
 URL: https://issues.apache.org/jira/browse/AVRO-1833
 Project: Avro
  Issue Type: New Feature
Affects Versions: 1.8.1
Reporter: Ryan Blue
 Fix For: 1.8.1


Please link issues that should be included in the 1.8.1 release as blockers of 
this issue.





[jira] [Resolved] (AVRO-1820) Read annotations to request's fields

2016-04-20 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue resolved AVRO-1820.
-
   Resolution: Fixed
 Assignee: Konstantin Usachev
Fix Version/s: 1.8.1

> Read annotations to request's fields
> 
>
> Key: AVRO-1820
> URL: https://issues.apache.org/jira/browse/AVRO-1820
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Konstantin Usachev
>Assignee: Konstantin Usachev
> Fix For: 1.8.1
>
>
> The IDL compiler supports annotations on a field's type but doesn't support 
> annotations on the field itself. That is, you can write something like
> {code}
> ...
> void method(int @annotation("value") param);
> ...
> {code}
> which translates to
> {code}
> ...{
> "name" : "param",
> "type" : "int",
> "annotation" : "value"
>   }...
> {code}
> but the annotations are lost in org.apache.avro.Protocol.parseMessage. 
> Another bug related to this one is AVRO-1819.





[jira] [Updated] (AVRO-1819) org.apache.avro.Protocol.parseMessage doesn't respect request field's aliases

2016-04-20 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue updated AVRO-1819:

Assignee: Konstantin Usachev

> org.apache.avro.Protocol.parseMessage doesn't respect request field's aliases
> -
>
> Key: AVRO-1819
> URL: https://issues.apache.org/jira/browse/AVRO-1819
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.9.0
>Reporter: Konstantin Usachev
>Assignee: Konstantin Usachev
> Fix For: 1.8.1
>
>
> The Avro specification (https://avro.apache.org/docs/1.8.0/spec.html#Aliases: 
> "Named types and fields may have aliases") and the IDL compiler support 
> aliases on a request's fields. But org.apache.avro.Protocol.parseMessage has 
> special code to parse the request body (not the same as for an ordinary 
> record), and during field parsing the alias information is not kept. This 
> loses important functionality for refactoring protocols while keeping 
> backward compatibility.





[jira] [Resolved] (AVRO-1819) org.apache.avro.Protocol.parseMessage doesn't respect request field's aliases

2016-04-20 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue resolved AVRO-1819.
-
   Resolution: Fixed
Fix Version/s: 1.8.1

I committed this fix. Thanks for contributing, [~DrVirtual]!

> org.apache.avro.Protocol.parseMessage doesn't respect request field's aliases
> -
>
> Key: AVRO-1819
> URL: https://issues.apache.org/jira/browse/AVRO-1819
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.9.0
>Reporter: Konstantin Usachev
>Assignee: Konstantin Usachev
> Fix For: 1.8.1
>
>
> The Avro specification (https://avro.apache.org/docs/1.8.0/spec.html#Aliases: 
> "Named types and fields may have aliases") and the IDL compiler support 
> aliases on a request's fields. But org.apache.avro.Protocol.parseMessage has 
> special code to parse the request body (not the same as for an ordinary 
> record), and during field parsing the alias information is not kept. This 
> loses important functionality for refactoring protocols while keeping 
> backward compatibility.





[jira] [Commented] (AVRO-1832) Invoking toString() method unexpectedly modified the avro record.

2016-04-20 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250131#comment-15250131
 ] 

Ryan Blue commented on AVRO-1832:
-

I think this is a duplicate of AVRO-1799, which was 
[fixed|https://github.com/apache/avro/blame/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L550]
 and will be in the upcoming 1.8.1 release. Can you confirm that it looks like 
the same issue?
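
For context on why a toString() can appear to "modify" a record with bytes fields: java.nio.ByteBuffer reads are stateful, so any code path that walks a buffer without duplicating it leaves the buffer consumed. A hedged, self-contained illustration (not Avro's actual code):

```java
import java.nio.ByteBuffer;

public class ByteBufferReadDemo {
    // Reads every remaining byte via get(), which advances the position.
    static int remainingAfterRead(ByteBuffer buf) {
        while (buf.hasRemaining()) {
            buf.get();
        }
        return buf.remaining();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {1, 2, 3});
        // Reading through the buffer itself leaves it looking consumed:
        System.out.println(remainingAfterRead(buf)); // prints 0

        ByteBuffer original = ByteBuffer.wrap(new byte[] {1, 2, 3});
        // A side-effect-free read goes through a duplicate view instead;
        // the duplicate shares content but has its own position:
        remainingAfterRead(original.duplicate());
        System.out.println(original.remaining()); // prints 3
    }
}
```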

> Invoking toString() method unexpectedly modified the avro record.
> -
>
> Key: AVRO-1832
> URL: https://issues.apache.org/jira/browse/AVRO-1832
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Oleksandr Didukh
> Fix For: 1.8.1
>
>
> We use Apache Avro in our project and wanted to update it to version 1.8.0.
> In our case avro records contain fields with "type": "bytes".
> We need to convert the generated avro record to a byte array. This usually 
> works fine; however, if we log the record (or apply any other operation that 
> invokes the toString() method on org.apache.avro.specific.SpecificRecordBase) 
> this functionality breaks. The root cause of the issue looks to be in this 
> line:
> https://github.com/apache/avro/pull/88/files#diff-5a41450f3008ee0da59dec14ada2356aL551
> Please review the fix and corresponding test in 
> [this|https://github.com/apache/avro/pull/88/] pull request.





[jira] [Resolved] (AVRO-1832) Invoking toString() method unexpectedly modified the avro record.

2016-04-20 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue resolved AVRO-1832.
-
Resolution: Duplicate

> Invoking toString() method unexpectedly modified the avro record.
> -
>
> Key: AVRO-1832
> URL: https://issues.apache.org/jira/browse/AVRO-1832
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Oleksandr Didukh
> Fix For: 1.8.1
>
>
> We use Apache Avro in our project and wanted to update it to version 1.8.0.
> In our case avro records contain fields with "type": "bytes".
> We need to convert the generated avro record to a byte array. This usually 
> works fine; however, if we log the record (or apply any other operation that 
> invokes the toString() method on org.apache.avro.specific.SpecificRecordBase) 
> this functionality breaks. The root cause of the issue looks to be in this 
> line:
> https://github.com/apache/avro/pull/88/files#diff-5a41450f3008ee0da59dec14ada2356aL551
> Please review the fix and corresponding test in 
> [this|https://github.com/apache/avro/pull/88/] pull request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1708) Memory leak with WeakIdentityHashMap?

2016-04-18 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246016#comment-15246016
 ] 

Ryan Blue commented on AVRO-1708:
-

We're replacing the use of weak and identity hashmaps with guava 
implementations. If you think this one isn't correct, let's fix it that way.

> Memory leak with WeakIdentityHashMap?
> -
>
> Key: AVRO-1708
> URL: https://issues.apache.org/jira/browse/AVRO-1708
> Project: Avro
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
>
> WeakIdentityHashMap used in GenericDatumReader has only weak Keys, 
> it seems to grow, and values remain in map which looks like a memory leak...
> java WeakhashMap has Weak Entries which allows the GC to collect a entire 
> entry, which prevents leaks...
> the javadoc of this class claims: "Implements a combination of WeakHashMap 
> and IdentityHashMap." which is not really the case



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1704) Standardized format for encoding messages with Avro

2016-04-17 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244991#comment-15244991
 ] 

Ryan Blue commented on AVRO-1704:
-

Sorry if what I said wasn't clear. I'm not proposing that we get rid of the 
header. I'm saying that we make it one byte instead of 4. I think what I 
outlined addresses the case where the schema cache miss is expensive and 
balances that with the per-message overhead. (I'm fine moving forward with the 
FP considered part of the body.)

A one-byte header results in lower than a 1/256 chance of an expensive lookup 
(by choosing carefully). Why is that too high? Why 4 bytes and not, for 
example, 2 for a 1/65536 chance?

I disagree that the impact of extra bytes is too small to matter. It (probably) 
won't cause fragmentation when sending one message, but we're not talking about 
just one message. Kafka's performance depends on batching records together for 
network operations and each message takes up space on disk. What matters is the 
percentage of data that is overhead: 4 bytes is 0.8% if your messages are 500 
bytes, and 4% if your messages are 100 bytes.

In terms of how much older data I can keep in a Kafka topic, that accounts for 
11m 30s to 57m 30s per day. If I provision for a 3-day window of data in Kafka, 
I'm losing between half an hour and 3 hours of that just to store 'Avr0' over 
and over. That's why I think we have to strike a balance between the two 
concerns. 1 or 2 bytes should really be sufficient, depending on the 
probability of a false-positive we want. And false-positives are only that 
costly if each one causes an RPC, which we can avoid with a little failure 
detection logic.
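The overhead and retention arithmetic above can be sketched as a back-of-the-envelope illustration (not part of any patch):

```java
// Back-of-the-envelope arithmetic for the header-size trade-off discussed
// above: fixed per-message header bytes as a percentage of message size, and
// the corresponding share of a day's retention window spent storing headers.
class HeaderOverhead {
    static double overheadPercent(int headerBytes, int messageBytes) {
        return 100.0 * headerBytes / messageBytes;
    }

    // Minutes of a day's worth of data effectively spent storing headers.
    static double minutesPerDay(int headerBytes, int messageBytes) {
        return 24 * 60 * overheadPercent(headerBytes, messageBytes) / 100.0;
    }

    public static void main(String[] args) {
        System.out.println(overheadPercent(4, 500)); // 0.8% for 500-byte messages
        System.out.println(overheadPercent(4, 100)); // 4% for 100-byte messages
        System.out.println(minutesPerDay(4, 500));   // roughly 11.5 minutes per day
    }
}
```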

> Standardized format for encoding messages with Avro
> ---
>
> Key: AVRO-1704
> URL: https://issues.apache.org/jira/browse/AVRO-1704
> Project: Avro
>  Issue Type: Improvement
>Reporter: Daniel Schierbeck
>Assignee: Niels Basjes
> Attachments: AVRO-1704-20160410.patch
>
>
> I'm currently using the Datafile format for encoding messages that are 
> written to Kafka and Cassandra. This seems rather wasteful:
> 1. I only encode a single record at a time, so there's no need for sync 
> markers and other metadata related to multi-record files.
> 2. The entire schema is inlined every time.
> However, the Datafile format is the only one that has been standardized, 
> meaning that I can read and write data with minimal effort across the various 
> languages in use in my organization. If there was a standardized format for 
> encoding single values that was optimized for out-of-band schema transfer, I 
> would much rather use that.
> I think the necessary pieces of the format would be:
> 1. A format version number.
> 2. A schema fingerprint type identifier, i.e. Rabin, MD5, SHA256, etc.
> 3. The actual schema fingerprint (according to the type.)
> 4. Optional metadata map.
> 5. The encoded datum.
> The language libraries would implement a MessageWriter that would encode 
> datums in this format, as well as a MessageReader that, given a SchemaStore, 
> would be able to decode datums. The reader would decode the fingerprint and 
> ask its SchemaStore to return the corresponding writer's schema.
> The idea is that SchemaStore would be an abstract interface that allowed 
> library users to inject custom backends. A simple, file system based one 
> could be provided out of the box.
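A minimal sketch of the proposed framing could look like the following. The field names, marker values, and byte layout here are illustrative assumptions only, not the standardized format:

```java
import java.io.ByteArrayOutputStream;

// Illustrative framing for a single-datum message: version byte, fingerprint
// type byte, 8-byte fingerprint, then the encoded datum. The concrete byte
// values and the little-endian layout are assumptions for illustration.
class MessageFrame {
    static final byte VERSION = 1;      // format version number (assumed)
    static final byte FP_RABIN_64 = 0;  // fingerprint type identifier (assumed code)

    static byte[] frame(long fingerprint, byte[] encodedDatum) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(VERSION);
        out.write(FP_RABIN_64);
        for (int i = 0; i < 8; i++) {   // little-endian 64-bit schema fingerprint
            out.write((int) (fingerprint >>> (8 * i)) & 0xFF);
        }
        out.write(encodedDatum, 0, encodedDatum.length);
        return out.toByteArray();
    }
}
```

A reader would peel off the two header bytes, look up the fingerprint in its SchemaStore, and decode the remaining bytes with the writer's schema.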



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (AVRO-1794) Update docs after migration to git

2016-04-17 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue reassigned AVRO-1794:
---

Assignee: Ryan Blue

> Update docs after migration to git
> --
>
> Key: AVRO-1794
> URL: https://issues.apache.org/jira/browse/AVRO-1794
> Project: Avro
>  Issue Type: Task
>  Components: doc
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>
> The [vote to move to 
> git|https://mail-archives.apache.org/mod_mbox/avro-dev/201602.mbox/%3C56AFB9B9.8000304%40apache.org%3E]
>  just passed. Once the INFRA ticket is completed, we will need to [update 
> docs|https://avro.apache.org/version_control.html].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (AVRO-1794) Update docs after migration to git

2016-04-17 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue resolved AVRO-1794.
-
Resolution: Fixed

Done. Thanks for pointing out the wiki page, Niels.

> Update docs after migration to git
> --
>
> Key: AVRO-1794
> URL: https://issues.apache.org/jira/browse/AVRO-1794
> Project: Avro
>  Issue Type: Task
>  Components: doc
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>
> The [vote to move to 
> git|https://mail-archives.apache.org/mod_mbox/avro-dev/201602.mbox/%3C56AFB9B9.8000304%40apache.org%3E]
>  just passed. Once the INFRA ticket is completed, we will need to [update 
> docs|https://avro.apache.org/version_control.html].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1750) GenericDatum API behavior breaking change

2016-04-17 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244945#comment-15244945
 ] 

Ryan Blue commented on AVRO-1750:
-

[~braden], could you take a look at the new patch?

> GenericDatum API behavior breaking change
> -
>
> Key: AVRO-1750
> URL: https://issues.apache.org/jira/browse/AVRO-1750
> Project: Avro
>  Issue Type: Bug
>  Components: c++
>Affects Versions: 1.7.7
>Reporter: Braden McDaniel
>Assignee: Thiruvalluvan M. G.
> Fix For: 1.9.0
>
> Attachments: AVRO-1750.patch
>
>
> It appears that a change was introduced to the {{avro::GenericDatum}} 
> implementation between 1.7.6 and 1.7.7 that causes unions to be handled 
> differently.
> The 1.7.6 implementation does this:
> {noformat}
> inline Type AVRO_DECL GenericDatum::type() const {
>     return (type_ == AVRO_UNION) ?
>         boost::any_cast<GenericUnion>(&value_)->type() : type_;
> }
> 
> template<typename T>
> const T& GenericDatum::value() const {
>     return (type_ == AVRO_UNION) ?
>         boost::any_cast<GenericUnion>(&value_)->value<T>() :
>         *boost::any_cast<T>(&value_);
> }
> 
> template<typename T>
> T& GenericDatum::value() {
>     return (type_ == AVRO_UNION) ?
>         boost::any_cast<GenericUnion>(&value_)->value<T>() :
>         *boost::any_cast<T>(&value_);
> }
> {noformat}
> …whereas the 1.7.7 implementation does this:
> {noformat}
> /**
>  * The avro data type this datum holds.
>  */
> Type type() const {
>     return type_;
> }
> 
> /**
>  * Returns the value held by this datum.
>  * T The type for the value. This must correspond to the
>  * avro type returned by type().
>  */
> template<typename T> const T& value() const {
>     return *boost::any_cast<T>(&value_);
> }
> 
> /**
>  * Returns the reference to the value held by this datum, which
>  * can be used to change the contents. Please note that only the
>  * value can be changed; the data type of the value held cannot
>  * be changed.
>  *
>  * T The type for the value. This must correspond to the
>  * avro type returned by type().
>  */
> template<typename T> T& value() {
>     return *boost::any_cast<T>(&value_);
> }
> {noformat}
> The result of this is that, if the underlying value is an {{AVRO_UNION}}, 
> calls to {{GenericDatum::type}} and {{GenericDatum::value<>}} that previously 
> resolved to the union member type no longer do so (and user code relying on 
> that behavior has been broken).
> This change apparently was made as part of the changes for AVRO-1474; 
> however, looking at the comments in that issue, it's not clear to me why it 
> was required for that fix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1821) Avro (Java) Memory Leak in ReflectData Caching

2016-04-17 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244939#comment-15244939
 ] 

Ryan Blue commented on AVRO-1821:
-

Fixed. Thanks for catching that, [~nielsbasjes]!

> Avro (Java) Memory Leak in ReflectData Caching
> --
>
> Key: AVRO-1821
> URL: https://issues.apache.org/jira/browse/AVRO-1821
> Project: Avro
>  Issue Type: Bug
>  Components: java
> Environment: OS X 10.11.3
> {code}java version "1.8.0_66"
> Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode){code}
>Reporter: Bryan Harclerode
>Assignee: Bryan Harclerode
> Attachments: 
> 0001-AVRO-1821-Fix-memory-leak-of-Schemas-in-ReflectData.patch
>
>
> I think I have encountered one of the memory leaks described by AVRO-1283 in 
> the way Java Avro implements field accessor caching in {{ReflectData}}. When 
> a reflected object is serialized, the key of {{ClassAccessorData.bySchema}} 
> (as retained by {{ReflectData.ACCESSOR_CACHE}}) retains a strong reference to 
> the schema that was used to serialize the object, but there exists no code 
> path for clearing these references after a schema will no longer be used.
> While in most cases, a class will probably only have one schema associated 
> with it (created and cached by {{ReflectData.getSchema(Type)}}), I 
> experienced {{OutOfMemoryError}} when serializing generic classes with 
> dynamically-generated schemas. The following is a minimal example which will 
> exhaust a 50MiB heap ({{-Xmx50m}}) after about 190K iterations:
> {code:title=AvroMemoryLeakMinimal.java|borderStyle=solid}
> import java.io.ByteArrayOutputStream;
> import java.io.IOException;
> import java.util.Collections;
> import org.apache.avro.Schema;
> import org.apache.avro.io.BinaryEncoder;
> import org.apache.avro.io.EncoderFactory;
> import org.apache.avro.reflect.ReflectDatumWriter;
> public class AvroMemoryLeakMinimal {
> public static void main(String[] args) throws IOException {
> long count = 0;
> EncoderFactory encFactory = EncoderFactory.get();
> try {
> while (true) {
> // Create schema
> Schema schema = Schema.createRecord("schema", null, null, 
> false);
> schema.setFields(Collections.emptyList());
> // serialize
> ByteArrayOutputStream baos = new ByteArrayOutputStream(1024);
> BinaryEncoder encoder = encFactory.binaryEncoder(baos, null);
> (new ReflectDatumWriter(schema)).write(new Object(), 
> encoder);
> byte[] result = baos.toByteArray();
> count++;
> }
> } catch (OutOfMemoryError e) {
> System.out.print("Memory exhausted after ");
> System.out.print(count);
> System.out.println(" schemas");
> throw e;
> }
> }
> }
> {code}
> I was able to fix the bug in the latest 1.9.0-SNAPSHOT from git with the 
> following patch to {{ClassAccessorData.bySchema}} to use weak keys so that it 
> properly released the {{Schema}} objects if no other threads are still 
> referencing them:
> {code:title=ReflectData.java.patch|borderStyle=solid}
> --- a/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java
> +++ b/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java
> @@ -57,6 +57,7 @@ import org.apache.avro.io.DatumWriter;
>  import org.apache.avro.specific.FixedSize;
>  import org.apache.avro.specific.SpecificData;
>  import org.apache.avro.SchemaNormalization;
> +import org.apache.avro.util.WeakIdentityHashMap;
>  import org.codehaus.jackson.JsonNode;
>  import org.codehaus.jackson.node.NullNode;
>  
> @@ -234,8 +235,8 @@ public class ReflectData extends SpecificData {
>  private final Class<?> clazz;
>  private final Map<String, FieldAccessor> byName =
>  new HashMap<String, FieldAccessor>();
> -private final IdentityHashMap<Schema, FieldAccessor[]> bySchema =
> -new IdentityHashMap<Schema, FieldAccessor[]>();
> +private final WeakIdentityHashMap<Schema, FieldAccessor[]> bySchema =
> +new WeakIdentityHashMap<Schema, FieldAccessor[]>();
>  
>  private ClassAccessorData(Class c) {
>clazz = c;
> {code}
> Additionally, I'm not sure why an {{IdentityHashMap}} was used instead of a 
> standard {{HashMap}}, since two equivalent schemas have the same set of 
> {{FieldAccessor}}. Everything appears to work and all tests pass if I use a 
> {{WeakHashMap}} instead of a {{WeakIdentityHashMap}}, but I don't know if 
> there was some other reason object identity was important for this map. If a 
> non-identity map can be used, this will help reduce memory/CPU usage further 
> by not 

[jira] [Commented] (AVRO-1811) SpecificData.deepCopy() cannot be used if schema compiler generated Java objects with Strings instead of UTF8

2016-04-16 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244451#comment-15244451
 ] 

Ryan Blue commented on AVRO-1811:
-

[~ryonday], thanks for the thorough bug report! It looks like the problem is 
that deepCopy simply doesn't check whether the returned data should be a Utf8 
or a String. It would be relatively easy to fix this by adding a method for 
string construction to GenericData and overriding it in SpecificData. Are you 
interested in contributing a patch?
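A hypothetical sketch of the fix described above: a string-construction hook on the generic layer that the specific layer overrides. The class and method names here are illustrative only, not the actual Avro API:

```java
// Stand-in for org.apache.avro.util.Utf8 (illustrative, not the real class).
class Utf8Like {
    final String value;
    Utf8Like(String value) { this.value = value; }
}

// Generic layer: deepCopy would call this hook when copying string data.
class GenericLayer {
    Object makeString(CharSequence value) {
        return new Utf8Like(value.toString()); // generic default: Utf8
    }
}

// Specific layer: override so records compiled with the String type get
// java.lang.String back instead of Utf8.
class SpecificLayer extends GenericLayer {
    @Override
    Object makeString(CharSequence value) {
        return value.toString();
    }
}
```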

> SpecificData.deepCopy() cannot be used if schema compiler generated Java 
> objects with Strings instead of UTF8
> -
>
> Key: AVRO-1811
> URL: https://issues.apache.org/jira/browse/AVRO-1811
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Ryon Day
>Priority: Critical
>
> {panel:title=Description|titleBGColor=#3FA|bgColor=#DDD}
> When the Avro compiler creates Java objects, you have the option to have them 
> generate fields of type {{string}} with the Java standard {{String}} type, 
> for wide interoperability with existing Java applications and APIs.
> By default, however, the compiler outputs these fields in the Avro-specific 
> {{Utf8}} type, requiring frequent usage of the {{toString()}} method in order 
> for default domain objects to be used with the majority of Java libraries.
> There are two ways to get around this. The first is to annotate every 
> {{string}} field in a schema like so:
> {code}
> {
>   "name": "some_string",
>   "doc": "a field that is guaranteed to compile to java.lang.String",
>   "type": [
> "null",
> {
>   "type": "string",
>   "avro.java.string": "String"
> }
>   ]
> },
> {code}
> Unfortunately, long schemas containing many string fields can be dominated by 
> this annotation by volume; teams using heterogeneous clients may want to 
> avoid Java-specific annotations in their schema files, or may not think to 
> use them unless there exist Java consumers of the schema at the time the 
> schema is proposed and written.
> The other solution to the problem is to compile the schema into Java objects  
> using the {{SpecificCompiler}}'s string type selection. This option actually 
> alters the schema carried by the object's {{SCHEMA$}} field to have the above 
> annotation in it, ensuring that when used by the Java API, the String type 
> will be used. 
> Unfortunately, this method is not interoperable with GenericRecords created 
> by libraries that use the _original_ schema.
> {panel}
> {panel:title=Steps To Reproduce|titleBGColor=#8DB|bgColor=#DDD}
> # Create a schema with several {{string}} fields.
> # Parse the schema using the standard Avro schema parser
> # Create Java domain objects for that schema ensuring usage of the 
> {{java.lang.String}} string type.
> # Create a message of some sort that ends up as a {{GenericRecord}} of the 
> original schema
> # Attempt to use {{SpecificData.deepCopy()}} to make a {{SpecificRecord}} out 
> of the {{GenericRecord}} 
> There is a unit test that demonstrates this 
> [here|https://github.com/ryonday/avroDecodingHelp/blob/master/1.8.0/src/test/java/com/ryonday/avro/test/v180/AvroDeepCopyTest.java]
> {panel}
> {panel:title=Expected Results|titleBGColor=#AD3|bgColor=#DDD}
> As the schemas are literally identical aside from string type, the conversion 
> should work (and does work for schema that are exactly identical).
> {panel}
> {panel:title=Actual Results|titleBGColor=#D55|bgColor=#DDD}
> {{ClassCastException}} with the message {{org.apache.avro.util.Utf8 cannot be 
> cast to java.lang.String}}
> {panel}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1642) JVM Spec Violation 255 Parameter Limit Exceeded

2016-04-16 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244436#comment-15244436
 ] 

Ryan Blue commented on AVRO-1642:
-

Evidently, I thought I had pushed the fix to master but I hadn't. I just did.

> JVM Spec Violation 255 Parameter Limit Exceeded 
> 
>
> Key: AVRO-1642
> URL: https://issues.apache.org/jira/browse/AVRO-1642
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.7.7
> Environment: Windows/Linux all Java
>Reporter: Bryce Alcock
>Assignee: Barry Jones
>Priority: Critical
>  Labels: build, maven, specific
> Fix For: 1.8.1
>
> Attachments: AVRO-1642-0.patch, AVRO-1642-1.patch, avro-1642-fail.tar
>
>
> The JVM Spec indicates that:
> {quote}The number of method parameters is limited to 255 by the definition of 
> a method descriptor (§4.3.3), where the limit includes one unit for this in 
> the case of instance or interface method invocations. Note that a method 
> descriptor is defined in terms of a notion of method parameter length in 
> which a parameter of type long or double contributes two units to the length, 
> so parameters of these types further reduce the limit. {quote}
> Avro Generated Java code with say more than 255 fields will create a 
> constructor that is not valid and won't compile.
> Simple test is to create a 256 field avro schema, use the avro-maven auto 
> code gen plugin, and try to compile the resulting class.
> DON'T use Linux when doing this; use Windows. My suspicion is that javac on 
> Linux generates invalid byte code but does not complain.
> Windows will correctly complain, indicating that you are in violation of the 
> JVM specification.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1126) Upgrade to Jackson 2+

2016-04-16 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244430#comment-15244430
 ] 

Ryan Blue commented on AVRO-1126:
-

[~cowtowncoder], that sounds good to me. I'd love to see someone work on this. 
If you're interested, please do and we'll get your changes in!

> Upgrade to Jackson 2+
> -
>
> Key: AVRO-1126
> URL: https://issues.apache.org/jira/browse/AVRO-1126
> Project: Avro
>  Issue Type: Task
>  Components: java
>Reporter: James Tyrrell
>Priority: Critical
> Fix For: 1.9.0
>
>
> Quite annoyingly with Jackson 2+ the base package name has changed from 
> org.codehaus.jackson to com.fasterxml.jackson so in addition to changing the 
> dependencies from:
> {code:xml} 
> <dependency>
>     <groupId>org.codehaus.jackson</groupId>
>     <artifactId>jackson-core-asl</artifactId>
>     <version>${jackson.version}</version>
> </dependency>
> <dependency>
>     <groupId>org.codehaus.jackson</groupId>
>     <artifactId>jackson-mapper-asl</artifactId>
>     <version>${jackson.version}</version>
> </dependency>
> {code} 
> to:
> {code:xml} 
> <dependency>
>     <groupId>com.fasterxml.jackson.core</groupId>
>     <artifactId>jackson-core</artifactId>
>     <version>${jackson.version}</version>
> </dependency>
> <dependency>
>     <groupId>com.fasterxml.jackson.core</groupId>
>     <artifactId>jackson-databind</artifactId>
>     <version>${jackson.version}</version>
> </dependency>
> {code} 
> the base package in the code needs to be updated. More info can be found 
> [here|http://wiki.fasterxml.com/JacksonUpgradeFrom19To20], I am happy to do 
> the work just let me know what is preferable i.e. should I just attach a 
> patch to this issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1818) Avoid buffer copy in DeflateCodec.compress and decompress

2016-04-16 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244417#comment-15244417
 ] 

Ryan Blue commented on AVRO-1818:
-

[~rohini], thanks for letting us know about this. Would you like to contribute 
a patch or pull request that does what you suggest?

> Avoid buffer copy in DeflateCodec.compress and decompress
> -
>
> Key: AVRO-1818
> URL: https://issues.apache.org/jira/browse/AVRO-1818
> Project: Avro
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>
> One of our jobs reading avro hit OOM due to the buffer copy in compress and 
> decompress methods which is very inefficient. 
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/file/DeflateCodec.java#L71-L86
> {code}
> java.lang.OutOfMemoryError: Java heap space
>   at java.util.Arrays.copyOf(Arrays.java:3236)
>   at 
> java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:191)
>   at org.apache.avro.file.DeflateCodec.decompress(DeflateCodec.java:84)
> {code}
> I would suggest using a class that extends ByteArrayOutputStream like 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataOutputBuffer.java#L51-L53
> and do
> ByteBuffer result = ByteBuffer.wrap(buf.getData(), 0, buf.getLength());
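A minimal sketch of the suggestion above (the class name is made up here; the accessor names follow the Hadoop class linked in the description):

```java
import java.io.ByteArrayOutputStream;

// Sketch of the copy-free approach suggested above: expose the internal
// buffer of ByteArrayOutputStream instead of calling toByteArray(), which
// allocates and copies a whole new array.
class ExposedByteArrayOutputStream extends ByteArrayOutputStream {
    ExposedByteArrayOutputStream(int size) { super(size); }
    byte[] getData() { return buf; }   // the internal buffer, no copy
    int getLength() { return count; }  // number of valid bytes in it
}
```

Wrapping with `ByteBuffer.wrap(out.getData(), 0, out.getLength())` then creates a view over the existing buffer without duplicating it.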



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (AVRO-1823) DataFileStream to include the exception when reading of the magic fail

2016-04-16 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue resolved AVRO-1823.
-
   Resolution: Fixed
 Assignee: Koji Noguchi
Fix Version/s: 1.8.1

> DataFileStream to include the exception when reading of the magic fail
> --
>
> Key: AVRO-1823
> URL: https://issues.apache.org/jira/browse/AVRO-1823
> Project: Avro
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Assignee: Koji Noguchi
>Priority: Trivial
> Fix For: 1.8.1
>
> Attachments: avro-1823-v01.patch
>
>
> When reading the Avro file failed with 
> {noformat}
> Caused by: java.io.IOException: Not a data file.
>   at 
> org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:102)
> {noformat}
> it made the debugging a bit hard since the exception causing it was swallowed 
> and not shown to the user.
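The improvement amounts to chaining the underlying exception as the cause instead of swallowing it. A minimal sketch (the `readMagic` stand-in below is illustrative, not the actual DataFileStream code):

```java
import java.io.IOException;

// Sketch: preserve the original exception as the cause of "Not a data file."
// so the real failure appears in the stack trace instead of being swallowed.
class MagicRead {
    static void readMagic(byte[] header) throws IOException {
        throw new IOException("stream ended prematurely"); // simulated underlying failure
    }

    static void initialize(byte[] header) throws IOException {
        try {
            readMagic(header);
        } catch (IOException e) {
            // Before the patch: throw new IOException("Not a data file.");  // cause lost
            throw new IOException("Not a data file.", e);                    // cause preserved
        }
    }
}
```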



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1823) DataFileStream to include the exception when reading of the magic fail

2016-04-16 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244410#comment-15244410
 ] 

Ryan Blue commented on AVRO-1823:
-

Thanks [~knoguchi]! I've committed this.

> DataFileStream to include the exception when reading of the magic fail
> --
>
> Key: AVRO-1823
> URL: https://issues.apache.org/jira/browse/AVRO-1823
> Project: Avro
>  Issue Type: Improvement
>Reporter: Koji Noguchi
>Priority: Trivial
> Fix For: 1.8.1
>
> Attachments: avro-1823-v01.patch
>
>
> When reading the Avro file failed with 
> {noformat}
> Caused by: java.io.IOException: Not a data file.
>   at 
> org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:102)
> {noformat}
> it made the debugging a bit hard since the exception causing it was swallowed 
> and not shown to the user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1827) Handling correctly optional fields when converting Protobuf to Avro

2016-04-16 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244404#comment-15244404
 ] 

Ryan Blue commented on AVRO-1827:
-

[~jakub_kahovec], thanks for finding this and fixing it. I agree with you that 
this is the correct behavior. The patch looks fine to me (other than one nit: 
// should be indented with the comment) but I'd like to see some tests that 
define the correct behavior and validate it. Could you add those? Thanks!

> Handling correctly optional fields when converting Protobuf to Avro
> ---
>
> Key: AVRO-1827
> URL: https://issues.apache.org/jira/browse/AVRO-1827
> Project: Avro
>  Issue Type: Improvement
>Affects Versions: 1.7.7, 1.8.0
>Reporter: Jakub Kahovec
> Attachments: AVRO-1827.patch
>
>
> Hello,
> as of the current implementation of converting protobuf files into avro 
> format, protobuf optional fields are being  given default values in the avro 
> schema if not specified explicitly. 
> So for instance when the protobuf field is defined as  
> {quote}
> optional int64 fieldInt64 = 1;
> {quote}
> in the avro schema it appears as
> {quote}
>  "name" : "fieldInt64",
>   "type" : "long",
>   "default" : 0
> {quote}
> The problem with this implementation is that we are losing information about 
> whether the field was present or not in the original protobuf, as when we ask 
> for this field's value in avro we will be given the default value. 
> What I'm proposing instead is that if the field in the protobuf is defined as 
> optional and has no default value, then the generated avro schema type will 
> use a union comprising the matching type and the null type, with default value null. 
> It is going to look like this:
> {quote}
>  "name" : "fieldIn64",
>   "type" : [ "null", "long" ],
>   "default" : null
> {quote}
> I'm aware that is a breaking change but I think that is the proper way how to 
> handle optional fields.
> I've also  created a patch which fixes the conversion
> Jakub 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1821) Avro (Java) Memory Leak in ReflectData Caching

2016-04-16 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244402#comment-15244402
 ] 

Ryan Blue commented on AVRO-1821:
-

I committed the fix. Thanks for your contribution, [~baharclerode]!

I updated it slightly to avoid the need for reflection in the test (used 
package-private instead) and I used a Guava weak identity map instead of the 
one we're trying to move away from.
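A weak-identity cache along these lines can be sketched as follows. This is a minimal illustration of the idea (keys held weakly and compared by identity), not the Guava implementation used in the actual commit:

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a weak-identity cache: keys are compared by identity and
// held via WeakReference, so a cached Schema can be garbage-collected once
// nothing else references it. Illustration only; not production code.
class WeakIdentityCache<K, V> {
    private static final class IdKey<K> {
        final WeakReference<K> ref;
        final int hash;
        IdKey(K key) {
            ref = new WeakReference<>(key);
            hash = System.identityHashCode(key);
        }
        @Override public int hashCode() { return hash; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof IdKey)) return false;
            K mine = ref.get();
            return mine != null && mine == ((IdKey<?>) o).ref.get();
        }
    }

    private final Map<IdKey<K>, V> map = new HashMap<>();

    V get(K key) { return map.get(new IdKey<>(key)); }
    void put(K key, V value) { map.put(new IdKey<>(key), value); }
}
```

A real implementation also needs to expunge stale entries (e.g. via a ReferenceQueue), which Guava's weak-keyed maps handle internally.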

> Avro (Java) Memory Leak in ReflectData Caching
> --
>
> Key: AVRO-1821
> URL: https://issues.apache.org/jira/browse/AVRO-1821
> Project: Avro
>  Issue Type: Bug
>  Components: java
> Environment: OS X 10.11.3
> {code}java version "1.8.0_66"
> Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode){code}
>Reporter: Bryan Harclerode
>Assignee: Bryan Harclerode
> Attachments: 
> 0001-AVRO-1821-Fix-memory-leak-of-Schemas-in-ReflectData.patch
>
>
> I think I have encountered one of the memory leaks described by AVRO-1283 in 
> the way Java Avro implements field accessor caching in {{ReflectData}}. When 
> a reflected object is serialized, the key of {{ClassAccessorData.bySchema}} 
> (as retained by {{ReflectData.ACCESSOR_CACHE}}) retains a strong reference to 
> the schema that was used to serialize the object, but there exists no code 
> path for clearing these references after a schema will no longer be used.
> While in most cases, a class will probably only have one schema associated 
> with it (created and cached by {{ReflectData.getSchema(Type)}}), I 
> experienced {{OutOfMemoryError}} when serializing generic classes with 
> dynamically-generated schemas. The following is a minimal example which will 
> exhaust a 50MiB heap ({{-Xmx50m}}) after about 190K iterations:
> {code:title=AvroMemoryLeakMinimal.java|borderStyle=solid}
> import java.io.ByteArrayOutputStream;
> import java.io.IOException;
> import java.util.Collections;
> import org.apache.avro.Schema;
> import org.apache.avro.io.BinaryEncoder;
> import org.apache.avro.io.EncoderFactory;
> import org.apache.avro.reflect.ReflectDatumWriter;
> public class AvroMemoryLeakMinimal {
> public static void main(String[] args) throws IOException {
> long count = 0;
> EncoderFactory encFactory = EncoderFactory.get();
> try {
> while (true) {
> // Create schema
> Schema schema = Schema.createRecord("schema", null, null, 
> false);
> schema.setFields(Collections.emptyList());
> // serialize
> ByteArrayOutputStream baos = new ByteArrayOutputStream(1024);
> BinaryEncoder encoder = encFactory.binaryEncoder(baos, null);
> (new ReflectDatumWriter(schema)).write(new Object(), 
> encoder);
> byte[] result = baos.toByteArray();
> count++;
> }
> } catch (OutOfMemoryError e) {
> System.out.print("Memory exhausted after ");
> System.out.print(count);
> System.out.println(" schemas");
> throw e;
> }
> }
> }
> {code}
> I was able to fix the bug in the latest 1.9.0-SNAPSHOT from git with the 
> following patch to {{ClassAccessorData.bySchema}} to use weak keys so that it 
> properly released the {{Schema}} objects if no other threads are still 
> referencing them:
> {code:title=ReflectData.java.patch|borderStyle=solid}
> --- a/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java
> +++ b/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java
> @@ -57,6 +57,7 @@ import org.apache.avro.io.DatumWriter;
>  import org.apache.avro.specific.FixedSize;
>  import org.apache.avro.specific.SpecificData;
>  import org.apache.avro.SchemaNormalization;
> +import org.apache.avro.util.WeakIdentityHashMap;
>  import org.codehaus.jackson.JsonNode;
>  import org.codehaus.jackson.node.NullNode;
>  
> @@ -234,8 +235,8 @@ public class ReflectData extends SpecificData {
>  private final Class<?> clazz;
>  private final Map<String, FieldAccessor> byName =
>  new HashMap<String, FieldAccessor>();
> -private final IdentityHashMap<Schema, FieldAccessor[]> bySchema =
> -new IdentityHashMap<Schema, FieldAccessor[]>();
> +private final WeakIdentityHashMap<Schema, FieldAccessor[]> bySchema =
> +new WeakIdentityHashMap<Schema, FieldAccessor[]>();
>  
>  private ClassAccessorData(Class c) {
>clazz = c;
> {code}
> Additionally, I'm not sure why an {{IdentityHashMap}} was used instead of a 
> standard {{HashMap}}, since two equivalent schemas have the same set of 
> {{FieldAccessor}}. Everything appears to work and all tests pass if I use a 
> {{WeakHashMap}} instead of an 

[jira] [Updated] (AVRO-1821) Avro (Java) Memory Leak in ReflectData Caching

2016-04-16 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue updated AVRO-1821:

Assignee: Bryan Harclerode

> Avro (Java) Memory Leak in ReflectData Caching
> --
>
> Key: AVRO-1821
> URL: https://issues.apache.org/jira/browse/AVRO-1821
> Project: Avro
>  Issue Type: Bug
>  Components: java
> Environment: OS X 10.11.3
> {code}java version "1.8.0_66"
> Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode){code}
>Reporter: Bryan Harclerode
>Assignee: Bryan Harclerode
> Attachments: 
> 0001-AVRO-1821-Fix-memory-leak-of-Schemas-in-ReflectData.patch
>
>
> I think I have encountered one of the memory leaks described by AVRO-1283 in 
> the way Java Avro implements field accessor caching in {{ReflectData}}. When 
> a reflected object is serialized, the key of {{ClassAccessorData.bySchema}} 
> (as retained by {{ReflectData.ACCESSOR_CACHE}}) retains a strong reference to 
> the schema that was used to serialize the object, but there exists no code 
> path for clearing these references after a schema will no longer be used.
> While in most cases, a class will probably only have one schema associated 
> with it (created and cached by {{ReflectData.getSchema(Type)}}), I 
> experienced {{OutOfMemoryError}} when serializing generic classes with 
> dynamically-generated schemas. The following is a minimal example which will 
> exhaust a 50MiB heap ({{-Xmx50m}}) after about 190K iterations:
> {code:title=AvroMemoryLeakMinimal.java|borderStyle=solid}
> import java.io.ByteArrayOutputStream;
> import java.io.IOException;
> import java.util.Collections;
> import org.apache.avro.Schema;
> import org.apache.avro.io.BinaryEncoder;
> import org.apache.avro.io.EncoderFactory;
> import org.apache.avro.reflect.ReflectDatumWriter;
> public class AvroMemoryLeakMinimal {
> public static void main(String[] args) throws IOException {
> long count = 0;
> EncoderFactory encFactory = EncoderFactory.get();
> try {
> while (true) {
> // Create schema
> Schema schema = Schema.createRecord("schema", null, null, 
> false);
> schema.setFields(Collections.emptyList());
> // serialize
> ByteArrayOutputStream baos = new ByteArrayOutputStream(1024);
> BinaryEncoder encoder = encFactory.binaryEncoder(baos, null);
> (new ReflectDatumWriter(schema)).write(new Object(), 
> encoder);
> byte[] result = baos.toByteArray();
> count++;
> }
> } catch (OutOfMemoryError e) {
> System.out.print("Memory exhausted after ");
> System.out.print(count);
> System.out.println(" schemas");
> throw e;
> }
> }
> }
> {code}
> I was able to fix the bug in the latest 1.9.0-SNAPSHOT from git with the 
> following patch to {{ClassAccessorData.bySchema}} to use weak keys so that it 
> properly released the {{Schema}} objects if no other threads are still 
> referencing them:
> {code:title=ReflectData.java.patch|borderStyle=solid}
> --- a/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java
> +++ b/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java
> @@ -57,6 +57,7 @@ import org.apache.avro.io.DatumWriter;
>  import org.apache.avro.specific.FixedSize;
>  import org.apache.avro.specific.SpecificData;
>  import org.apache.avro.SchemaNormalization;
> +import org.apache.avro.util.WeakIdentityHashMap;
>  import org.codehaus.jackson.JsonNode;
>  import org.codehaus.jackson.node.NullNode;
>  
> @@ -234,8 +235,8 @@ public class ReflectData extends SpecificData {
>  private final Class clazz;
>  private final Map byName =
>  new HashMap();
> -private final IdentityHashMap bySchema =
> -new IdentityHashMap();
> +private final WeakIdentityHashMap bySchema =
> +new WeakIdentityHashMap();
>  
>  private ClassAccessorData(Class c) {
>clazz = c;
> {code}
> Additionally, I'm not sure why an {{IdentityHashMap}} was used instead of a 
> standard {{HashMap}}, since two equivalent schemas have the same set of 
> {{FieldAccessor}}. Everything appears to work and all tests pass if I use a 
> {{WeakHashMap}} instead of a {{WeakIdentityHashMap}}, but I don't know if 
> there was some other reason object identity was important for this map. If a 
> non-identity map can be used, this will help reduce memory/CPU usage further 
> by not regenerating all the field accessors for equivalent schemas.
> 
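To make the identity-vs-equality distinction raised above concrete, here is an illustrative sketch using plain JDK maps (not Avro code): an {{IdentityHashMap}} keeps separate entries for two equal-but-distinct keys, while a {{HashMap}} collapses them.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class IdentityVsEquality {
    public static void main(String[] args) {
        String a = new String("schema");
        String b = new String("schema");   // equals(a) is true, but a != b

        Map<String, Integer> byEquality = new HashMap<>();
        byEquality.put(a, 1);
        byEquality.put(b, 2);              // replaces the entry: keys are equal

        Map<String, Integer> byIdentity = new IdentityHashMap<>();
        byIdentity.put(a, 1);
        byIdentity.put(b, 2);              // two entries: keys are distinct objects

        System.out.println(byEquality.size()); // 1
        System.out.println(byIdentity.size()); // 2
    }
}
```

If equivalent schemas really do produce the same accessors, the equality-based map would share one cache entry between them, which is the memory/CPU saving the reporter describes.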

[jira] [Commented] (AVRO-1704) Standardized format for encoding messages with Avro

2016-04-16 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244347#comment-15244347
 ] 

Ryan Blue commented on AVRO-1704:
-

Looks like I was a little too optimistic about time to review things this week. 
Sorry for the delay. I think we're close to a spec. Here are some additional 
thoughts.

Looks like everyone is for using the CRC-64-AVRO fingerprint, which is good 
because it can be implemented in each language and doesn't require a library 
dependency. That's also what's often used in practice.

+1 for an interface in Avro that lets you plug in a schema resolver.

I think the fingerprint should be considered part of the header rather than the 
body. It's a small distinction, but the fingerprint is a proxy for the schema 
here and the body/payload depends on it. The schema is stored in the container 
file header, so this is consistent.

I want to avoid a 4-byte sentinel value in each message. There are two uses for 
it: to make sure the message is Avro and to communicate the format version 
should we want to change it later.

Because the schema fingerprint is included in the message, it is very unlikely 
that unknown payloads will be read as Avro messages, because that requires a 
collision with an 8-byte schema fingerprint. I think that's plenty of 
protection from passing along corrupt data. The concern that doesn't address is 
what happens when a fingerprint is unknown, which in a lot of cases will cause 
a REST call to resolve it. I don't think adding 4 bytes to every encoded 
payload is worth avoiding this case when the lookup can detect some number of 
failures and stop making the RPC calls. I just don't think we should design 
around a solvable problem in the format like that.

I think the second use, versioning the format, is a good idea. That only 
requires one byte and including that byte can also serve as a way to detect 
non-Avro payloads, just with a higher probability of collision. I think that's 
a reasonable compromise. There would be something like a 1/256 chance that the first 
byte collides, assuming that byte is random in the non-Avro payload. That 
dramatically reduces the problem of making RPC calls to resolve unknown schema 
FPs. We want to choose the version byte carefully because other formats could 
easily have 0x00, 0x01, or an ASCII character there. I propose the version 
number with the MSB set, 0x80. That's unlikely to conflict with a flags byte, 
the first byte of a number, or the first character of a string.

That makes the format:
{code}
message = header body
 header = 0x80 CRC-64-AVRO(schema) (8 bytes, little endian)
   body = encoded Avro bytes using schema
{code}
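As a sketch, that framing could be assembled like this in Java. The CRC-64-AVRO routine follows the table-based algorithm given in the Avro specification; note that the spec computes the fingerprint over the schema's parsing canonical form, which this sketch glosses over by fingerprinting whatever schema text is passed in.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class SingleMessageFormat {
    // CRC-64-AVRO (Rabin) as given in the Avro specification.
    static final long EMPTY = 0xc15d213aa4d7a795L;
    static final long[] FP_TABLE = new long[256];
    static {
        for (int i = 0; i < 256; i++) {
            long fp = i;
            for (int j = 0; j < 8; j++)
                fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
            FP_TABLE[i] = fp;
        }
    }

    static long fingerprint64(byte[] buf) {
        long fp = EMPTY;
        for (byte b : buf)
            fp = (fp >>> 1) ^ FP_TABLE[(int) (fp ^ b) & 0xff];
        return fp;
    }

    // header = 0x80 then the 8-byte fingerprint (little endian); body follows.
    static byte[] frame(String schemaText, byte[] encodedBody) {
        long fp = fingerprint64(schemaText.getBytes(StandardCharsets.UTF_8));
        return ByteBuffer.allocate(1 + 8 + encodedBody.length)
                .order(ByteOrder.LITTLE_ENDIAN)
                .put((byte) 0x80)       // version byte with the MSB set
                .putLong(fp)            // CRC-64-AVRO schema fingerprint
                .put(encodedBody)       // Avro binary-encoded datum
                .array();
    }

    public static void main(String[] args) {
        byte[] msg = frame("\"string\"", new byte[] {2, 'h', 'i'});
        System.out.println(msg.length);     // 12: 1 version + 8 fingerprint + 3 body
        System.out.println(msg[0] & 0xff);  // 128 (0x80)
    }
}
```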

We could additionally have a format with a 4-byte FP, version 0x81, if anyone 
is interested in it. Something simple like XORing the first 4 bytes with the 
second 4 bytes of the CRC-64-AVRO fingerprint. 8 bytes just seems like a lot 
when this gets scaled up to billions of records.

One last thought: in the implementation, it would be nice to allow skipping the 
version byte because a lot of people have already implemented this as 
CRC-64-AVRO + encoded bytes. That would make the Avro implementation compatible 
with existing data flows and increase the chances that we can move to this 
standard format.

> Standardized format for encoding messages with Avro
> ---
>
> Key: AVRO-1704
> URL: https://issues.apache.org/jira/browse/AVRO-1704
> Project: Avro
>  Issue Type: Improvement
>Reporter: Daniel Schierbeck
>Assignee: Niels Basjes
> Attachments: AVRO-1704-20160410.patch
>
>
> I'm currently using the Datafile format for encoding messages that are 
> written to Kafka and Cassandra. This seems rather wasteful:
> 1. I only encode a single record at a time, so there's no need for sync 
> markers and other metadata related to multi-record files.
> 2. The entire schema is inlined every time.
> However, the Datafile format is the only one that has been standardized, 
> meaning that I can read and write data with minimal effort across the various 
> languages in use in my organization. If there was a standardized format for 
> encoding single values that was optimized for out-of-band schema transfer, I 
> would much rather use that.
> I think the necessary pieces of the format would be:
> 1. A format version number.
> 2. A schema fingerprint type identifier, i.e. Rabin, MD5, SHA256, etc.
> 3. The actual schema fingerprint (according to the type.)
> 4. Optional metadata map.
> 5. The encoded datum.
> The language libraries would implement a MessageWriter that would encode 
> datums in this format, as well as a MessageReader that, given a SchemaStore, 
> would be able to decode datums. The reader would decode the fingerprint and 
> ask its SchemaStore to return the corresponding writer's schema.
> The idea is that SchemaStore would be an abstract interface that allowed 
> library users to inject custom backends. A simple, file system based one 
> could be provided out of the box.

[jira] [Commented] (AVRO-1830) Avro-Perl DataFileReader chokes when avro.codec is absent

2016-04-15 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243168#comment-15243168
 ] 

Ryan Blue commented on AVRO-1830:
-

[~_alexm], are you still up for reviewing perl patches?

> Avro-Perl DataFileReader chokes when avro.codec is absent
> -
>
> Key: AVRO-1830
> URL: https://issues.apache.org/jira/browse/AVRO-1830
> Project: Avro
>  Issue Type: Bug
>  Components: perl
>Affects Versions: 1.8.0
>Reporter: SK Liew
>Priority: Minor
> Attachments: Avro-1830.patch
>
>
> When a container does not specify its "avro.codec", it should be assumed to 
> be "null". An exception is thrown when I try to read such a container using 
> Avro::DataFileReader. The error happens at Avro/DataFileReader.pm line 101.
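The intended behavior is language-agnostic; a hypothetical sketch in Java (the {{codecName}} helper is illustrative, not the Perl module's actual code): a container whose metadata lacks an {{avro.codec}} entry should resolve to the "null" (identity) codec instead of failing.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class CodecDefault {
    // If the container file's metadata map has no "avro.codec" entry,
    // treat the file as using the "null" codec rather than throwing.
    static String codecName(Map<String, byte[]> fileMetadata) {
        byte[] raw = fileMetadata.get("avro.codec");
        return raw == null ? "null" : new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, byte[]> meta = new HashMap<>();
        System.out.println(codecName(meta));               // "null" when absent
        meta.put("avro.codec", "deflate".getBytes(StandardCharsets.UTF_8));
        System.out.println(codecName(meta));               // "deflate"
    }
}
```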



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1826) build.sh rat fails over extra license files and many others.

2016-04-12 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237518#comment-15237518
 ] 

Ryan Blue commented on AVRO-1826:
-

It looks like line 222 in the patch was added by mistake? It adds "<"

Otherwise, this looks good to me. +1. Thanks for taking the time to fix this, 
it's great to have people helping on the license stuff.

> build.sh rat fails over extra license files and many others.
> 
>
> Key: AVRO-1826
> URL: https://issues.apache.org/jira/browse/AVRO-1826
> Project: Avro
>  Issue Type: Bug
>Reporter: Niels Basjes
>Assignee: Niels Basjes
> Attachments: AVRO-1826-20160410.patch
>
>
> When running ./build.sh rat this will fail due to several license related 
> files we recently added.





[jira] [Commented] (AVRO-1828) Add EditorConfig file

2016-04-12 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237408#comment-15237408
 ] 

Ryan Blue commented on AVRO-1828:
-

Sounds good to me!

> Add EditorConfig file
> -
>
> Key: AVRO-1828
> URL: https://issues.apache.org/jira/browse/AVRO-1828
> Project: Avro
>  Issue Type: Improvement
>Reporter: Niels Basjes
>
> I was working with Apache Flink last week and they recently implemented 
> http://editorconfig.org/ ( see here 
> https://github.com/apache/flink/blob/master/.editorconfig )
> Essentially this is a very simple config file that instructs a great many 
> editors to adhere to the main coding standard choices (things like character 
encoding, tabs vs. spaces, newlines, etc.) for a specific project on a per 
file type basis.
When someone opens the project in an editor like IntelliJ, it will 
automatically use these settings.
> Proposal: 
> # We implement this for Avro at the root level with global defaults.
> # We implement a specific file per language. I think we should start with the 
> top level scripting (like build.sh and pom.xml) and Java as the first 
> language.
> # We fix the violations of this standard in a single commit per language. 
> Note that if we don't fix those violations then later commits will be 
> 'harder' to keep clean (you will see a lot of unrelated changes) because the 
> IDEs will 'enforce' the standard on all touched files.
> What do you guys think?
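For reference, a minimal root-level file might look like this (illustrative values only, not a committed proposal for Avro):

```ini
# .editorconfig at the repository root
root = true

[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true

# Language-specific overrides, e.g. for Java sources
[*.java]
indent_style = space
indent_size = 2

[pom.xml]
indent_style = space
indent_size = 2
```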





[jira] [Commented] (AVRO-1704) Standardized format for encoding messages with Avro

2016-04-11 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15235438#comment-15235438
 ] 

Ryan Blue commented on AVRO-1704:
-

Thanks for working on this, Niels. I'll make some comments later today or 
tomorrow.

> Standardized format for encoding messages with Avro
> ---
>
> Key: AVRO-1704
> URL: https://issues.apache.org/jira/browse/AVRO-1704
> Project: Avro
>  Issue Type: Improvement
>Reporter: Daniel Schierbeck
>Assignee: Niels Basjes
> Attachments: AVRO-1704-20160410.patch
>
>
> I'm currently using the Datafile format for encoding messages that are 
> written to Kafka and Cassandra. This seems rather wasteful:
> 1. I only encode a single record at a time, so there's no need for sync 
> markers and other metadata related to multi-record files.
> 2. The entire schema is inlined every time.
> However, the Datafile format is the only one that has been standardized, 
> meaning that I can read and write data with minimal effort across the various 
> languages in use in my organization. If there was a standardized format for 
> encoding single values that was optimized for out-of-band schema transfer, I 
> would much rather use that.
> I think the necessary pieces of the format would be:
> 1. A format version number.
> 2. A schema fingerprint type identifier, i.e. Rabin, MD5, SHA256, etc.
> 3. The actual schema fingerprint (according to the type.)
> 4. Optional metadata map.
> 5. The encoded datum.
> The language libraries would implement a MessageWriter that would encode 
> datums in this format, as well as a MessageReader that, given a SchemaStore, 
> would be able to decode datums. The reader would decode the fingerprint and 
> ask its SchemaStore to return the corresponding writer's schema.
> The idea is that SchemaStore would be an abstract interface that allowed 
> library users to inject custom backends. A simple, file system based one 
> could be provided out of the box.
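A hypothetical shape for the {{SchemaStore}} described above (the name comes from the issue text, but the signatures are illustrative, not a shipped Avro API; the schema is held as its JSON text to keep the sketch dependency-free):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative in-memory SchemaStore: resolves the writer's schema
// (represented here by its JSON text) from a fingerprint. A file system
// or registry-backed variant would implement the same lookup.
public class InMemorySchemaStore {
    private final Map<Long, String> byFingerprint = new HashMap<>();

    public void register(long fingerprint, String schemaJson) {
        byFingerprint.put(fingerprint, schemaJson);
    }

    public Optional<String> findByFingerprint(long fingerprint) {
        return Optional.ofNullable(byFingerprint.get(fingerprint));
    }

    public static void main(String[] args) {
        InMemorySchemaStore store = new InMemorySchemaStore();
        store.register(0x1234L, "{\"type\":\"record\",\"name\":\"r\",\"fields\":[]}");
        System.out.println(store.findByFingerprint(0x1234L).isPresent()); // true
        System.out.println(store.findByFingerprint(0x9999L).isPresent()); // false
    }
}
```

A MessageReader would decode the fingerprint from the message header, then ask a store like this for the writer's schema before decoding the datum.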





[jira] [Commented] (AVRO-1821) Avro (Java) Memory Leak in ReflectData Caching

2016-04-09 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233776#comment-15233776
 ] 

Ryan Blue commented on AVRO-1821:
-

Thanks for tracking this down, [~baharclerode]. I think you're right about the 
memory leak. It looks like you've done a great job putting together a test case 
and the fix. Could you put together a patch or pull request with those and 
we'll get it committed?

For your question about the IdentityHashMap vs regular HashMap, I think the 
main idea is that because these lookups are in very tight loops, we want to 
avoid unnecessary operations. It's cheap to keep a copy per schema because 
there aren't typically a huge number of schemas in an app. But, we do like to 
use weak maps to avoid problems like this. Thanks for working on this!
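To illustrate the weak-map point, a minimal sketch using the JDK's {{WeakHashMap}} with a throwaway key (not Avro's {{WeakIdentityHashMap}}): once nothing else strongly references the key, the entry becomes collectible, which is exactly the behavior the accessor cache needs.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakCacheSketch {
    // Returns the cache size observed after dropping the only strong
    // reference to the key and prodding the collector.
    static int sizeAfterKeyDropped() throws InterruptedException {
        Map<Object, String> cache = new WeakHashMap<>();
        Object schema = new Object();      // stands in for a Schema key
        cache.put(schema, "field accessors");
        schema = null;                     // drop the last strong reference
        // GC timing is not guaranteed; poll until the entry is cleared.
        for (int i = 0; i < 100 && !cache.isEmpty(); i++) {
            System.gc();
            Thread.sleep(10);
        }
        return cache.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sizeAfterKeyDropped()); // 0 once the key is collected
    }
}
```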

> Avro (Java) Memory Leak in ReflectData Caching
> --
>
> Key: AVRO-1821
> URL: https://issues.apache.org/jira/browse/AVRO-1821
> Project: Avro
>  Issue Type: Bug
>  Components: java
> Environment: OS X 10.11.3
> {code}java version "1.8.0_66"
> Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode){code}
>Reporter: Bryan Harclerode
>
> I think I have encountered one of the memory leaks described by AVRO-1283 in 
> the way Java Avro implements field accessor caching in {{ReflectData}}. When 
> a reflected object is serialized, the key of {{ClassAccessorData.bySchema}} 
> (as retained by {{ReflectData.ACCESSOR_CACHE}}) retains a strong reference to 
> the schema that was used to serialize the object, but there exists no code 
> path for clearing these references after a schema will no longer be used.
> While in most cases, a class will probably only have one schema associated 
> with it (created and cached by {{ReflectData.getSchema(Type)}}), I 
> experienced {{OutOfMemoryError}} when serializing generic classes with 
> dynamically-generated schemas. The following is a minimal example which will 
> exhaust a 50MiB heap ({{-Xmx50m}}) after about 190K iterations:
> {code:title=AvroMemoryLeakMinimal.java|borderStyle=solid}
> import java.io.ByteArrayOutputStream;
> import java.io.IOException;
> import java.util.Collections;
> import org.apache.avro.Schema;
> import org.apache.avro.io.BinaryEncoder;
> import org.apache.avro.io.EncoderFactory;
> import org.apache.avro.reflect.ReflectDatumWriter;
> public class AvroMemoryLeakMinimal {
> public static void main(String[] args) throws IOException {
> long count = 0;
> EncoderFactory encFactory = EncoderFactory.get();
> try {
> while (true) {
> // Create schema
> Schema schema = Schema.createRecord("schema", null, null, 
> false);
> schema.setFields(Collections.emptyList());
> // serialize
> ByteArrayOutputStream baos = new ByteArrayOutputStream(1024);
> BinaryEncoder encoder = encFactory.binaryEncoder(baos, null);
> (new ReflectDatumWriter(schema)).write(new Object(), 
> encoder);
> byte[] result = baos.toByteArray();
> count++;
> }
> } catch (OutOfMemoryError e) {
> System.out.print("Memory exhausted after ");
> System.out.print(count);
> System.out.println(" schemas");
> throw e;
> }
> }
> }
> {code}
> I was able to fix the bug in the latest 1.9.0-SNAPSHOT from git with the 
> following patch to {{ClassAccessorData.bySchema}} to use weak keys so that it 
> properly released the {{Schema}} objects if no other threads are still 
> referencing them:
> {code:title=ReflectData.java.patch|borderStyle=solid}
> --- a/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java
> +++ b/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectData.java
> @@ -57,6 +57,7 @@ import org.apache.avro.io.DatumWriter;
>  import org.apache.avro.specific.FixedSize;
>  import org.apache.avro.specific.SpecificData;
>  import org.apache.avro.SchemaNormalization;
> +import org.apache.avro.util.WeakIdentityHashMap;
>  import org.codehaus.jackson.JsonNode;
>  import org.codehaus.jackson.node.NullNode;
>  
> @@ -234,8 +235,8 @@ public class ReflectData extends SpecificData {
>  private final Class clazz;
>  private final Map byName =
>  new HashMap();
> -private final IdentityHashMap bySchema =
> -new IdentityHashMap();
> +private final WeakIdentityHashMap bySchema =
> +new WeakIdentityHashMap();
>  
>  private ClassAccessorData(Class c) {
>clazz = c;
> {code}
> Additionally, I'm not sure why an {{IdentityHashMap}} was used instead of a 
> standard {{HashMap}}, since two equivalent schemas have the same set of 
> {{FieldAccessor}}. Everything appears to work and all tests pass if I use a 
> {{WeakHashMap}} instead of a {{WeakIdentityHashMap}}, but I don't know if 
> there was some other reason object identity was important for this map. If a 
> non-identity map can be used, this will help reduce memory/CPU usage further 
> by not regenerating all the field accessors for equivalent schemas.

[jira] [Commented] (AVRO-1642) JVM Spec Violation 255 Parameter Limit Exceeded

2016-04-09 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233772#comment-15233772
 ] 

Ryan Blue commented on AVRO-1642:
-

Thanks for finishing this, [~barryjones]!

> JVM Spec Violation 255 Parameter Limit Exceeded 
> 
>
> Key: AVRO-1642
> URL: https://issues.apache.org/jira/browse/AVRO-1642
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.7.7
> Environment: Windows/Linux all Java
>Reporter: Bryce Alcock
>Assignee: Barry Jones
>Priority: Critical
>  Labels: build, maven, specific
> Fix For: 1.8.1
>
> Attachments: AVRO-1642-0.patch, AVRO-1642-1.patch, avro-1642-fail.tar
>
>
> The JVM Spec indicates that:
> {quote}The number of method parameters is limited to 255 by the definition of 
> a method descriptor (§4.3.3), where the limit includes one unit for this in 
> the case of instance or interface method invocations. Note that a method 
> descriptor is defined in terms of a notion of method parameter length in 
> which a parameter of type long or double contributes two units to the length, 
> so parameters of these types further reduce the limit. {quote}
> Avro-generated Java code with, say, more than 255 fields will create a 
> constructor that is not valid and won't compile.
> A simple test is to create a 256-field Avro schema, use the avro-maven code 
> gen plugin, and try to compile the resulting class.
> DON'T use Linux when doing this, use Windows; my suspicion is that on Linux 
> javac generates invalid byte code but does not complain.
> Windows will correctly complain, indicating that you are in violation of the 
> JVM specification.





[jira] [Updated] (AVRO-1642) JVM Spec Violation 255 Parameter Limit Exceeded

2016-04-09 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue updated AVRO-1642:

   Resolution: Fixed
Fix Version/s: 1.8.1
   Status: Resolved  (was: Patch Available)

The fixes look good and tests pass. I've committed this.

> JVM Spec Violation 255 Parameter Limit Exceeded 
> 
>
> Key: AVRO-1642
> URL: https://issues.apache.org/jira/browse/AVRO-1642
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.7.7
> Environment: Windows/Linux all Java
>Reporter: Bryce Alcock
>Assignee: Barry Jones
>Priority: Critical
>  Labels: build, maven, specific
> Fix For: 1.8.1
>
> Attachments: AVRO-1642-0.patch, AVRO-1642-1.patch, avro-1642-fail.tar
>
>
> The JVM Spec indicates that:
> {quote}The number of method parameters is limited to 255 by the definition of 
> a method descriptor (§4.3.3), where the limit includes one unit for this in 
> the case of instance or interface method invocations. Note that a method 
> descriptor is defined in terms of a notion of method parameter length in 
> which a parameter of type long or double contributes two units to the length, 
> so parameters of these types further reduce the limit. {quote}
> Avro-generated Java code with, say, more than 255 fields will create a 
> constructor that is not valid and won't compile.
> A simple test is to create a 256-field Avro schema, use the avro-maven code 
> gen plugin, and try to compile the resulting class.
> DON'T use Linux when doing this, use Windows; my suspicion is that on Linux 
> javac generates invalid byte code but does not complain.
> Windows will correctly complain, indicating that you are in violation of the 
> JVM specification.





[jira] [Updated] (AVRO-1642) JVM Spec Violation 255 Parameter Limit Exceeded

2016-04-09 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue updated AVRO-1642:

Assignee: Barry Jones  (was: Prateek Rungta)

> JVM Spec Violation 255 Parameter Limit Exceeded 
> 
>
> Key: AVRO-1642
> URL: https://issues.apache.org/jira/browse/AVRO-1642
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.7.7
> Environment: Windows/Linux all Java
>Reporter: Bryce Alcock
>Assignee: Barry Jones
>Priority: Critical
>  Labels: build, maven, specific
> Attachments: AVRO-1642-0.patch, AVRO-1642-1.patch, avro-1642-fail.tar
>
>
> The JVM Spec indicates that:
> {quote}The number of method parameters is limited to 255 by the definition of 
> a method descriptor (§4.3.3), where the limit includes one unit for this in 
> the case of instance or interface method invocations. Note that a method 
> descriptor is defined in terms of a notion of method parameter length in 
> which a parameter of type long or double contributes two units to the length, 
> so parameters of these types further reduce the limit. {quote}
> Avro-generated Java code with, say, more than 255 fields will create a 
> constructor that is not valid and won't compile.
> A simple test is to create a 256-field Avro schema, use the avro-maven code 
> gen plugin, and try to compile the resulting class.
> DON'T use Linux when doing this, use Windows; my suspicion is that on Linux 
> javac generates invalid byte code but does not complain.
> Windows will correctly complain, indicating that you are in violation of the 
> JVM specification.





[jira] [Resolved] (AVRO-1824) Avro C++ Documentation fix

2016-04-09 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue resolved AVRO-1824.
-
Resolution: Fixed

> Avro C++ Documentation fix
> --
>
> Key: AVRO-1824
> URL: https://issues.apache.org/jira/browse/AVRO-1824
> Project: Avro
>  Issue Type: Bug
>  Components: doc
>Affects Versions: 1.8.0
> Environment: Documentation for Ubuntu
>Reporter: William S Fulton
>Assignee: William S Fulton
>Priority: Trivial
>  Labels: documentation
> Fix For: 1.8.1
>
> Attachments: AVRO-1824.patch
>
>
> Add missing dependencies for using C++ on Ubuntu. Required for Ubuntu 14.04.





[jira] [Updated] (AVRO-1824) Avro C++ Documentation fix

2016-04-09 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue updated AVRO-1824:

Assignee: William S Fulton

> Avro C++ Documentation fix
> --
>
> Key: AVRO-1824
> URL: https://issues.apache.org/jira/browse/AVRO-1824
> Project: Avro
>  Issue Type: Bug
>  Components: doc
>Affects Versions: 1.8.0
> Environment: Documentation for Ubuntu
>Reporter: William S Fulton
>Assignee: William S Fulton
>Priority: Trivial
>  Labels: documentation
> Fix For: 1.8.1
>
> Attachments: AVRO-1824.patch
>
>
> Add missing dependencies for using C++ on Ubuntu. Required for Ubuntu 14.04.





[jira] [Commented] (AVRO-1825) Allow running build.sh dist under git

2016-04-09 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233760#comment-15233760
 ] 

Ryan Blue commented on AVRO-1825:
-

+1

Thanks, Niels!

> Allow running build.sh dist under git
> -
>
> Key: AVRO-1825
> URL: https://issues.apache.org/jira/browse/AVRO-1825
> Project: Avro
>  Issue Type: Improvement
>  Components: build
>Reporter: Niels Basjes
>Assignee: Niels Basjes
> Attachments: AVRO-1825-20160409.patch
>
>
> When working off a git clone instead of an svn checkout the build.sh dist 
> cannot run due to an explicit dependency on the fact that the working 
> directory must be an svn checkout.
> This should be a bit more flexible.





[jira] [Commented] (AVRO-1372) Avro file data encryption for Java

2016-04-07 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15231200#comment-15231200
 ] 

Ryan Blue commented on AVRO-1372:
-

[~ktham], I'm not aware of anyone actively trying to get this in. Is this 
something that needs to go in Avro or can encryption be handled through the 
Hadoop codec API?

> Avro file data encryption for Java 
> ---
>
> Key: AVRO-1372
> URL: https://issues.apache.org/jira/browse/AVRO-1372
> Project: Avro
>  Issue Type: Sub-task
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Jerry Chen
>  Labels: Rhino
> Attachments: AVRO-1372.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> For Java, Avro already has support for compression codecs such as deflate or 
> snappy based on its own codec infrastructure. To support encryption, this 
> work needs to extend the codec infrastructure with the ability to handle 
> codec context such as the encryption keys. The reader and writer need to be 
> extended for context handling as well. Also, an AES codec will be implemented 
> as the default encryption codec.
> To keep the reader and writer compatible, new constructors and methods taking 
> a codec context can be added instead of modifying the existing methods. 
> Although there are other ways of passing in the codec context, such as thread 
> locals or system properties, those approaches are more error-prone, less 
> direct, and not self-explanatory at the API level.





[jira] [Commented] (AVRO-1642) JVM Spec Violation 255 Parameter Limit Exceeded

2016-04-05 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227055#comment-15227055
 ] 

Ryan Blue commented on AVRO-1642:
-

bq. add an input validation that the schema is a Record, add a failure test for 
same

This is saying that {{calcAllArgConstructorParameterUnits}} should validate 
that the incoming schema actually is a record schema with a precondition that 
checks {{schema.getType() == Schema.Type.RECORD}}. And, there should be a test 
for this case.

bq. Types that need a java long or double count as 2, everything else is 1. 
Important to note that it's only the primitive types long and double

When counting parameters, the primitive values would count as 2. But, since the 
specific compiler uses Long and Double, they count as 1. We just need to fix 
how the number of arguments is counted.
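The counting rule (JVMS §4.3.3) can be sketched like this; the {{units}} helper is hypothetical, not the actual SpecificCompiler code. Only the primitives {{long}} and {{double}} contribute two units each, so the boxed types the compiler emits contribute one.

```java
public class ParameterUnits {
    // Count method-descriptor units for a parameter list (JVMS section 4.3.3):
    // primitive long and double contribute two units each; everything else,
    // including boxed Long/Double and all reference types, contributes one.
    static int units(Class<?>... paramTypes) {
        int total = 0;
        for (Class<?> t : paramTypes) {
            total += (t == long.class || t == double.class) ? 2 : 1;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(units(long.class, double.class)); // 4: primitives
        System.out.println(units(Long.class, Double.class)); // 2: boxed types
    }
}
```

The constructor is valid as long as this total (plus one unit for {{this}}) stays at or below 255.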

> JVM Spec Violation 255 Parameter Limit Exceeded 
> 
>
> Key: AVRO-1642
> URL: https://issues.apache.org/jira/browse/AVRO-1642
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.7.7
> Environment: Windows/Linux all Java
>Reporter: Bryce Alcock
>Assignee: Prateek Rungta
>Priority: Critical
>  Labels: build, maven, specific
> Attachments: AVRO-1642-0.patch, AVRO-1642-1.patch, avro-1642-fail.tar
>
>
> The JVM Spec indicates that:
> {quote}The number of method parameters is limited to 255 by the definition of 
> a method descriptor (§4.3.3), where the limit includes one unit for this in 
> the case of instance or interface method invocations. Note that a method 
> descriptor is defined in terms of a notion of method parameter length in 
> which a parameter of type long or double contributes two units to the length, 
> so parameters of these types further reduce the limit. {quote}
> Avro-generated Java code with, say, more than 255 fields will create a 
> constructor that is not valid and won't compile.
> A simple test is to create a 256-field Avro schema, use the avro-maven code 
> gen plugin, and try to compile the resulting class.
> DON'T use Linux when doing this, use Windows; my suspicion is that on Linux 
> javac generates invalid byte code but does not complain.
> Windows will correctly complain, indicating that you are in violation of the 
> JVM specification.





[jira] [Commented] (AVRO-1642) JVM Spec Violation 255 Parameter Limit Exceeded

2016-04-05 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226641#comment-15226641
 ] 

Ryan Blue commented on AVRO-1642:
-

Looks like we're waiting for some of the review comments to be addressed. If 
you'd like to pick this up and make those changes, I'll review it and commit. 
Thanks, Barry!

> JVM Spec Violation 255 Parameter Limit Exceeded 
> 
>
> Key: AVRO-1642
> URL: https://issues.apache.org/jira/browse/AVRO-1642
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.7.7
> Environment: Windows/Linux all Java
>Reporter: Bryce Alcock
>Assignee: Prateek Rungta
>Priority: Critical
>  Labels: build, maven, specific
> Attachments: AVRO-1642-0.patch, AVRO-1642-1.patch, avro-1642-fail.tar
>
>
> The JVM Spec indicates that:
> {quote}The number of method parameters is limited to 255 by the definition of 
> a method descriptor (§4.3.3), where the limit includes one unit for this in 
> the case of instance or interface method invocations. Note that a method 
> descriptor is defined in terms of a notion of method parameter length in 
> which a parameter of type long or double contributes two units to the length, 
> so parameters of these types further reduce the limit. {quote}
> Java code generated by Avro for a record with, say, more than 255 fields will 
> contain a constructor that is not valid and won't compile.
> A simple test is to create a 256-field Avro schema, use the avro-maven code 
> generation plugin, and try to compile the resulting class.
> Don't use Linux for this test; use Windows. My suspicion is that javac on 
> Linux generates invalid byte code without complaining, while on Windows it 
> correctly reports that the class violates the JVM specification.
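The reproduction step above can be sketched without Avro itself. This is a hedged, illustrative snippet (the class name `WideSchema` and the field names `f0`..`f255` are my own, not from the issue): it builds the avsc JSON for a 256-field record, which, fed to the Avro code generator, yields a constructor whose parameter count exceeds the JVM's limit of 255.

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Builds a "wide" Avro record schema as JSON text. Generating Java code
// from the result produces an all-fields constructor with 256 parameters,
// over the 255 limit (one descriptor unit is already taken by `this`).
public class WideSchema {
    public static String schemaJson(int fieldCount) {
        String fields = IntStream.range(0, fieldCount)
            .mapToObj(i -> "{\"name\":\"f" + i + "\",\"type\":\"int\"}")
            .collect(Collectors.joining(","));
        return "{\"type\":\"record\",\"name\":\"Wide\",\"fields\":[" + fields + "]}";
    }

    public static void main(String[] args) {
        // Count the field objects in the generated schema text.
        String json = schemaJson(256);
        System.out.println(json.split("\\{\"name\"").length - 1); // prints 256
    }
}
```

Note that because `long` and `double` each count as two descriptor units, a schema with far fewer than 255 fields of those types can also hit the limit.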





[jira] [Commented] (AVRO-1667) Parser symbol tree flattening is broken for recursive schemas

2016-03-19 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199881#comment-15199881
 ] 

Ryan Blue commented on AVRO-1667:
-

Is there a semantic difference between the two?

> Parser symbol tree flattening is broken for recursive schemas
> -
>
> Key: AVRO-1667
> URL: https://issues.apache.org/jira/browse/AVRO-1667
> Project: Avro
>  Issue Type: Bug
>Affects Versions: 1.7.7
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
> Fix For: 1.8.1
>
> Attachments: AVRO-1667.2.patch, avro-1667.patch
>
>
> Here is a unit test to reproduce:
> {noformat}
> package org.apache.avro.io.parsing;
>
> import java.io.IOException;
> import java.util.HashMap;
> import java.util.HashSet;
> import java.util.Set;
> import junit.framework.Assert;
> import org.apache.avro.Schema;
> import org.junit.Test;
>
> public class SymbolTest {
>   private static final String SCHEMA =
>       "{\"type\":\"record\",\"name\":\"SampleNode\","
>       + "\"namespace\":\"org.spf4j.ssdump2.avro\",\n"
>       + " \"fields\":[\n"
>       + "{\"name\":\"count\",\"type\":\"int\",\"default\":0},\n"
>       + "{\"name\":\"subNodes\",\"type\":\n"
>       + "   {\"type\":\"array\",\"items\":{\n"
>       + "   \"type\":\"record\",\"name\":\"SamplePair\",\n"
>       + "   \"fields\":[\n"
>       + "  {\"name\":\"method\",\"type\":\n"
>       + "  {\"type\":\"record\",\"name\":\"Method\",\n"
>       + "  \"fields\":[\n"
>       + " {\"name\":\"declaringClass\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}},\n"
>       + " {\"name\":\"methodName\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}\n"
>       + "  ]}},\n"
>       + "  {\"name\":\"node\",\"type\":\"SampleNode\"}]}}}]}";
>
>   @Test
>   public void testSomeMethod() throws IOException {
>     Schema schema = new Schema.Parser().parse(SCHEMA);
>     Symbol root = Symbol.root(new ResolvingGrammarGenerator()
>         .generate(schema, schema, new HashMap()));
>     validateNonNull(root, new HashSet<Symbol>());
>   }
>
>   private static void validateNonNull(final Symbol symb, Set<Symbol> seen) {
>     if (seen.contains(symb)) {
>       return;
>     }
>     seen.add(symb);
>     if (symb.production != null) {
>       for (Symbol s : symb.production) {
>         if (s == null) {
>           Assert.fail("invalid parsing tree should not contain nulls");
>         }
>         if (s.kind != Symbol.Kind.ROOT) {
>           validateNonNull(s, seen);
>         }
>       }
>     }
>   }
> }
> {noformat}





[jira] [Commented] (AVRO-1667) Parser symbol tree flattening is broken for recursive schemas

2016-03-18 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199896#comment-15199896
 ] 

Ryan Blue commented on AVRO-1667:
-

I don't think that's correct. The call to fixups.size() is in initialization, 
not in the size check. The integer n should be constant.

> Parser symbol tree flattening is broken for recursive schemas
> -
>
> Key: AVRO-1667
> URL: https://issues.apache.org/jira/browse/AVRO-1667
> Project: Avro
>  Issue Type: Bug
>Affects Versions: 1.7.7
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
> Fix For: 1.8.1
>
> Attachments: AVRO-1667.2.patch, avro-1667.patch
>
>





[jira] [Commented] (AVRO-1667) Parser symbol tree flattening is broken for recursive schemas

2016-03-15 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196059#comment-15196059
 ] 

Ryan Blue commented on AVRO-1667:
-

I committed the test that caught the original bug, just not the binary tree 
test that didn't catch the other case. We can revisit it later.

> Parser symbol tree flattening is broken for recursive schemas
> -
>
> Key: AVRO-1667
> URL: https://issues.apache.org/jira/browse/AVRO-1667
> Project: Avro
>  Issue Type: Bug
>Affects Versions: 1.7.7
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
> Fix For: 1.8.1
>
> Attachments: AVRO-1667.2.patch, avro-1667.patch
>
>





[jira] [Commented] (AVRO-1723) Add support for forward declarations in avro IDL

2016-03-15 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15195529#comment-15195529
 ] 

Ryan Blue commented on AVRO-1723:
-

[~zolyfarkas], now that AVRO-1667 is in, could you rebase this on master? 
Thanks!

> Add support for forward declarations in avro IDL
> 
>
> Key: AVRO-1723
> URL: https://issues.apache.org/jira/browse/AVRO-1723
> Project: Avro
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
> Attachments: AVRO-1723.patch
>
>
> Currently, recursive data structures like the following cannot be declared in 
> IDL:
> record SampleNode {
>    int count = 0;
>    array<SamplePair> samples = [];
> }
> record SamplePair {
>   string name;
>   SampleNode node;
> }
> However, they can be declared in avsc (with the fix from 
> https://issues.apache.org/jira/browse/AVRO-1667 ).
> It is actually not complicated to implement; here is some detail on a 
> possible implementation:
> https://github.com/zolyfarkas/avro/commit/210c50105717149f3daa39b8d4160b8548b8e363
> This would close a capability gap with Google Protocol Buffers...





[jira] [Resolved] (AVRO-1667) Parser symbol tree flattening is broken for recursive schemas

2016-03-15 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue resolved AVRO-1667.
-
   Resolution: Fixed
 Assignee: Zoltan Farkas
Fix Version/s: 1.8.1

I committed the fix with the update. Thanks for working on this, [~zolyfarkas]! 
I didn't add the test since it doesn't catch the broken case. I'd like to get 
one that does, but there's no need to make this fix dependent on it.

> Parser symbol tree flattening is broken for recursive schemas
> -
>
> Key: AVRO-1667
> URL: https://issues.apache.org/jira/browse/AVRO-1667
> Project: Avro
>  Issue Type: Bug
>Affects Versions: 1.7.7
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
> Fix For: 1.8.1
>
> Attachments: AVRO-1667.2.patch, avro-1667.patch
>
>





[jira] [Commented] (AVRO-1667) Parser symbol tree flattening is broken for recursive schemas

2016-03-07 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183246#comment-15183246
 ] 

Ryan Blue commented on AVRO-1667:
-

We just need a fixup that applies to a production that gets copied twice. I 
think a recursive binary tree structure would do it, where each node has 
optional left and right child nodes.

> Parser symbol tree flattening is broken for recursive schemas
> -
>
> Key: AVRO-1667
> URL: https://issues.apache.org/jira/browse/AVRO-1667
> Project: Avro
>  Issue Type: Bug
>Affects Versions: 1.7.7
>Reporter: Zoltan Farkas
> Attachments: AVRO-1667.2.patch, avro-1667.patch
>
>





[jira] [Updated] (AVRO-1667) Parser symbol tree flattening is broken for recursive schemas

2016-03-06 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue updated AVRO-1667:

Attachment: AVRO-1667.2.patch

[~zolyfarkas], thanks for your patience on this one. It took a while for me to 
get the time to learn the grammar part of the code.

Your fix works, but if a given sequence is copied more than once, the other 
copies aren't fixed up. That happens because your version moves fixups rather 
than making copies. I've updated the patch to add copies instead, which was a 
simple fix. I'd appreciate it if you could review it for me. When we get a +1, 
I'll commit this.
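The move-vs-copy distinction above can be modeled in a few lines. This is a simplified, hypothetical sketch (the `Fixup` class and `Object[]` productions stand in for Avro's actual `Symbol` internals): a fixup records a slot in a production to be patched once the recursively-referenced symbol exists, so each copy of the production needs its own fixup or it keeps a null slot.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the AVRO-1667 fixup behavior.
public class FixupModel {
    // A fixup remembers a slot in a production that must be patched later,
    // once the recursively-referenced symbol has been built.
    static final class Fixup {
        final Object[] production;
        final int pos;
        Fixup(Object[] production, int pos) { this.production = production; this.pos = pos; }
        void apply(Object symbol) { production[pos] = symbol; }
    }

    static boolean hasNull(Object[] production) {
        for (Object o : production) {
            if (o == null) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Object[] original = { "int", null, "string" }; // null = not yet resolved
        List<Fixup> fixups = new ArrayList<>();
        fixups.add(new Fixup(original, 1));

        // Flattening copies the production. A correct implementation adds a
        // new fixup for each copy; moving the original fixup to one copy
        // (the reported bug) would leave the other copy's slot null.
        Object[] copy1 = original.clone();
        Object[] copy2 = original.clone();
        fixups.add(new Fixup(copy1, 1));
        fixups.add(new Fixup(copy2, 1));

        for (Fixup f : fixups) {
            f.apply("SampleNode");
        }
        System.out.println(hasNull(copy1) || hasNull(copy2)); // prints false
    }
}
```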

> Parser symbol tree flattening is broken for recursive schemas
> -
>
> Key: AVRO-1667
> URL: https://issues.apache.org/jira/browse/AVRO-1667
> Project: Avro
>  Issue Type: Bug
>Affects Versions: 1.7.7
>Reporter: Zoltan Farkas
> Attachments: AVRO-1667.2.patch, avro-1667.patch
>
>





[jira] [Resolved] (AVRO-1799) java: GenericData.toString() mutates underlying ByteBuffer backed data

2016-02-29 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue resolved AVRO-1799.
-
   Resolution: Fixed
 Assignee: Ryan Blue
Fix Version/s: 1.9.0

Merged the fix. Thanks for reporting this, Greg!

> java: GenericData.toString() mutates underlying ByteBuffer backed data
> --
>
> Key: AVRO-1799
> URL: https://issues.apache.org/jira/browse/AVRO-1799
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Greg McNutt
>Assignee: Ryan Blue
>Priority: Critical
> Fix For: 1.9.0
>
>
> Around line 550, the writeEscapedString() method, used to serialize a byte 
> array, alters the underlying ByteBuffer's position, meaning subsequent 
> uses of a toString()'d Avro object will yield different results.
> The fix is to avoid a generic decode on the ByteBuffer: either clone the 
> element (expensive) or iterate over the elements with an external index.
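The mutation described above is easy to demonstrate with plain JDK buffers. This is an illustrative sketch, not Avro's actual writeEscapedString() code; one possible fix shown here is `ByteBuffer.duplicate()`, which shares the bytes but gives the reader an independent position and limit.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Models the bug: reading a ByteBuffer directly advances its position,
// so a second toString() of the same data would see an empty buffer.
public class BufferMutation {
    static String decodeMutating(ByteBuffer buf) {
        // Consuming the caller's buffer moves its position to the limit.
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    static String decodeSafe(ByteBuffer buf) {
        // duplicate() leaves the caller's position untouched.
        ByteBuffer view = buf.duplicate();
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap("abc".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(decodeSafe(buf) + " " + buf.remaining());     // prints "abc 3"
        System.out.println(decodeMutating(buf) + " " + buf.remaining()); // prints "abc 0"
    }
}
```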





[jira] [Commented] (AVRO-1799) java: GenericData.toString() mutates underlying ByteBuffer backed data

2016-02-25 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167388#comment-15167388
 ] 

Ryan Blue commented on AVRO-1799:
-

Thanks for reviewing Greg! I've updated the test as you suggested and I'll 
merge this later today unless someone objects.

> java: GenericData.toString() mutates underlying ByteBuffer backed data
> --
>
> Key: AVRO-1799
> URL: https://issues.apache.org/jira/browse/AVRO-1799
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Greg McNutt
>Priority: Critical
>
> Around line 550, the writeEscapedString() method, used to serialize a byte 
> array, alters the underlying ByteBuffer's position, meaning subsequent 
> uses of a toString()'d Avro object will yield different results.
> The fix is to avoid a generic decode on the ByteBuffer: either clone the 
> element (expensive) or iterate over the elements with an external index.





[jira] [Commented] (AVRO-1803) nullable fields and it's default value

2016-02-24 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15165682#comment-15165682
 ] 

Ryan Blue commented on AVRO-1803:
-

I think the wording is somewhat confusing, but actually correct. When it says 
"the default value of such unions is *typically* null" it means that users 
commonly add a null default by explicitly adding it. If you don't add a 
default, the readers expect a value to be present.

The default is only used when there is no value in the encoded bytes, but the 
reader expects one. The reader either gets the default that was set in the read 
schema, or will get an exception that there is no default value and the data 
file was missing the value.
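Concretely, the behavior described above means a nullable field must carry an explicit null default to be optional for readers. A minimal field declaration (the field name is taken from the Handshake example quoted below, used here purely for illustration):

{code}
{"name": "clientProtocol", "type": ["null", "string"], "default": null}
{code}

Without the {{"default": null}}, the union is still nullable on the wire, but a reader whose data is missing the field gets an exception instead of null.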

> nullable fields and it's default value
> --
>
> Key: AVRO-1803
> URL: https://issues.apache.org/jira/browse/AVRO-1803
> Project: Avro
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.7.7, 1.8.0
>Reporter: Marcel Silberhorn
>Priority: Minor
>
> as described in
> https://avro.apache.org/docs/current/spec.html
> {quote}h6. Unions
> Unions, as mentioned above, are represented using JSON arrays. For example, 
> {{\["null", "string"\]}} declares a schema which may be either a null or 
> string.
> (Note that when a default value is specified for a record field whose type is 
> a union, the type of the default value must match the +first+ element of the 
> union. Thus, for unions containing "null", the "null" is usually listed 
> first, since the default value of such unions is typically null.){quote}
> and the given example for Handshake with
> {code}
> {"name": "clientProtocol", "type": ["null", "string"]},
> {code}
> or even the "tweet.hashtags" example from 
> http://stackoverflow.com/questions/31864450
> as seen in http://stackoverflow.com/questions/9417732 you have to explicitly 
> define {{"default": null}} for the "nullable union field", or else you get
> {{Field clientProtocol type:UNION pos:0 not set and has no default value}}
> So either there is a bug in the documentation or in the code ;)
> If it's "just" a documentation error, please downgrade this bug to a wish: add 
> a compiler warning when a union is used without a default value.





[jira] [Commented] (AVRO-1790) Publish avro-js to NPM

2016-02-24 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163858#comment-15163858
 ] 

Ryan Blue commented on AVRO-1790:
-

[~tomwhite], can you add me as well? I can't edit either.

> Publish avro-js to NPM
> --
>
> Key: AVRO-1790
> URL: https://issues.apache.org/jira/browse/AVRO-1790
> Project: Avro
>  Issue Type: Task
>  Components: js
>Affects Versions: 1.8.0
>Reporter: Ryan Blue
>Assignee: Matthieu Monsch
> Fix For: 1.8.0
>
>
> Looks like we haven't [published the avro-js npm 
> module|https://www.npmjs.com/package/avro-js] yet. There's already an 'avro' 
> module and I think the code is setup to produce avro-js. [~mtth], can you 
> publish the 1.8.0 release artifact? You can find it here: 
> http://ftp.wayne.edu/apache/avro/avro-1.8.0/js/





[jira] [Resolved] (AVRO-1493) Avoid the "Turkish Locale Problem"

2016-02-21 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue resolved AVRO-1493.
-
   Resolution: Fixed
Fix Version/s: 1.8.1 (was: 1.7.8)

I committed the fix. Thanks [~krschultz]! And thank you [~serkan_tas] for 
reporting the problem!

> Avoid the "Turkish Locale Problem"
> --
>
> Key: AVRO-1493
> URL: https://issues.apache.org/jira/browse/AVRO-1493
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.7.6
> Environment: Hadoop trunk build error on mac-os with turkish locale.
>Reporter: Serkan Taş
>Assignee: Kevin Schultz
> Fix For: 1.8.1
>
>
> Locale-dependent String.toUpperCase() and String.toLowerCase() cause unexpected 
> behavior if the locale is Turkish.
> Not sure about String.equalsIgnoreCase(..).
> Here is the error :
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile 
> (default-testCompile) on project hadoop-common: Compilation failure
> [ERROR] 
> /Users/serkan/programlar/dev/hadooptest/hadoop-trunk/hadoop-common-project/hadoop-common/target/generated-test-sources/java/org/apache/hadoop/io/serializer/avro/AvroRecord.java:[10,244]
>  unmappable character for encoding UTF-8
> [ERROR] -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile 
> (default-testCompile) on project hadoop-common: Compilation failure
> /Users/serkan/programlar/dev/hadooptest/hadoop-trunk/hadoop-common-project/hadoop-common/target/generated-test-sources/java/org/apache/hadoop/io/serializer/avro/AvroRecord.java:[10,244]
>  unmappable character for encoding UTF-8
> If I check the code, I can see the reason for the error:
>  public static final org.apache.avro.Schema SCHEMA$ = new 
> org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"AvroRecord\",\"namespace\":\"org.apache.hadoop.io.serializer.avro\",\"fields\":[{\"name\":\"intField\",\"type\":\"Ýnt\"}]}");
> In the code generated from the schema, locale-dependent capitalization turns 
> the letter "i" into "İ" (rendered above as "Ý" due to encoding); the same 
> applies for "I" becoming "ı".
> The same bug exists in OPENEJB-1071, OAK-260, IBATIS-218.
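The Turkish case-mapping behavior described above can be shown directly with the JDK. A minimal sketch: under a Turkish locale, 'i' uppercases to dotted 'İ' (U+0130) and 'I' lowercases to dotless 'ı' (U+0131), so code that capitalizes schema identifiers must pass an explicit locale such as Locale.ROOT to stay deterministic.

```java
import java.util.Locale;

// Demonstrates the "Turkish locale problem" and the usual fix:
// locale-sensitive case mapping vs. an explicit root locale.
public class TurkishLocale {
    public static void main(String[] args) {
        Locale tr = new Locale("tr", "TR");
        System.out.println("i".toUpperCase(tr));          // dotted capital I (U+0130)
        System.out.println("I".toLowerCase(tr));          // dotless small i (U+0131)
        System.out.println("i".toUpperCase(Locale.ROOT)); // prints "I"
    }
}
```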





[jira] [Updated] (AVRO-1493) Avoid the "Turkish Locale Problem"

2016-02-21 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue updated AVRO-1493:

Assignee: Kevin Schultz

> Avoid the "Turkish Locale Problem"
> --
>
> Key: AVRO-1493
> URL: https://issues.apache.org/jira/browse/AVRO-1493
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.7.6
> Environment: Hadoop trunk build error on mac-os with turkish locale.
>Reporter: Serkan Taş
>Assignee: Kevin Schultz
> Fix For: 1.7.8
>
>
> Locale-dependent String.toUpperCase() and String.toLowerCase() cause unexpected 
> behavior if the locale is Turkish.
> Not sure about String.equalsIgnoreCase(..).
> Here is the error :
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile 
> (default-testCompile) on project hadoop-common: Compilation failure
> [ERROR] 
> /Users/serkan/programlar/dev/hadooptest/hadoop-trunk/hadoop-common-project/hadoop-common/target/generated-test-sources/java/org/apache/hadoop/io/serializer/avro/AvroRecord.java:[10,244]
>  unmappable character for encoding UTF-8
> [ERROR] -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile 
> (default-testCompile) on project hadoop-common: Compilation failure
> /Users/serkan/programlar/dev/hadooptest/hadoop-trunk/hadoop-common-project/hadoop-common/target/generated-test-sources/java/org/apache/hadoop/io/serializer/avro/AvroRecord.java:[10,244]
>  unmappable character for encoding UTF-8
> If I check the code, I can see the reason for the error:
>  public static final org.apache.avro.Schema SCHEMA$ = new 
> org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"AvroRecord\",\"namespace\":\"org.apache.hadoop.io.serializer.avro\",\"fields\":[{\"name\":\"intField\",\"type\":\"Ýnt\"}]}");
> In the code generated from the schema, locale-dependent capitalization turns 
> the letter "i" into "İ" (rendered above as "Ý" due to encoding); the same 
> applies for "I" becoming "ı".
> The same bug exists in OPENEJB-1071, OAK-260, IBATIS-218.





[jira] [Commented] (AVRO-1778) IPC/RPC for JavaScript

2016-02-04 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133065#comment-15133065
 ] 

Ryan Blue commented on AVRO-1778:
-

So I posted that and then started the switch to git. :) The repo is currently 
read-only for the move and then we'll have to push to the new one. Should be 
good to go once INFRA-11205 is done.

> IPC/RPC for JavaScript
> --
>
> Key: AVRO-1778
> URL: https://issues.apache.org/jira/browse/AVRO-1778
> Project: Avro
>  Issue Type: Improvement
>  Components: javascript
>Reporter: Matthieu Monsch
>Assignee: Ryan Blue
> Attachments: AVRO-1778.patch
>
>
> This patch adds protocols to the JavaScript implementation.
> The API was designed to:
> + Be simple and idiomatic. The `Protocol` class added here is heavily 
> inspired by node.js' core `EventEmitter` to keep things as familiar as 
> possible [1]. Getting a client and server working is straightforward and 
> requires very few lines of code [2].
> + Support arbitrary transports, both stateful and stateless. Built-in node.js 
> streams are supported out of the box (e.g. TCP/UNIX sockets, or even 
> stdin/stdout). Exchanging messages over a custom transport requires 
> implementing a single simple function (see [3] for an example).
> + Work both server-side and in the browser!
> Ps: I also tested against both the Java and Python implementations over HTTP 
> and communication worked. 
> [1] https://github.com/mtth/avsc/wiki/API#ipc--rpc
> [2] https://github.com/mtth/avsc/wiki/Advanced-usage#remote-procedure-calls
> [3] https://github.com/mtth/avsc/wiki/Advanced-usage#transient-streams





[jira] [Commented] (AVRO-1778) IPC/RPC for JavaScript

2016-02-04 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132969#comment-15132969
 ] 

Ryan Blue commented on AVRO-1778:
-

+1. Thanks for the quick replies to my questions, [~mtth]. Since you're a 
committer now, can you commit the patch?

> IPC/RPC for JavaScript
> --
>
> Key: AVRO-1778
> URL: https://issues.apache.org/jira/browse/AVRO-1778
> Project: Avro
>  Issue Type: Improvement
>  Components: javascript
>Reporter: Matthieu Monsch
>Assignee: Ryan Blue
> Attachments: AVRO-1778.patch
>
>
> This patch adds protocols to the JavaScript implementation.
> The API was designed to:
> + Be simple and idiomatic. The `Protocol` class added here is heavily 
> inspired by node.js' core `EventEmitter` to keep things as familiar as 
> possible [1]. Getting a client and server working is straightforward and 
> requires very few lines of code [2].
> + Support arbitrary transports, both stateful and stateless. Built-in node.js 
> streams are supported out of the box (e.g. TCP/UNIX sockets, or even 
> stdin/stdout). Exchanging messages over a custom transport requires 
> implementing a single simple function (see [3] for an example).
> + Work both server-side and in the browser!
> Ps: I also tested against both the Java and Python implementations over HTTP 
> and communication worked. 
> [1] https://github.com/mtth/avsc/wiki/API#ipc--rpc
> [2] https://github.com/mtth/avsc/wiki/Advanced-usage#remote-procedure-calls
> [3] https://github.com/mtth/avsc/wiki/Advanced-usage#transient-streams





[jira] [Created] (AVRO-1794) Update docs after migration to git

2016-02-04 Thread Ryan Blue (JIRA)
Ryan Blue created AVRO-1794:
---

 Summary: Update docs after migration to git
 Key: AVRO-1794
 URL: https://issues.apache.org/jira/browse/AVRO-1794
 Project: Avro
  Issue Type: Task
  Components: doc
Reporter: Ryan Blue


The [vote to move to 
git|https://mail-archives.apache.org/mod_mbox/avro-dev/201602.mbox/%3C56AFB9B9.8000304%40apache.org%3E]
 just passed. Once the INFRA ticket is completed, we will need to [update 
docs|https://avro.apache.org/version_control.html].





[jira] [Updated] (AVRO-695) Cycle Reference Support

2016-02-03 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue updated AVRO-695:
---
   Resolution: Fixed
 Assignee: Ryan Blue
Fix Version/s: 1.8.0
   Status: Resolved  (was: Patch Available)

Closing this since it is available using logical types and is used as a [test 
case|https://github.com/apache/avro/blob/trunk/lang/java/avro/src/test/java/org/apache/avro/TestCircularReferences.java].

> Cycle Reference Support
> ---
>
> Key: AVRO-695
> URL: https://issues.apache.org/jira/browse/AVRO-695
> Project: Avro
>  Issue Type: New Feature
>  Components: spec
>Affects Versions: 1.7.6
>Reporter: Moustapha Cherri
>Assignee: Ryan Blue
> Fix For: 1.8.0
>
> Attachments: AVRO-695.patch, AVRO-695.patch, PERF_8000_cycles.zip, 
> avro-1.4.1-cycle.patch.gz, avro-1.4.1-cycle.patch.gz, 
> avro_circular_references.zip, avro_circular_refs6.patch, 
> avro_circular_refs7.patch, avro_circular_refs_2014_06_14.zip, 
> circular_refs_and_nonstring_map_keys_2014_06_25.zip
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> This is a proposed implementation to add cycle reference support to Avro. It 
> basically introduces a new type named Cycle. A Cycle contains a string 
> representing the path to the other reference.
> For example, suppose we have an object of type Message that has a member named 
> previous, also of type Message, with this hierarchy:
> message
>   previous : message2
> message2
>   previous : message2
> When serializing, the cycle path for "message2.previous" will be "previous".
> The implementation depends on ANTLR to evaluate those cycles at read time to 
> resolve them. I used ANTLR 3.2. This dependency is not mandatory; I just used 
> ANTLR to speed things up. I kept the generated ANTLR code in this 
> implementation, though it should instead be generated during the build. I only 
> updated the Java code.
> I did not do full unit testing, but you can find an "avrotest.Main" class that 
> can be used as a preliminary test.
> Please do not hesitate to contact me for further clarification if this seems 
> interesting.
> Best regards,
> Moustapha Cherri
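As a rough illustration of the path-based cycle idea in the description above (not the actual patch: the {"$cycle": ...} marker and the helper names are hypothetical), a reader could resolve recorded paths back into object references after decoding:

```python
# Hypothetical sketch: a serialized cycle stores a path string, and the
# reader resolves that path back into an object reference after decoding.
def resolve_cycles(root):
    def lookup(path):
        # Follow a dotted path from the root object being decoded.
        node = root
        for part in path.split(".") if path else []:
            node = node[part]
        return node

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if isinstance(value, dict) and "$cycle" in value:
                    # Replace the cycle marker with the referenced object.
                    node[key] = lookup(value["$cycle"])
                else:
                    walk(value)

    walk(root)
    return root

# "message2.previous" cycles back to message2 itself, so its path
# ("previous") is recorded relative to the root object.
message2 = {"name": "message2", "previous": {"$cycle": "previous"}}
message = {"name": "message", "previous": message2}
resolve_cycles(message)
assert message2["previous"] is message2
```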





[jira] [Commented] (AVRO-1792) Cannot specify a 'null' default value

2016-02-03 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15131020#comment-15131020
 ] 

Ryan Blue commented on AVRO-1792:
-

I'm glad it's working for you. Thanks for reporting bugs and also for following 
up on the resolution!

> Cannot specify a 'null' default value
> -
>
> Key: AVRO-1792
> URL: https://issues.apache.org/jira/browse/AVRO-1792
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Kevin J. Price
>Priority: Minor
>
> Using the new Schema.Field constructors added in 1.8.0, it is no longer 
> possible to use the Java API to construct a schema field with a 'null' 
> default value. That is, the following schema cannot be constructed without 
> using the deprecated API:
> {code}
> {
>   "type": "record",
>   "name": "base",
>   "fields": [{
>   "name": "a",
>   "type": ["null", "string"],
>   "default": null
>   }]
> }
> {code}
> This is because passing a "null" value to the new API implies no default. 
> Passing the "JsonProperties.NULL_VALUE" sentinel value doesn't work either, 
> because when it is parsed by "JacksonUtils.toJsonNode", it turns into "null".
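For what it's worth, the JSON form of this schema is perfectly representable: a "default" key present with value null is distinct from the key being absent, which is exactly the distinction the new Java API loses. A stdlib-only Python sketch of the schema shape:

```python
import json

# The schema from the report: a nullable field whose default is JSON null.
schema = {
    "type": "record",
    "name": "base",
    "fields": [
        {"name": "a", "type": ["null", "string"], "default": None},
    ],
}

# Round-trip through JSON text, as a schema parser would see it.
parsed = json.loads(json.dumps(schema))
field = parsed["fields"][0]

# "default" present-and-null is not the same as "default" missing; the
# Java API bug is that both cases collapse to "no default".
assert "default" in field and field["default"] is None
```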





[jira] [Created] (AVRO-1789) Publish JS documentation

2016-02-02 Thread Ryan Blue (JIRA)
Ryan Blue created AVRO-1789:
---

 Summary: Publish JS documentation
 Key: AVRO-1789
 URL: https://issues.apache.org/jira/browse/AVRO-1789
 Project: Avro
  Issue Type: Bug
  Components: js
Reporter: Ryan Blue


[~mtth] wrote some great docs for the JS implementation in lang/js/docs. We 
should convert those to HTML and publish them as part of the website. All we 
need to do is to drop the HTML in {{build/avro-doc-/}} during the 
build.sh dist target.





[jira] [Created] (AVRO-1790) Publish avro-js to NPM

2016-02-02 Thread Ryan Blue (JIRA)
Ryan Blue created AVRO-1790:
---

 Summary: Publish avro-js to NPM
 Key: AVRO-1790
 URL: https://issues.apache.org/jira/browse/AVRO-1790
 Project: Avro
  Issue Type: Task
  Components: js
Affects Versions: 1.8.0
Reporter: Ryan Blue
Assignee: Matthieu Monsch
 Fix For: 1.8.0


Looks like we haven't [published the avro-js npm 
module|https://www.npmjs.com/package/avro-js] yet. There's already an 'avro' 
module and I think the code is set up to produce avro-js. [~mtth], can you 
publish the 1.8.0 release artifact? You can find it here: 
http://ftp.wayne.edu/apache/avro/avro-1.8.0/js/





[jira] [Commented] (AVRO-1778) IPC/RPC for JavaScript

2016-02-02 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128691#comment-15128691
 ] 

Ryan Blue commented on AVRO-1778:
-

Having a look now.

> IPC/RPC for JavaScript
> --
>
> Key: AVRO-1778
> URL: https://issues.apache.org/jira/browse/AVRO-1778
> Project: Avro
>  Issue Type: Improvement
>  Components: javascript
>Reporter: Matthieu Monsch
>Assignee: Ryan Blue
> Attachments: AVRO-1778.patch
>
>
> This patch adds protocols to the JavaScript implementation.
> The API was designed to:
> + Be simple and idiomatic. The `Protocol` class added here is heavily 
> inspired by node.js' core `EventEmitter` to keep things as familiar as 
> possible [1]. Getting a client and server working is straightforward and 
> requires very few lines of code [2].
> + Support arbitrary transports, both stateful and stateless. Built-in node.js 
> streams are supported out of the box (e.g. TCP/UNIX sockets, or even 
> stdin/stdout). Exchanging messages over a custom transport requires 
> implementing a single simple function (see [3] for an example).
> + Work both server-side and in the browser!
> Ps: I also tested against both the Java and Python implementations over HTTP 
> and communication worked. 
> [1] https://github.com/mtth/avsc/wiki/API#ipc--rpc
> [2] https://github.com/mtth/avsc/wiki/Advanced-usage#remote-procedure-calls
> [3] https://github.com/mtth/avsc/wiki/Advanced-usage#transient-streams





[jira] [Commented] (AVRO-1778) IPC/RPC for JavaScript

2016-02-02 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129054#comment-15129054
 ] 

Ryan Blue commented on AVRO-1778:
-

This looks mostly good to me. Tests pass and coverage is at 100% (nice!). I 
have a couple of comments, which I've posted on my branch here: 
https://github.com/rdblue/avro/commit/aa64961b. Nothing major since all of the 
license stuff looks good.

> IPC/RPC for JavaScript
> --
>
> Key: AVRO-1778
> URL: https://issues.apache.org/jira/browse/AVRO-1778
> Project: Avro
>  Issue Type: Improvement
>  Components: javascript
>Reporter: Matthieu Monsch
>Assignee: Ryan Blue
> Attachments: AVRO-1778.patch
>
>
> This patch adds protocols to the JavaScript implementation.
> The API was designed to:
> + Be simple and idiomatic. The `Protocol` class added here is heavily 
> inspired by node.js' core `EventEmitter` to keep things as familiar as 
> possible [1]. Getting a client and server working is straightforward and 
> requires very few lines of code [2].
> + Support arbitrary transports, both stateful and stateless. Built-in node.js 
> streams are supported out of the box (e.g. TCP/UNIX sockets, or even 
> stdin/stdout). Exchanging messages over a custom transport requires 
> implementing a single simple function (see [3] for an example).
> + Work both server-side and in the browser!
> Ps: I also tested against both the Java and Python implementations over HTTP 
> and communication worked. 
> [1] https://github.com/mtth/avsc/wiki/API#ipc--rpc
> [2] https://github.com/mtth/avsc/wiki/Advanced-usage#remote-procedure-calls
> [3] https://github.com/mtth/avsc/wiki/Advanced-usage#transient-streams





[jira] [Commented] (AVRO-457) add tools that read/write xml records from/to avro data files

2016-01-28 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121891#comment-15121891
 ] 

Ryan Blue commented on AVRO-457:


Sounds reasonable to me. I don't know what an IDREF is either. What do you 
think the output should be instead?

> add tools that read/write xml records from/to avro data files
> -
>
> Key: AVRO-457
> URL: https://issues.apache.org/jira/browse/AVRO-457
> Project: Avro
>  Issue Type: New Feature
>  Components: java
>Affects Versions: 1.7.8
>Reporter: Doug Cutting
>  Labels: gsoc
> Attachments: AVRO-457.patch, AVRO-457.patch, AVRO-457.patch, 
> AVRO-457.patch, ebucore.json
>
>
> It might be useful to have command-line tools that can read & write arbitrary 
> XML data from & to Avro data files.





[jira] [Commented] (AVRO-1786) Strange IndexOutofBoundException in GenericDatumReader.readString

2016-01-27 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15119898#comment-15119898
 ] 

Ryan Blue commented on AVRO-1786:
-

[~java8964], when you get a chance, can you post the counters from your test 
jobs? Results from the successful run (with the filter) would be helpful as 
well.

Any information you have on the values that cause this problem would be great. 
This looks suspiciously like a schema evolution change that was inconsistently 
applied. Did you change your schema lately?

> Strange IndexOutofBoundException in GenericDatumReader.readString
> -
>
> Key: AVRO-1786
> URL: https://issues.apache.org/jira/browse/AVRO-1786
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.7.4, 1.7.7
> Environment: CentOS 6.5 Linux x64, 2.6.32-358.14.1.el6.x86_64
> Use IBM JVM:
> IBM J9 VM (build 2.7, JRE 1.7.0 Linux amd64-64 Compressed References 
> 20140515_199835 (JIT enabled, AOT enabled)
>Reporter: Yong Zhang
>
> Our production cluster is CentOS 6.5 (2.6.32-358.14.1.el6.x86_64), running 
> IBM BigInsight V3.0.0.2. In Apache terms, it is Hadoop 2.2.0 with MRv1 (no 
> YARN), and comes with Avro 1.7.4, running with IBM J9 VM (build 2.7, JRE 
> 1.7.0 Linux amd64-64 Compressed References 20140515_199835 (JIT enabled, AOT 
> enabled)). Not sure if the JDK matters, but it is NOT the Oracle JVM.
> We have an ETL implemented as a chain of MR jobs. One MR job merges two sets 
> of Avro data: Dataset1 is in HDFS location A, Dataset2 is in HDFS location B, 
> and both contain Avro records bound to the same Avro schema. Each record 
> contains a unique id field and a timestamp field. The MR job merges the 
> records by ID, replaces the earlier-timestamp record with the later-timestamp 
> one, and emits the final Avro record. Very straightforward.
> Now we face a problem where one reducer keeps failing with the following 
> stack trace on the JobTracker:
> {code}
> java.lang.IndexOutOfBoundsException
>   at java.io.ByteArrayInputStream.read(ByteArrayInputStream.java:191)
>   at java.io.DataInputStream.read(DataInputStream.java:160)
>   at 
> org.apache.avro.io.DirectBinaryDecoder.doReadBytes(DirectBinaryDecoder.java:184)
>   at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:263)
>   at 
> org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107)
>   at 
> org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:348)
>   at 
> org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:143)
>   at 
> org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:125)
>   at 
> org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:121)
>   at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
>   at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
>   at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>   at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
>   at 
> org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:108)
>   at 
> org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:48)
>   at 
> org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:142)
>   at 
> org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:117)
>   at 
> org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:297)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:165)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:652)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at 
> java.security.AccessController.doPrivileged(AccessController.java:366)
>   at javax.security.auth.Subject.doAs(Subject.java:572)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> {code}
> Here are my Mapper and Reducer method signatures:
> Mapper:
> public void map(AvroKey key, NullWritable value, Context 
> context) throws IOException, InterruptedException 
> Reducer:
> protected void reduce(CustomPartitionKeyClass key, 
> Iterable values, Context context) throws 
> IOException, InterruptedException 
> What bothers me are the following facts:
> 1) All the mappers finish without error
> 2) Most of the reducers finish without error, but one reducer keeps failing 
> 

[jira] [Updated] (AVRO-1786) Strange IndexOutofBoundException in GenericDatumReader.readString

2016-01-25 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue updated AVRO-1786:

Description: 
Our production cluster is CentOS 6.5 (2.6.32-358.14.1.el6.x86_64), running IBM 
BigInsight V3.0.0.2. In Apache terms, it is Hadoop 2.2.0 with MRv1 (no YARN), 
and comes with Avro 1.7.4, running with IBM J9 VM (build 2.7, JRE 1.7.0 Linux 
amd64-64 Compressed References 20140515_199835 (JIT enabled, AOT enabled)). Not 
sure if the JDK matters, but it is NOT the Oracle JVM.

We have an ETL implemented as a chain of MR jobs. One MR job merges two sets of 
Avro data: Dataset1 is in HDFS location A, Dataset2 is in HDFS location B, and 
both contain Avro records bound to the same Avro schema. Each record contains a 
unique id field and a timestamp field. The MR job merges the records by ID, 
replaces the earlier-timestamp record with the later-timestamp one, and emits 
the final Avro record. Very straightforward.
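The merge the job performs can be sketched as follows (a minimal illustration; field names like "id" and "timestamp" stand in for the real schema):

```python
# Combine two record sets by id, keeping the record with the later
# timestamp. This mirrors the merge described above, outside MapReduce.
def merge_latest(dataset1, dataset2):
    merged = {}
    for record in list(dataset1) + list(dataset2):
        current = merged.get(record["id"])
        if current is None or record["timestamp"] > current["timestamp"]:
            merged[record["id"]] = record
    return list(merged.values())

a = [{"id": 1, "timestamp": 10, "v": "old"}]
b = [{"id": 1, "timestamp": 20, "v": "new"},
     {"id": 2, "timestamp": 5, "v": "x"}]
out = {r["id"]: r["v"] for r in merge_latest(a, b)}
assert out == {1: "new", 2: "x"}
```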

Now we face a problem where one reducer keeps failing with the following 
stack trace on the JobTracker:

{code}
java.lang.IndexOutOfBoundsException
at java.io.ByteArrayInputStream.read(ByteArrayInputStream.java:191)
at java.io.DataInputStream.read(DataInputStream.java:160)
at 
org.apache.avro.io.DirectBinaryDecoder.doReadBytes(DirectBinaryDecoder.java:184)
at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:263)
at 
org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107)
at 
org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:348)
at 
org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:143)
at 
org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:125)
at 
org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:121)
at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
at 
org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
at 
org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:108)
at 
org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:48)
at 
org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:142)
at 
org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:117)
at 
org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:297)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:165)
at 
org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:652)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at 
java.security.AccessController.doPrivileged(AccessController.java:366)
at javax.security.auth.Subject.doAs(Subject.java:572)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
{code}

Here are my Mapper and Reducer method signatures:
Mapper:
public void map(AvroKey key, NullWritable value, Context 
context) throws IOException, InterruptedException 
Reducer:
protected void reduce(CustomPartitionKeyClass key, 
Iterable values, Context context) throws 
IOException, InterruptedException 

What bothers me are the following facts:
1) All the mappers finish without error.
2) Most of the reducers finish without error, but one reducer keeps failing 
with the above error.
3) It looks like it is caused by the data? But keep in mind that all the Avro 
records passed the mapper side, yet failed in one reducer. 
4) From the stack trace, it looks like our reducer code was NOT invoked yet but 
failed before that. So my guess is that all the Avro records pass through the 
mapper side, but Avro complains about the intermediate result generated by one 
mapper? In my understanding, that will be a sequence file generated by Hadoop, 
with the value part being the Avro bytes. Does the above error mean that Avro 
cannot deserialize the value part from the sequence file?
5) Our ETL ran fine for more than one year, but we suddenly started getting 
this error one day, and it has kept happening since. 
6) If it helps, here is the schema for the avro record:

{code}
{
"namespace" : "company name",
"type" : "record",
"name" : "Lists",
"fields" : [
{"name" : "account_id", "type" : "long"},
{"name" : "list_id", "type" : "string"},
{"name" : "sequence_id", "type" : ["int", "null"]} ,

[jira] [Commented] (AVRO-1786) Strange IndexOutofBoundException in GenericDatumReader.readString

2016-01-25 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116032#comment-15116032
 ] 

Ryan Blue commented on AVRO-1786:
-

Hi [~java8964]. From that stack trace, it looks like the problem occurs when 
the reducer is reading data written out by the mapper. That's why you can read 
the source data just fine; the problem is in the job's intermediate data. As 
for why this happens on just one reducer, can you post the job counters? This 
could be explained by there being only one output key or just one reduce task.

This sort of error usually happens when an Avro file is corrupt, or when 
another binary format is read as Avro. Since this fails before calling your 
reducer, it looks like none of the data values are readable, so I think it may 
be a configuration problem between your mapper and reducer. Could you post the 
code where you set up this job?

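For context on where the failure surfaces: an Avro string is encoded as a zig-zag varint length followed by that many UTF-8 bytes, so corrupt or misaligned intermediate data makes the decoder read a bogus length and run off the end of the buffer. A Python sketch of that decoding path (loosely mirroring BinaryDecoder.readLong/readString; not Avro's actual code):

```python
import io

def read_long(buf):
    # Zig-zag varint decoding, as in Avro's BinaryDecoder.readLong.
    shift, accum = 0, 0
    while True:
        b = buf.read(1)
        if not b:
            raise EOFError("unexpected end of data")
        accum |= (b[0] & 0x7F) << shift
        if not (b[0] & 0x80):
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)

def read_string(buf):
    # readString = length (long) followed by that many UTF-8 bytes.
    length = read_long(buf)
    data = buf.read(length)
    if len(data) != length:
        # Misaligned or corrupt input: the declared length overruns the
        # buffer, surfacing in Java as IndexOutOfBoundsException.
        raise IndexError("declared length %d exceeds remaining bytes" % length)
    return data.decode("utf-8")

# Well-formed: zig-zag(5) = 0x0A, then 5 bytes.
assert read_string(io.BytesIO(b"\x0ahello")) == "hello"
# Truncated input: the declared length runs past the end of the stream.
try:
    read_string(io.BytesIO(b"\x0ahi"))
    assert False, "expected IndexError"
except IndexError:
    pass
```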
> Strange IndexOutofBoundException in GenericDatumReader.readString
> -
>
> Key: AVRO-1786
> URL: https://issues.apache.org/jira/browse/AVRO-1786
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.7.4, 1.7.7
> Environment: CentOS 6.5 Linux x64, 2.6.32-358.14.1.el6.x86_64
> Use IBM JVM:
> IBM J9 VM (build 2.7, JRE 1.7.0 Linux amd64-64 Compressed References 
> 20140515_199835 (JIT enabled, AOT enabled)
>Reporter: Yong Zhang
>
> Our production cluster is CentOS 6.5 (2.6.32-358.14.1.el6.x86_64), running 
> IBM BigInsight V3.0.0.2. In Apache terms, it is Hadoop 2.2.0 with MRv1 (no 
> YARN), and comes with Avro 1.7.4, running with IBM J9 VM (build 2.7, JRE 
> 1.7.0 Linux amd64-64 Compressed References 20140515_199835 (JIT enabled, AOT 
> enabled)). Not sure if the JDK matters, but it is NOT the Oracle JVM.
> We have an ETL implemented as a chain of MR jobs. One MR job merges two sets 
> of Avro data: Dataset1 is in HDFS location A, Dataset2 is in HDFS location B, 
> and both contain Avro records bound to the same Avro schema. Each record 
> contains a unique id field and a timestamp field. The MR job merges the 
> records by ID, replaces the earlier-timestamp record with the later-timestamp 
> one, and emits the final Avro record. Very straightforward.
> Now we face a problem where one reducer keeps failing with the following 
> stack trace on the JobTracker:
> {code}
> java.lang.IndexOutOfBoundsException
>   at java.io.ByteArrayInputStream.read(ByteArrayInputStream.java:191)
>   at java.io.DataInputStream.read(DataInputStream.java:160)
>   at 
> org.apache.avro.io.DirectBinaryDecoder.doReadBytes(DirectBinaryDecoder.java:184)
>   at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:263)
>   at 
> org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107)
>   at 
> org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:348)
>   at 
> org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:143)
>   at 
> org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:125)
>   at 
> org.apache.avro.reflect.ReflectDatumReader.readString(ReflectDatumReader.java:121)
>   at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
>   at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
>   at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>   at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
>   at 
> org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:108)
>   at 
> org.apache.avro.hadoop.io.AvroDeserializer.deserialize(AvroDeserializer.java:48)
>   at 
> org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:142)
>   at 
> org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:117)
>   at 
> org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:297)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:165)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:652)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at 
> java.security.AccessController.doPrivileged(AccessController.java:366)
>   at javax.security.auth.Subject.doAs(Subject.java:572)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> {code}
> Here are my Mapper and Reducer method signatures:
> Mapper:
> public void map(AvroKey key, NullWritable value, Context 

[jira] [Commented] (AVRO-457) add tools that read/write xml records from/to avro data files

2016-01-22 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112759#comment-15112759
 ] 

Ryan Blue commented on AVRO-457:


Is that generic allowed in your original XSD, or is it introduced when you 
convert to JAXB objects? If it is the latter, then I think we would have to get 
around that with a direct conversion to avoid losing the type contained in 
that list.

> add tools that read/write xml records from/to avro data files
> -
>
> Key: AVRO-457
> URL: https://issues.apache.org/jira/browse/AVRO-457
> Project: Avro
>  Issue Type: New Feature
>  Components: java
>Affects Versions: 1.7.8
>Reporter: Doug Cutting
>  Labels: gsoc
> Attachments: AVRO-457.patch, AVRO-457.patch, AVRO-457.patch, 
> AVRO-457.patch, ebucore.json
>
>
> It might be useful to have command-line tools that can read & write arbitrary 
> XML data from & to Avro data files.





[jira] [Commented] (AVRO-1775) Running unit tests on Ruby 2.2

2016-01-19 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107483#comment-15107483
 ] 

Ryan Blue commented on AVRO-1775:
-

Does the test-unit gem exist for 1.8? I don't think we've deprecated support 
for 1.8 yet.

> Running unit tests on Ruby 2.2
> --
>
> Key: AVRO-1775
> URL: https://issues.apache.org/jira/browse/AVRO-1775
> Project: Avro
>  Issue Type: Bug
>  Components: ruby
>Reporter: Martin Kleppmann
> Attachments: AVRO-1775-1.patch
>
>
> Ruby 2.2 [removed the test/unit framework from the standard 
> library|https://bugs.ruby-lang.org/issues/9711#note-12]. As the Avro Ruby 
> implementation uses it for its tests, we need to add a dependency on the 
> {{test-unit}} gem in order to run the tests in Ruby 2.2.





[jira] [Commented] (AVRO-1785) Ruby: schema_normalization.rb is incompatible with Ruby 1.8.7

2016-01-19 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107628#comment-15107628
 ] 

Ryan Blue commented on AVRO-1785:
-

It's easy enough to fix the syntax errors, but then Rake's test runner fails 
with different errors depending on the version of Rake I try. I used the oldest 
version of rake supported by echoe, but that fails with another compatibility 
problem.

Next, I tried to avoid rake by running tests with {{ruby -Itest -Ilib 
test/test_schema_normalization.rb}}. Tests then fail with two general problems: 
the order of map keys doesn't match the test, so schema strings aren't equal 
(IIRC, later versions of Ruby always return insertion order), and primitive 
types aren't handled correctly because the case statement uses a splat for 
primitive types.

My take-away is that [we should let 1.8.7 
go|https://www.ruby-lang.org/en/news/2013/06/30/we-retire-1-8-7/]. Thoughts?

> Ruby: schema_normalization.rb is incompatible with Ruby 1.8.7
> -
>
> Key: AVRO-1785
> URL: https://issues.apache.org/jira/browse/AVRO-1785
> Project: Avro
>  Issue Type: Bug
>  Components: ruby
>Affects Versions: 1.8.0
>Reporter: Ryan Blue
>
> I was just checking AVRO-1775 in 1.8.7 and ran into compile errors. The 
> schema_normalization.rb code that was introduced by AVRO-1694 is not 
> compatible with Ruby 1.8.7 because it uses the "new" hash syntax in method 
> definitions.
> {code}
> blue@work:~/workspace/avro/lang/ruby$ bundle exec rake test
> /home/blue/workspace/avro/lang/ruby/Rakefile:19: warning: already initialized 
> constant VERSION
> /home/blue/.rvm/rubies/ruby-1.8.7-p374/bin/ruby -I"lib:ext:bin:test" 
> -I"/home/blue/.rvm/gems/ruby-1.8.7-p374/gems/rake-10.4.2/lib" 
> "/home/blue/.rvm/gems/ruby-1.8.7-p374/gems/rake-10.4.2/lib/rake/rake_test_loader.rb"
>  "test/test_help.rb" "test/test_socket_transport.rb" 
> "test/test_fingerprints.rb" "test/test_schema_normalization.rb" 
> "test/test_schema.rb" "test/test_datafile.rb" "test/test_io.rb" 
> "test/test_protocol.rb" 
> ./lib/avro/schema_normalization.rb:67: warning: else without rescue is useless
> ./lib/avro.rb:42:in `require': ./lib/avro/schema_normalization.rb:50: syntax 
> error, unexpected ':', expecting ')' (SyntaxError)
> normalize_named_type(schema, fields: fields)
> ^
> ./lib/avro/schema_normalization.rb:52: syntax error, unexpected ':', 
> expecting ')'
> normalize_named_type(schema, symbols: schema.symbols)
>  ^
> ./lib/avro/schema_normalization.rb:52: syntax error, unexpected ')', 
> expecting kEND
> ./lib/avro/schema_normalization.rb:54: syntax error, unexpected ':', 
> expecting ')'
> normalize_named_type(schema, size: schema.size)
>   ^
> ./lib/avro/schema_normalization.rb:54: syntax error, unexpected ')', 
> expecting kEND
> ./lib/avro/schema_normalization.rb:56: odd number list for Hash
> { type: type, items: normalize_schema(schema.items) }
>^
> ./lib/avro/schema_normalization.rb:56: syntax error, unexpected ':', 
> expecting '}'
> { type: type, items: normalize_schema(schema.items) }
>^
> ./lib/avro/schema_normalization.rb:56: syntax error, unexpected ':', 
> expecting '='
> { type: type, items: normalize_schema(schema.items) }
> ^
> ./lib/avro/schema_normalization.rb:56: syntax error, unexpected '}', 
> expecting kEND
> ./lib/avro/schema_normalization.rb:58: odd number list for Hash
> { type: type, values: normalize_schema(schema.values) }
>^
> ./lib/avro/schema_normalization.rb:58: syntax error, unexpected ':', 
> expecting '}'
> { type: type, values: normalize_schema(schema.values) }
>^
> ./lib/avro/schema_normalization.rb:58: syntax error, unexpected ':', 
> expecting '='
> { type: type, values: normalize_schema(schema.values) }
>  ^
> ./lib/avro/schema_normalization.rb:58: syntax error, unexpected '}', 
> expecting kEND
> ./lib/avro/schema_normalization.rb:72: odd number list for Hash
> name: field.name,
>  ^
> ./lib/avro/schema_normalization.rb:72: syntax error, unexpected ':', 
> expecting '}'
> name: field.name,
>  ^
> ./lib/avro/schema_normalization.rb:73: syntax error, unexpected ':', 
> expecting '='
> type: normalize_schema(field.type)
>  ^
> ./lib/avro/schema_normalization.rb:74: syntax error, unexpected '}', 
> expecting kEND
> ./lib/avro/schema_normalization.rb:80: odd number list for Hash
>   { name: name, type: schema.type_sym.to_s }.merge(attributes)
>  ^
> ./lib/avro/schema_normalization.rb:80: syntax error, 

[jira] [Commented] (AVRO-1775) Running unit tests on Ruby 2.2

2016-01-19 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107568#comment-15107568
 ] 

Ryan Blue commented on AVRO-1775:
-

+1

1.8.7 installs the test-unit gem without a problem and runs the tests. The 
tests don't pass, but that's another issue.

> Running unit tests on Ruby 2.2
> --
>
> Key: AVRO-1775
> URL: https://issues.apache.org/jira/browse/AVRO-1775
> Project: Avro
>  Issue Type: Bug
>  Components: ruby
>Reporter: Martin Kleppmann
> Attachments: AVRO-1775-1.patch
>
>
> Ruby 2.2 [removed the test/unit framework from the standard 
> library|https://bugs.ruby-lang.org/issues/9711#note-12]. As the Avro Ruby 
> implementation uses it for its tests, we need to add a dependency on the 
> {{test-unit}} gem in order to run the tests in Ruby 2.2.





[jira] [Commented] (AVRO-1775) Running unit tests on Ruby 2.2

2016-01-19 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107636#comment-15107636
 ] 

Ryan Blue commented on AVRO-1775:
-

That's basically the same conclusion I came to after a brief try to fix it. I'm 
+1 for releasing as-is and moving past 1.8.7. (See AVRO-1785)

> Running unit tests on Ruby 2.2
> --
>
> Key: AVRO-1775
> URL: https://issues.apache.org/jira/browse/AVRO-1775
> Project: Avro
>  Issue Type: Bug
>  Components: ruby
>Reporter: Martin Kleppmann
> Fix For: 1.8.0
>
> Attachments: AVRO-1775-1.patch
>
>
> Ruby 2.2 [removed the test/unit framework from the standard 
> library|https://bugs.ruby-lang.org/issues/9711#note-12]. As the Avro Ruby 
> implementation uses it for its tests, we need to add a dependency on the 
> {{test-unit}} gem in order to run the tests in Ruby 2.2.





[jira] [Commented] (AVRO-1658) Add avroDoc on reflect

2016-01-19 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107181#comment-15107181
 ] 

Ryan Blue commented on AVRO-1658:
-

[~aeroevan], thanks for picking this up! The patch looks like a good start and 
I'm happy to see there's a test in there. It looks like it only supports doc 
strings on fields, but Records, Enums, and Fixed can all have doc strings as 
well. What do you think about updating your patch to allow AvroDoc to be used 
with Java classes and enums?

> Add avroDoc on reflect
> --
>
> Key: AVRO-1658
> URL: https://issues.apache.org/jira/browse/AVRO-1658
> Project: Avro
>  Issue Type: New Feature
>  Components: java
>Affects Versions: 1.7.7
>Reporter: Zhaonan Sun
>  Labels: reflection
> Attachments: 
> 0001-AVRO-1658-Java-Add-reflection-annotation-AvroDoc.patch
>
>
> Looks like @AvroMeta can't set reserved fields; for example, @AvroMeta("doc", "some 
> doc") throws an exception.
> It would be great if we had an @AvroDoc("some documentation") annotation in 
> org.apache.avro.reflect.





[jira] [Commented] (AVRO-457) add tools that read/write xml records from/to avro data files

2016-01-19 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107745#comment-15107745
 ] 

Ryan Blue commented on AVRO-457:


[~beligum], thanks for posting your summary of the current state of this. I 
agree with Michael's assessment that it isn't a lack of interest in having 
something like this, it is that we're not XSD experts either. That said, if we 
can get the right people together to collaborate around this, like 
[~rpimike1022] and the Stealth.ly team that put together option #1, then I can 
take care of the commit part. I don't think we all have to be experts if 
there's a portion of the community that is interested in looking at this, 
updating Michael's latest work, and helping us review to get it in.

> add tools that read/write xml records from/to avro data files
> -
>
> Key: AVRO-457
> URL: https://issues.apache.org/jira/browse/AVRO-457
> Project: Avro
>  Issue Type: New Feature
>  Components: java
>Affects Versions: 1.7.8
>Reporter: Doug Cutting
>  Labels: gsoc
> Attachments: AVRO-457.patch, AVRO-457.patch, AVRO-457.patch, 
> AVRO-457.patch
>
>
> It might be useful to have command-line tools that can read & write arbitrary 
> XML data from & to Avro data files.





[jira] [Commented] (AVRO-1559) Drop support for Ruby 1.8

2016-01-19 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107698#comment-15107698
 ] 

Ryan Blue commented on AVRO-1559:
-

We just hit more problems with ruby 1.8 support in AVRO-1785. I just checked 
HBase and I don't think it depends on the gem after all, and Sean agreed. I 
think we can move forward with this.

> Drop support for Ruby 1.8
> -
>
> Key: AVRO-1559
> URL: https://issues.apache.org/jira/browse/AVRO-1559
> Project: Avro
>  Issue Type: Wish
>Affects Versions: 1.7.7
>Reporter: Willem van Bergen
>Assignee: Willem van Bergen
> Fix For: 1.8.0
>
> Attachments: AVRO-1559.patch
>
>
> - Ruby 1.8 is EOL, and even security issues aren't addressed anymore. 
> - It is also getting hard to set up Ruby 1.8 to run the tests (e.g. on a 
> recent OSX, it won't compile without manual fiddling).
> - Handling character encodings in Ruby 1.9 is very different from Ruby 1.8. 
> Supporting both at the same time adds a lot of overhead.





[jira] [Updated] (AVRO-1781) Schema.parse is not thread safe

2016-01-19 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue updated AVRO-1781:

Attachment: AVRO-1781-ADDENDUM.1.patch

I'm attaching a patch that removes the cache entirely, since it is no longer 
needed.

> Schema.parse is not thread safe
> ---
>
> Key: AVRO-1781
> URL: https://issues.apache.org/jira/browse/AVRO-1781
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Sean Busbey
>Assignee: Ryan Blue
>Priority: Blocker
> Fix For: 1.8.0
>
> Attachments: AVRO-1781-ADDENDUM.1.patch, AVRO-1781.1.patch, 
> AVRO-1781.2.patch
>
>
> Post AVRO-1497, Schema.parse calls {{LogicalTypes.fromSchemaIgnoreInvalid}} 
> on any schema that is expressed as a JSON object (anything except bare 
> primitives).
> That static method relies on a static cache based on WeakIdentityHashMap 
> (WIHM).
> WIHM clearly states that it isn't threadsafe 
> [ref|https://github.com/apache/avro/blob/branch-1.8/lang/java/avro/src/main/java/org/apache/avro/util/WeakIdentityHashMap.java#L42]
> {code}
>  * 
>  * Note that this implementation is not synchronized.
>  * 
>  */
> public class WeakIdentityHashMap<K, V> implements Map<K, V> {
> {code}
> All of the Schema.Parser instances use that same static Schema.parse method.
> The end result is that as-is it's only safe to have a single thread parsing 
> schemas in a given JVM.
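
The fix has to serialize access to the shared cache. The general shape of a thread-safe compute-if-absent cache, sketched in Ruby rather than as the actual Java patch (illustrative class, not Avro code):

```ruby
require 'thread' # Mutex (needed on older Rubies)

# Illustrative sketch: a memoizing cache shared across threads must
# guard lookup-and-insert with a lock, otherwise a concurrent rehash
# can corrupt the table and leave readers spinning forever.
class SynchronizedCache
  def initialize
    @lock  = Mutex.new
    @cache = {}
  end

  # Compute-if-absent under the lock; the block runs only on a miss.
  def fetch(key)
    @lock.synchronize { @cache[key] ||= yield(key) }
  end
end
```

The fix that was ultimately committed went further and dropped the cache entirely, which sidesteps the locking cost altogether.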





[jira] [Reopened] (AVRO-1781) Schema.parse is not thread safe

2016-01-19 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue reopened AVRO-1781:
-

I'm reopening this because after looking into it more, I don't think we need 
the cache at all. It was originally used to avoid parsing the logical type from 
a Schema's properties several times, but is now only used by the Schema parse 
method to do this once. The LogicalType instance is then set on the Schema and 
is available through Schema#getLogicalType. That's a much better way to keep 
track of them.

> Schema.parse is not thread safe
> ---
>
> Key: AVRO-1781
> URL: https://issues.apache.org/jira/browse/AVRO-1781
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Sean Busbey
>Assignee: Ryan Blue
>Priority: Blocker
> Fix For: 1.8.0
>
> Attachments: AVRO-1781.1.patch, AVRO-1781.2.patch
>
>
> Post AVRO-1497, Schema.parse calls {{LogicalTypes.fromSchemaIgnoreInvalid}} 
> on any schema that is expressed as a JSON object (anything except bare 
> primitives).
> That static method relies on a static cache based on WeakIdentityHashMap 
> (WIHM).
> WIHM clearly states that it isn't threadsafe 
> [ref|https://github.com/apache/avro/blob/branch-1.8/lang/java/avro/src/main/java/org/apache/avro/util/WeakIdentityHashMap.java#L42]
> {code}
>  * 
>  * Note that this implementation is not synchronized.
>  * 
>  */
> public class WeakIdentityHashMap<K, V> implements Map<K, V> {
> {code}
> All of the Schema.Parser instances use that same static Schema.parse method.
> The end result is that as-is it's only safe to have a single thread parsing 
> schemas in a given JVM.





[jira] [Commented] (AVRO-1781) Schema.parse is not thread safe

2016-01-19 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107965#comment-15107965
 ] 

Ryan Blue commented on AVRO-1781:
-

Thanks for having a look, Sean!

> Schema.parse is not thread safe
> ---
>
> Key: AVRO-1781
> URL: https://issues.apache.org/jira/browse/AVRO-1781
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Sean Busbey
>Assignee: Ryan Blue
>Priority: Blocker
> Fix For: 1.8.0
>
> Attachments: AVRO-1781-ADDENDUM.1.patch, AVRO-1781-ADDENDUM.2.patch, 
> AVRO-1781.1.patch, AVRO-1781.2.patch
>
>
> Post AVRO-1497, Schema.parse calls {{LogicalTypes.fromSchemaIgnoreInvalid}} 
> on any schema that is expressed as a JSON object (anything except bare 
> primitives).
> That static method relies on a static cache based on WeakIdentityHashMap 
> (WIHM).
> WIHM clearly states that it isn't threadsafe 
> [ref|https://github.com/apache/avro/blob/branch-1.8/lang/java/avro/src/main/java/org/apache/avro/util/WeakIdentityHashMap.java#L42]
> {code}
>  * 
>  * Note that this implementation is not synchronized.
>  * 
>  */
> public class WeakIdentityHashMap<K, V> implements Map<K, V> {
> {code}
> All of the Schema.Parser instances use that same static Schema.parse method.
> The end result is that as-is it's only safe to have a single thread parsing 
> schemas in a given JVM.





[jira] [Commented] (AVRO-1781) Schema.parse is not thread safe

2016-01-19 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107916#comment-15107916
 ] 

Ryan Blue commented on AVRO-1781:
-

No, the previous patch was still needed since we are moving to guava for other 
caches. AVRO-1760 is based on the other changes in that patch. It just happens 
that this cache isn't needed.

> Schema.parse is not thread safe
> ---
>
> Key: AVRO-1781
> URL: https://issues.apache.org/jira/browse/AVRO-1781
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Sean Busbey
>Assignee: Ryan Blue
>Priority: Blocker
> Fix For: 1.8.0
>
> Attachments: AVRO-1781-ADDENDUM.1.patch, AVRO-1781.1.patch, 
> AVRO-1781.2.patch
>
>
> Post AVRO-1497, Schema.parse calls {{LogicalTypes.fromSchemaIgnoreInvalid}} 
> on any schema that is expressed as a JSON object (anything except bare 
> primitives).
> That static method relies on a static cache based on WeakIdentityHashMap 
> (WIHM).
> WIHM clearly states that it isn't threadsafe 
> [ref|https://github.com/apache/avro/blob/branch-1.8/lang/java/avro/src/main/java/org/apache/avro/util/WeakIdentityHashMap.java#L42]
> {code}
>  * 
>  * Note that this implementation is not synchronized.
>  * 
>  */
> public class WeakIdentityHashMap<K, V> implements Map<K, V> {
> {code}
> All of the Schema.Parser instances use that same static Schema.parse method.
> The end result is that as-is it's only safe to have a single thread parsing 
> schemas in a given JVM.





[jira] [Commented] (AVRO-1778) IPC/RPC for JavaScript

2016-01-19 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107974#comment-15107974
 ] 

Ryan Blue commented on AVRO-1778:
-

Sorry I'm a bit late with this review, [~mtth]. I'm trying to get a 1.8.0 
release out ahead of it. Thanks for your patience!

> IPC/RPC for JavaScript
> --
>
> Key: AVRO-1778
> URL: https://issues.apache.org/jira/browse/AVRO-1778
> Project: Avro
>  Issue Type: Improvement
>  Components: javascript
>Reporter: Matthieu Monsch
>Assignee: Ryan Blue
> Attachments: AVRO-1778.patch
>
>
> This patch adds protocols to the JavaScript implementation.
> The API was designed to:
> + Be simple and idiomatic. The `Protocol` class added here is heavily 
> inspired by node.js' core `EventEmitter` to keep things as familiar as 
> possible [1]. Getting a client and server working is straightforward and 
> requires very few lines of code [2].
> + Support arbitrary transports, both stateful and stateless. Built-in node.js 
> streams are supported out of the box (e.g. TCP/UNIX sockets, or even 
> stdin/stdout). Exchanging messages over a custom transport requires 
> implementing a single simple function (see [3] for an example).
> + Work both server-side and in the browser!
> Ps: I also tested against both the Java and Python implementations over HTTP 
> and communication worked. 
> [1] https://github.com/mtth/avsc/wiki/API#ipc--rpc
> [2] https://github.com/mtth/avsc/wiki/Advanced-usage#remote-procedure-calls
> [3] https://github.com/mtth/avsc/wiki/Advanced-usage#transient-streams





[jira] [Resolved] (AVRO-1781) Schema.parse is not thread safe

2016-01-15 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue resolved AVRO-1781.
-
Resolution: Fixed
  Assignee: Ryan Blue

> Schema.parse is not thread safe
> ---
>
> Key: AVRO-1781
> URL: https://issues.apache.org/jira/browse/AVRO-1781
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Sean Busbey
>Assignee: Ryan Blue
>Priority: Blocker
> Fix For: 1.8.0
>
> Attachments: AVRO-1781.1.patch, AVRO-1781.2.patch
>
>
> Post AVRO-1497, Schema.parse calls {{LogicalTypes.fromSchemaIgnoreInvalid}} 
> on any schema that is expressed as a JSON object (anything except bare 
> primitives).
> That static method relies on a static cache based on WeakIdentityHashMap 
> (WIHM).
> WIHM clearly states that it isn't threadsafe 
> [ref|https://github.com/apache/avro/blob/branch-1.8/lang/java/avro/src/main/java/org/apache/avro/util/WeakIdentityHashMap.java#L42]
> {code}
>  * 
>  * Note that this implementation is not synchronized.
>  * 
>  */
> public class WeakIdentityHashMap<K, V> implements Map<K, V> {
> {code}
> All of the Schema.Parser instances use that same static Schema.parse method.
> The end result is that as-is it's only safe to have a single thread parsing 
> schemas in a given JVM.





[jira] [Commented] (AVRO-1760) Thread scalability problem with the use of SynchronizedMap

2016-01-15 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102715#comment-15102715
 ] 

Ryan Blue commented on AVRO-1760:
-

+1

I was initially concerned about not allowing null values in the weak-keys 
identity hashmap, but the JSON blob must always be non-null and NullNode is 
checked explicitly. I'm confident that the values are never null.

> Thread scalability problem with the use of SynchronizedMap
> --
>
> Key: AVRO-1760
> URL: https://issues.apache.org/jira/browse/AVRO-1760
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Affects Versions: 1.7.4, 1.7.5, 1.7.6, 1.7.7, 1.8.0, 1.8.1, 1.9.0
>Reporter: Mulugeta Mammo
>Assignee: Tom White
>Priority: Critical
>  Labels: patch, performance
> Fix For: 1.8.0
>
> Attachments: AVRO-1760.patch, AVRO-1760.patch, AVRO-1760.patch
>
>
> While running Adam Genomics (which uses Avro) on Apache Spark, we discovered 
> that threads (tasks in Spark Context) block in Avro while executing the 
> getDefaultValue(Field field) method in 
> https://github.com/apache/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java





[jira] [Commented] (AVRO-1781) Schema.parse is not thread safe

2016-01-13 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15096719#comment-15096719
 ] 

Ryan Blue commented on AVRO-1781:
-

Thanks, Tom! I've added the comment and committed this.

> Schema.parse is not thread safe
> ---
>
> Key: AVRO-1781
> URL: https://issues.apache.org/jira/browse/AVRO-1781
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Sean Busbey
>Priority: Blocker
> Fix For: 1.8.0
>
> Attachments: AVRO-1781.1.patch, AVRO-1781.2.patch
>
>
> Post AVRO-1497, Schema.parse calls {{LogicalTypes.fromSchemaIgnoreInvalid}} 
> on any schema that is expressed as a JSON object (anything except bare 
> primitives).
> That static method relies on a static cache based on WeakIdentityHashMap 
> (WIHM).
> WIHM clearly states that it isn't threadsafe 
> [ref|https://github.com/apache/avro/blob/branch-1.8/lang/java/avro/src/main/java/org/apache/avro/util/WeakIdentityHashMap.java#L42]
> {code}
>  * 
>  * Note that this implementation is not synchronized.
>  * 
>  */
> public class WeakIdentityHashMap<K, V> implements Map<K, V> {
> {code}
> All of the Schema.Parser instances use that same static Schema.parse method.
> The end result is that as-is it's only safe to have a single thread parsing 
> schemas in a given JVM.





[jira] [Resolved] (AVRO-1773) Infinite loop caused by race condition

2016-01-13 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue resolved AVRO-1773.
-
Resolution: Fixed

The patch for AVRO-1781 was just committed so I'm closing this.

> Infinite loop caused by race condition
> --
>
> Key: AVRO-1773
> URL: https://issues.apache.org/jira/browse/AVRO-1773
> Project: Avro
>  Issue Type: Sub-task
>  Components: java
>Affects Versions: 1.7.7
>Reporter: vincent ye
>Priority: Critical
> Fix For: 1.7.8, 1.8.0
>
>
> org.apache.avro.LogicalTypes#fromSchemaIgnoreInvalid looks up and updates 
> CACHE. CACHE is backed by a hashmap without synchronization. In a multithreaded 
> environment, this causes an infinite loop in the hashmap lookup. The race condition 
> is described in the following blog post: 
> http://mailinator.blogspot.com/2009/06/beautiful-race-condition.html
> I experienced this infinite loop in Spark jobs with high concurrency.





[jira] [Updated] (AVRO-1783) Gracefully handle strings with wrong character encoding

2016-01-13 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue updated AVRO-1783:

Attachment: AVRO-1783.stack.text

I'm attaching a file with the full stack trace that I produced by running 
{{require 'avro'}} in irb. The Ruby exception is a LoadError, but the cause is 
the missing method.

> Gracefully handle strings with wrong character encoding
> ---
>
> Key: AVRO-1783
> URL: https://issues.apache.org/jira/browse/AVRO-1783
> Project: Avro
>  Issue Type: Bug
>  Components: ruby
>Affects Versions: 1.7.7
>Reporter: Martin Kleppmann
> Attachments: AVRO-1783.patch, AVRO-1783.stack.text
>
>
> In the [vote thread for Avro 
> 1.8.0-rc2|http://mail-archives.apache.org/mod_mbox/avro-dev/201601.mbox/%3CCAGHyZ6K-oe35%2BOYROK6MSwrHxfPHvjmqhJAfRJL2dzexYw6YSw%40mail.gmail.com%3E],
>  [~busbey] noticed that [phunt's 
> avro-rpc-quickstart|https://github.com/phunt/avro-rpc-quickstart] fails:
> {code}
> busbey$ ruby sample_ipc_client.rb avro_user pat Hello_World
> Avro::IO::AvroTypeError: The datum
> "\x89\xA9\xD1\xFF@NUm\xEA\x9A\xFB\xDAx\xF5Zq"
> is not an example of schema
> {"type":"fixed","name":"MD5","namespace":"org.apache.avro.ipc","size":16}
>   write_data at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:543
> write_record at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:610
> each at org/jruby/RubyArray.java:1613
> write_record at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:609
>   write_data at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:561
>write at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:538
>  write_handshake_request at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:136
>  request at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:105
>  request at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:117
>   (root) at sample_ipc_client.rb:49
> {code}
> I tried reproducing the error, and it is quite strange. avro-rpc-quickstart 
> works fine for me in Ruby (MRI) 2.2 and 2.1, and in JRuby 1.7.23. However, 
> [~busbey] was using JRuby 1.7.3 (as visible from the path names above), and 
> in this particular version of JRuby I was able to reproduce the issue.
> It seems that in some circumstances (but not always, bizarrely), JRuby 1.7.3 
> returns a UTF-8 encoded string from {{Digest::MD5.digest}}, rather than a 
> binary-encoded string. {{Schema.validate}} checks that the string is suitable 
> for writing as datum for a {{fixed}} type by calling {{#size}}. In this case, 
> although the MD5 digest of the schema is a 16-byte string, if you interpret 
> it as a UTF-8 encoded string, it consists of only 13 characters (i.e. some 
> sequences are interpreted as multibyte characters).
> Rather than trying to divine why JRuby is being weird here, I think this is 
> an opportunity to fix Avro's handling of strings to make it robust against 
> unexpected encodings.
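
One robust approach is to validate {{fixed}} data by byte length rather than character length; a minimal Ruby sketch (hypothetical validator, not Avro's actual API):

```ruby
# Sketch (assumed method name): compare byte counts, so a binary digest
# that was mis-tagged as UTF-8 still validates correctly.
def valid_fixed?(datum, size)
  datum.is_a?(String) && datum.bytesize == size
end

# 16 raw bytes; if interpreted as UTF-8, String#size may report fewer
# "characters" (multibyte sequences), but bytesize stays 16.
digest = "\x89\xA9\xD1\xFF@NUm\xEA\x9A\xFB\xDAx\xF5Zq".force_encoding('UTF-8')
```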





[jira] [Commented] (AVRO-1783) Gracefully handle strings with wrong character encoding

2016-01-13 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15096744#comment-15096744
 ] 

Ryan Blue commented on AVRO-1783:
-

[~martinkl], don't worry about the trace I just posted. I just tried to verify 
AVRO-1782 with my jruby 1.7.3 install and I can't even run bundle install. 
Looks like either RVM didn't install it right or something else is messed up 
with my OpenSSL.

> Gracefully handle strings with wrong character encoding
> ---
>
> Key: AVRO-1783
> URL: https://issues.apache.org/jira/browse/AVRO-1783
> Project: Avro
>  Issue Type: Bug
>  Components: ruby
>Affects Versions: 1.7.7
>Reporter: Martin Kleppmann
> Attachments: AVRO-1783.patch, AVRO-1783.stack.text
>
>
> In the [vote thread for Avro 
> 1.8.0-rc2|http://mail-archives.apache.org/mod_mbox/avro-dev/201601.mbox/%3CCAGHyZ6K-oe35%2BOYROK6MSwrHxfPHvjmqhJAfRJL2dzexYw6YSw%40mail.gmail.com%3E],
>  [~busbey] noticed that [phunt's 
> avro-rpc-quickstart|https://github.com/phunt/avro-rpc-quickstart] fails:
> {code}
> busbey$ ruby sample_ipc_client.rb avro_user pat Hello_World
> Avro::IO::AvroTypeError: The datum
> "\x89\xA9\xD1\xFF@NUm\xEA\x9A\xFB\xDAx\xF5Zq"
> is not an example of schema
> {"type":"fixed","name":"MD5","namespace":"org.apache.avro.ipc","size":16}
>   write_data at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:543
> write_record at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:610
> each at org/jruby/RubyArray.java:1613
> write_record at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:609
>   write_data at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:561
>write at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:538
>  write_handshake_request at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:136
>  request at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:105
>  request at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:117
>   (root) at sample_ipc_client.rb:49
> {code}
> I tried reproducing the error, and it is quite strange. avro-rpc-quickstart 
> works fine for me in Ruby (MRI) 2.2 and 2.1, and in JRuby 1.7.23. However, 
> [~busbey] was using JRuby 1.7.3 (as visible from the path names above), and 
> in this particular version of JRuby I was able to reproduce the issue.
> It seems that in some circumstances (but not always, bizarrely), JRuby 1.7.3 
> returns a UTF-8 encoded string from {{Digest::MD5.digest}}, rather than a 
> binary-encoded string. {{Schema.validate}} checks that the string is suitable 
> for writing as datum for a {{fixed}} type by calling {{#size}}. In this case, 
> although the MD5 digest of the schema is a 16-byte string, if you interpret 
> it as a UTF-8 encoded string, it consists of only 13 characters (i.e. some 
> sequences are interpreted as multibyte characters).
> Rather than trying to divine why JRuby is being weird here, I think this is 
> an opportunity to fix Avro's handling of strings to make it robust against 
> unexpected encodings.





[jira] [Commented] (AVRO-1782) Test failures in Ruby 2.1/2.2

2016-01-13 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15096749#comment-15096749
 ] 

Ryan Blue commented on AVRO-1782:
-

+1. I verified that everything still works fine in 2.0.0 and jruby 1.7.6. 
Thanks, [~martinkl]!

> Test failures in Ruby 2.1/2.2
> -
>
> Key: AVRO-1782
> URL: https://issues.apache.org/jira/browse/AVRO-1782
> Project: Avro
>  Issue Type: Bug
>  Components: ruby
>Affects Versions: 1.7.7
>Reporter: Martin Kleppmann
> Attachments: AVRO-1782.patch
>
>
> When running the Avro Ruby implementation's test suite in Ruby 2.1 or 2.2, I 
> get several test failures. The distinct errors are:
> {code}
> NameError: uninitialized constant Avro::SchemaNormalization::JSON
> avro/lang/ruby/lib/avro/schema_normalization.rb:28:in `to_parsing_form'
> {code}
> and
> {code}
> TestSchemaNormalization#test_shared_dataset:
> NameError: uninitialized constant CaseFinder::StringScanner
> /Users/martin/Applications/avro/lang/ruby/test/case_finder.rb:30:in 
> `initialize'
> {code}
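
Both NameErrors point at stdlib constants that are no longer loaded implicitly in newer Rubies; the likely shape of the fix is explicit requires in the files that use them (an assumption based on the errors above, not the actual patch):

```ruby
# Require the stdlib pieces directly instead of relying on another file
# having loaded them first.
require 'json'
require 'strscan'

scanner = StringScanner.new('{"type":"record"}')
record  = JSON.parse(scanner.rest)  # scanner.rest is the unscanned remainder
```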





[jira] [Commented] (AVRO-1781) Schema.parse is not thread safe

2016-01-12 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15094503#comment-15094503
 ] 

Ryan Blue commented on AVRO-1781:
-

The problem is that Guava's MapMaker maps don't allow null values. I've updated the 
patch to use Optional and tests are passing.

I also ran into trouble with the mapred module because the Guava dependency, 
despite being shaded, was overriding Hadoop's guava dependency. I've added a 
ban for all versions of Guava other than 11.0.2 and for avro-guava-dependencies 
(to make sure it doesn't leak Guava classes into the classpath). Unfortunately, 
the only way to avoid hitting the Guava ban is to use version 11.0.2. If I use 
19.0 and build/test from the mapred directory it correctly uses 11.0.2, but if 
I build from the lang/java directory the dependencies are all resolved at once 
and 19.0 overrides the transitive dependency's version. Luckily, everything 
works with 11.0.2 and the jar is even a little smaller.

I also had to update this to exclude Google's JSR301 jar, which might be GPL 
and is banned by avro-tools. I'm attaching a new patch.

> Schema.parse is not thread safe
> ---
>
> Key: AVRO-1781
> URL: https://issues.apache.org/jira/browse/AVRO-1781
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Sean Busbey
>Priority: Blocker
> Fix For: 1.8.0
>
> Attachments: AVRO-1781.1.patch
>
>
> Post AVRO-1497, Schema.parse calls {{LogicalTypes.fromSchemaIgnoreInvalid}} 
> on any schema that is expressed as a JSON object (anything except bare 
> primitives).
> That static method relies on a static cache based on WeakIdentityHashMap 
> (WIHM).
> WIHM clearly states that it isn't threadsafe 
> [ref|https://github.com/apache/avro/blob/branch-1.8/lang/java/avro/src/main/java/org/apache/avro/util/WeakIdentityHashMap.java#L42]
> {code}
>  * 
>  * Note that this implementation is not synchronized.
>  * 
>  */
> public class WeakIdentityHashMap<K, V> implements Map<K, V> {
> {code}
> All of the Schema.Parser instances use that same static Schema.parse method.
> The end result is that as-is it's only safe to have a single thread parsing 
> schemas in a given JVM.





[jira] [Updated] (AVRO-1781) Schema.parse is not thread safe

2016-01-12 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue updated AVRO-1781:

Attachment: AVRO-1781.2.patch

> Schema.parse is not thread safe
> ---
>
> Key: AVRO-1781
> URL: https://issues.apache.org/jira/browse/AVRO-1781
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Sean Busbey
>Priority: Blocker
> Fix For: 1.8.0
>
> Attachments: AVRO-1781.1.patch, AVRO-1781.2.patch
>
>
> Post AVRO-1497, Schema.parse calls {{LogicalTypes.fromSchemaIgnoreInvalid}} 
> on any schema that is expressed as a JSON object (anything except bare 
> primitives).
> That static method relies on a static cache based on WeakIdentityHashMap 
> (WIHM).
> WIHM clearly states that it isn't threadsafe 
> [ref|https://github.com/apache/avro/blob/branch-1.8/lang/java/avro/src/main/java/org/apache/avro/util/WeakIdentityHashMap.java#L42]
> {code}
>  * 
>  * Note that this implementation is not synchronized.
>  * 
>  */
> public class WeakIdentityHashMap<K, V> implements Map<K, V> {
> {code}
> All of the Schema.Parser instances use that same static Schema.parse method.
> The end result is that as-is it's only safe to have a single thread parsing 
> schemas in a given JVM.





[jira] [Commented] (AVRO-1773) Infinite loop caused by race condition

2016-01-12 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15094547#comment-15094547
 ] 

Ryan Blue commented on AVRO-1773:
-

I'm marking this a duplicate of AVRO-1781 since we've been having more 
discussion over there (even though this was filed first).

> Infinite loop caused by race condition
> --
>
> Key: AVRO-1773
> URL: https://issues.apache.org/jira/browse/AVRO-1773
> Project: Avro
>  Issue Type: Sub-task
>  Components: java
>Affects Versions: 1.7.7
>Reporter: vincent ye
>Priority: Critical
> Fix For: 1.7.8, 1.8.0
>
>
> org.apache.avro.LogicalTypes#fromSchemaIgnoreInvalid looks up and updates 
> CACHE. CACHE is backed by a hashmap without synchronization. In a multithreaded 
> environment, this causes an infinite loop in the hashmap lookup. The race condition 
> is described in the following blog post: 
> http://mailinator.blogspot.com/2009/06/beautiful-race-condition.html
> I experienced this infinite loop in Spark jobs with high concurrency.





[jira] [Commented] (AVRO-1783) Gracefully handle strings with wrong character encoding

2016-01-12 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15095272#comment-15095272
 ] 

Ryan Blue commented on AVRO-1783:
-

The patch looks good to me. I attempted to test it, but everything works in 
jruby 1.7.6 without the patch and with 1.7.3 I get a NoSuchMethodException 
before I can hit the problem. Can someone that has reproduced the bug verify 
that the patch fixes it? Then I'll commit this.

> Gracefully handle strings with wrong character encoding
> ---
>
> Key: AVRO-1783
> URL: https://issues.apache.org/jira/browse/AVRO-1783
> Project: Avro
>  Issue Type: Bug
>  Components: ruby
>Affects Versions: 1.7.7
>Reporter: Martin Kleppmann
> Attachments: AVRO-1783.patch
>
>
> In the [vote thread for Avro 
> 1.8.0-rc2|http://mail-archives.apache.org/mod_mbox/avro-dev/201601.mbox/%3CCAGHyZ6K-oe35%2BOYROK6MSwrHxfPHvjmqhJAfRJL2dzexYw6YSw%40mail.gmail.com%3E],
>  [~busbey] noticed that [phunt's 
> avro-rpc-quickstart|https://github.com/phunt/avro-rpc-quickstart] fails:
> {code}
> busbey$ ruby sample_ipc_client.rb avro_user pat Hello_World
> Avro::IO::AvroTypeError: The datum
> "\x89\xA9\xD1\xFF@NUm\xEA\x9A\xFB\xDAx\xF5Zq"
> is not an example of schema
> {"type":"fixed","name":"MD5","namespace":"org.apache.avro.ipc","size":16}
>   write_data at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:543
> write_record at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:610
> each at org/jruby/RubyArray.java:1613
> write_record at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:609
>   write_data at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:561
>write at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:538
>  write_handshake_request at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:136
>  request at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:105
>  request at
> /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:117
>   (root) at sample_ipc_client.rb:49
> {code}
> I tried reproducing the error, and it is quite strange. avro-rpc-quickstart 
> works fine for me in Ruby (MRI) 2.2 and 2.1, and in JRuby 1.7.23. However, 
> [~busbey] was using JRuby 1.7.3 (as visible from the path names above), and 
> in this particular version of JRuby I was able to reproduce the issue.
> It seems that in some circumstances (but not always, bizarrely), JRuby 1.7.3 
> returns a UTF-8 encoded string from {{Digest::MD5.digest}}, rather than a 
> binary-encoded string. {{Schema.validate}} checks that the string is suitable
> for writing as a datum for a {{fixed}} type by calling {{#size}}. In this case,
> although the MD5 digest of the schema is a 16-byte string, if you interpret 
> it as a UTF-8 encoded string, it consists of only 13 characters (i.e. some 
> sequences are interpreted as multibyte characters).
> Rather than trying to divine why JRuby is being weird here, I think this is 
> an opportunity to fix Avro's handling of strings to make it robust against 
> unexpected encodings.
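The core of the mismatch is that a digest is 16 raw bytes, while a character count over the same bytes depends on the encoding used to interpret them. A short Java illustration of the byte-versus-character distinction (not Avro code; the class name is made up for this demo):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestEncodingDemo {
    // An MD5 digest is always exactly 16 raw bytes, regardless of input.
    static byte[] md5(String input) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("MD5")
                .digest(input.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        byte[] digest = md5("avro");
        // Reinterpreting the 16 raw bytes as UTF-8 text yields a character
        // count that generally differs from 16: some byte sequences decode
        // as multibyte characters, others become replacement characters.
        String asUtf8 = new String(digest, StandardCharsets.UTF_8);
        System.out.println("bytes: " + digest.length);
        System.out.println("UTF-8 characters: " + asUtf8.length());
    }
}
```

This is why a fixed-size check should compare byte length (Ruby's {{bytesize}}) rather than character count ({{size}}) when the datum is binary data.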





[jira] [Commented] (AVRO-1780) Avro tools jar fails with NPE

2016-01-12 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15094309#comment-15094309
 ] 

Ryan Blue commented on AVRO-1780:
-

Thanks for fixing this, Tom! I committed your patch.

> Avro tools jar fails with NPE
> -
>
> Key: AVRO-1780
> URL: https://issues.apache.org/jira/browse/AVRO-1780
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Sean Busbey
>Assignee: Tom White
>Priority: Blocker
> Fix For: 1.8.0
>
> Attachments: AVRO-1780.patch
>
>
> Following our license/notice updates, the avro-tools jar fails with an NPE
> because it wants to print out a NOTICE.txt from the root of the jar.
> {code}
> busbey$ java -jar avro-tools-1.8.0.jar
> Version 1.8.0 of Exception in thread "main" java.lang.NullPointerException
>   at org.apache.avro.tool.Main.printStream(Main.java:105)
>   at org.apache.avro.tool.Main.run(Main.java:92)
>   at org.apache.avro.tool.Main.main(Main.java:74)
> busbey$ java -jar avro-tools-1.8.0.jar --help
> Version 1.8.0 of Exception in thread "main" java.lang.NullPointerException
>   at org.apache.avro.tool.Main.printStream(Main.java:105)
>   at org.apache.avro.tool.Main.run(Main.java:92)
>   at org.apache.avro.tool.Main.main(Main.java:74)
> {code}
> We should probably not print the entire NOTICE unless a cli arg is given, 
> since it is much bigger now.




