[jira] [Commented] (AVRO-1783) Gracefully handle strings with wrong character encoding
[ https://issues.apache.org/jira/browse/AVRO-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092834#comment-15092834 ]

Sean Busbey commented on AVRO-1783:
-----------------------------------

good digging!

also, I should update my jruby version. ;)

> Gracefully handle strings with wrong character encoding
> -------------------------------------------------------
>
> Key: AVRO-1783
> URL: https://issues.apache.org/jira/browse/AVRO-1783
> Project: Avro
> Issue Type: Bug
> Components: ruby
> Affects Versions: 1.7.7
> Reporter: Martin Kleppmann
>
> In the [vote thread for Avro 1.8.0-rc2|http://mail-archives.apache.org/mod_mbox/avro-dev/201601.mbox/%3CCAGHyZ6K-oe35%2BOYROK6MSwrHxfPHvjmqhJAfRJL2dzexYw6YSw%40mail.gmail.com%3E], [~busbey] noticed that [phunt's avro-rpc-quickstart|https://github.com/phunt/avro-rpc-quickstart] fails:
> {code}
> busbey$ ruby sample_ipc_client.rb avro_user pat Hello_World
> Avro::IO::AvroTypeError: The datum "\x89\xA9\xD1\xFF@NUm\xEA\x9A\xFB\xDAx\xF5Zq" is not an example of schema {"type":"fixed","name":"MD5","namespace":"org.apache.avro.ipc","size":16}
>   write_data at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:543
>   write_record at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:610
>   each at org/jruby/RubyArray.java:1613
>   write_record at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:609
>   write_data at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:561
>   write at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:538
>   write_handshake_request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:136
>   request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:105
>   request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:117
>   (root) at sample_ipc_client.rb:49
> {code}
> I tried reproducing the error, and it is quite strange. avro-rpc-quickstart works fine for me in Ruby (MRI) 2.2 and 2.1, and in JRuby 1.7.23. However, [~busbey] was using JRuby 1.7.3 (as visible from the path names above), and in this particular version of JRuby I was able to reproduce the issue.
> It seems that in some circumstances (but not always, bizarrely), JRuby 1.7.3 returns a UTF-8 encoded string from {{Digest::MD5.digest}}, rather than a binary-encoded string. {{Schema.validate}} checks that the string is suitable for writing as datum for a {{fixed}} type by calling {{#size}}. In this case, although the MD5 digest of the schema is a 16-byte string, if you interpret it as a UTF-8 encoded string, it consists of only 13 characters (i.e. some sequences are interpreted as multibyte characters).
> Rather than trying to divine why JRuby is being weird here, I think this is an opportunity to fix Avro's handling of strings to make it robust against unexpected encodings.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
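
The byte-count versus character-count mismatch described in AVRO-1783 can be reproduced without JRuby. What follows is a minimal sketch, not the fix from the issue: the byte values are made up for illustration (they are not the digest from the report), and {{fixed_datum_valid?}} is a hypothetical helper, not part of the Avro API. It assumes a Ruby version that has {{String#b}} (2.0 or later).

```ruby
# Sixteen illustrative bytes: eight copies of the two-byte UTF-8 encoding of "é".
# (Made-up data, not the MD5 digest from the bug report.)
raw = "\xC3\xA9".b * 8

# Tag the same bytes with two different encodings.
binary = raw.dup.force_encoding(Encoding::BINARY)
utf8   = raw.dup.force_encoding(Encoding::UTF_8)

binary.size     # => 16 (one character per byte)
utf8.size       # => 8  (each byte pair is read as one character)
utf8.bytesize   # => 16 (the byte count ignores the encoding tag)

# Hypothetical helper (an assumption, not Avro's code): validate a datum for a
# `fixed` schema by its byte count rather than its character count.
def fixed_datum_valid?(datum, expected_size)
  datum.is_a?(String) && datum.bytesize == expected_size
end

fixed_datum_valid?(utf8, 16)    # => true, even though utf8.size is 8
fixed_datum_valid?(binary, 16)  # => true
```

Comparing {{bytesize}} instead of {{size}} makes the check independent of whatever encoding the string happens to carry, which is one possible way to be robust against unexpected encodings.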
[jira] [Commented] (AVRO-739) Add Date/Time data types
[ https://issues.apache.org/jira/browse/AVRO-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092737#comment-15092737 ]

Ryan Blue commented on AVRO-739:
--------------------------------

[~ctaggart], see AVRO-1684 and [PR #37|https://github.com/apache/avro/pull/37]. That adds date, time_millis, and timestamp_millis to specific in Java.

> Add Date/Time data types
> ------------------------
>
> Key: AVRO-739
> URL: https://issues.apache.org/jira/browse/AVRO-739
> Project: Avro
> Issue Type: New Feature
> Components: spec
> Reporter: Jeff Hammerbacher
> Assignee: Dmitry Kovalev
> Fix For: 1.8.0
>
> Attachments: AVRO-739-datetime-spec.xml.patch, AVRO-739-datetime-spec.xml.patch, AVRO-739-update-spec.diff, AVRO-739.patch
[jira] [Created] (AVRO-1782) Test failures in Ruby 2.1/2.2
Martin Kleppmann created AVRO-1782:
-----------------------------------

Summary: Test failures in Ruby 2.1/2.2
Key: AVRO-1782
URL: https://issues.apache.org/jira/browse/AVRO-1782
Project: Avro
Issue Type: Bug
Components: ruby
Reporter: Martin Kleppmann

When running the Avro Ruby implementation's test suite in Ruby 2.1 or 2.2, I get several test failures. The distinct errors are:
{code}
NameError: uninitialized constant Avro::SchemaNormalization::JSON
avro/lang/ruby/lib/avro/schema_normalization.rb:28:in `to_parsing_form'
{code}
and
{code}
TestSchemaNormalization#test_shared_dataset:
NameError: uninitialized constant CaseFinder::StringScanner
/Users/martin/Applications/avro/lang/ruby/test/case_finder.rb:30:in `initialize'
{code}
[jira] [Updated] (AVRO-1782) Test failures in Ruby 2.1/2.2
[ https://issues.apache.org/jira/browse/AVRO-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Kleppmann updated AVRO-1782:
-----------------------------------
    Attachment: AVRO-1782.patch

Attached a patch to fix all the test failures:
* Explicitly require strscan to get StringScanner (seems like something else implicitly required it in prior versions, so we got away without requiring it ourselves)
* Change SchemaNormalization to use MultiJson instead of JSON, bringing it in line with Avro::Schema.

> Test failures in Ruby 2.1/2.2
> -----------------------------
>
> Key: AVRO-1782
> URL: https://issues.apache.org/jira/browse/AVRO-1782
> Project: Avro
> Issue Type: Bug
> Components: ruby
> Reporter: Martin Kleppmann
> Attachments: AVRO-1782.patch
>
> When running the Avro Ruby implementation's test suite in Ruby 2.1 or 2.2, I get several test failures. The distinct errors are:
> {code}
> NameError: uninitialized constant Avro::SchemaNormalization::JSON
> avro/lang/ruby/lib/avro/schema_normalization.rb:28:in `to_parsing_form'
> {code}
> and
> {code}
> TestSchemaNormalization#test_shared_dataset:
> NameError: uninitialized constant CaseFinder::StringScanner
> /Users/martin/Applications/avro/lang/ruby/test/case_finder.rb:30:in `initialize'
> {code}
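
The first fix listed in the AVRO-1782 patch description can be illustrated roughly as follows. {{StringScanner}} is defined by the {{strscan}} standard library, so a file that uses it can require it explicitly instead of relying on some other library having loaded it first. This is only a sketch: {{split_pair}} and its input format are hypothetical and are not taken from case_finder.rb.

```ruby
require 'strscan'  # standard library; defines StringScanner

# Hypothetical helper (for illustration only): splits a "key: value" line
# into its two parts using StringScanner.
def split_pair(line)
  scanner = StringScanner.new(line)
  key = scanner.scan(/\w+/)     # consume the key
  scanner.skip(/:\s*/)          # skip the separator and any spaces
  value = scanner.scan(/\w+/)   # consume the value
  [key, value]
end

split_pair("name: avro")  # => ["name", "avro"]
```

With the explicit {{require}}, the constant lookup no longer depends on load order, which is the failure mode the {{NameError}} above points at.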
[jira] [Commented] (AVRO-1725) Enum schema exhibits same restriction to enum symbols as to names
[ https://issues.apache.org/jira/browse/AVRO-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092723#comment-15092723 ]

Hudson commented on AVRO-1725:
------------------------------

SUCCESS: Integrated in AvroJava #565 (See [https://builds.apache.org/job/AvroJava/565/])
AVRO-1725. Docs: clarify restrictions on enum symbols. (martinkl: rev 1724129)
* trunk/CHANGES.txt
* trunk/doc/src/content/xdocs/spec.xml

> Enum schema exhibits same restriction to enum symbols as to names
> -----------------------------------------------------------------
>
> Key: AVRO-1725
> URL: https://issues.apache.org/jira/browse/AVRO-1725
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.7
> Reporter: Nikita Makeev
> Fix For: 1.8.0
>
> Attachments: AVRO-1725.patch
>
> EnumSchema class in org.apache.avro.Schema has the following code:
>   for (String symbol : symbols)
>     if (ordinals.put(validateName(symbol), i++) != null)
> which validates enum symbols using validateName() which makes impossible to use symbols that are not conforming to standard for real names.
> That prohibits using of symbols like "" (empty string) or anything starting with number which does not seem to be intended.
> I guess this place requires either some another type of validation or no validation at all. Can provide a patch for both cases.
[jira] [Updated] (AVRO-1782) Test failures in Ruby 2.1/2.2
[ https://issues.apache.org/jira/browse/AVRO-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Kleppmann updated AVRO-1782:
-----------------------------------
    Affects Version/s: 1.7.7
               Status: Patch Available  (was: Open)

> Test failures in Ruby 2.1/2.2
> -----------------------------
>
> Key: AVRO-1782
> URL: https://issues.apache.org/jira/browse/AVRO-1782
> Project: Avro
> Issue Type: Bug
> Components: ruby
> Affects Versions: 1.7.7
> Reporter: Martin Kleppmann
> Attachments: AVRO-1782.patch
>
> When running the Avro Ruby implementation's test suite in Ruby 2.1 or 2.2, I get several test failures. The distinct errors are:
> {code}
> NameError: uninitialized constant Avro::SchemaNormalization::JSON
> avro/lang/ruby/lib/avro/schema_normalization.rb:28:in `to_parsing_form'
> {code}
> and
> {code}
> TestSchemaNormalization#test_shared_dataset:
> NameError: uninitialized constant CaseFinder::StringScanner
> /Users/martin/Applications/avro/lang/ruby/test/case_finder.rb:30:in `initialize'
> {code}
[jira] [Commented] (AVRO-739) Add Date/Time data types
[ https://issues.apache.org/jira/browse/AVRO-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092731#comment-15092731 ]

Cameron Taggart commented on AVRO-739:
--------------------------------------

Reviewing the docs (https://github.com/apache/avro/blob/branch-1.8/doc/src/content/xdocs/spec.xml#L1412-L1472), are `date`, `time-millis`, `time-micros`, `timestamp-millis`, `timestamp-micros`, and `duration` going to be added to Avro IDL too (https://avro.apache.org/docs/current/idl.html)?

> Add Date/Time data types
> ------------------------
>
> Key: AVRO-739
> URL: https://issues.apache.org/jira/browse/AVRO-739
> Project: Avro
> Issue Type: New Feature
> Components: spec
> Reporter: Jeff Hammerbacher
> Assignee: Dmitry Kovalev
> Fix For: 1.8.0
>
> Attachments: AVRO-739-datetime-spec.xml.patch, AVRO-739-datetime-spec.xml.patch, AVRO-739-update-spec.diff, AVRO-739.patch
[jira] [Commented] (AVRO-1725) Enum schema exhibits same restriction to enum symbols as to names
[ https://issues.apache.org/jira/browse/AVRO-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092671#comment-15092671 ]

ASF subversion and git services commented on AVRO-1725:
-------------------------------------------------------

Commit 1724129 from [~martinkl] in branch 'avro/trunk'
[ https://svn.apache.org/r1724129 ]

AVRO-1725. Docs: clarify restrictions on enum symbols.

> Enum schema exhibits same restriction to enum symbols as to names
> -----------------------------------------------------------------
>
> Key: AVRO-1725
> URL: https://issues.apache.org/jira/browse/AVRO-1725
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.7
> Reporter: Nikita Makeev
> Fix For: 1.8.0
>
> Attachments: AVRO-1725.patch
>
> EnumSchema class in org.apache.avro.Schema has the following code:
>   for (String symbol : symbols)
>     if (ordinals.put(validateName(symbol), i++) != null)
> which validates enum symbols using validateName() which makes impossible to use symbols that are not conforming to standard for real names.
> That prohibits using of symbols like "" (empty string) or anything starting with number which does not seem to be intended.
> I guess this place requires either some another type of validation or no validation at all. Can provide a patch for both cases.
[jira] [Updated] (AVRO-1725) Enum schema exhibits same restriction to enum symbols as to names
[ https://issues.apache.org/jira/browse/AVRO-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Kleppmann updated AVRO-1725:
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.8.0
           Status: Resolved  (was: Patch Available)

I've committed the patch to trunk and branch-1.8.

> Enum schema exhibits same restriction to enum symbols as to names
> -----------------------------------------------------------------
>
> Key: AVRO-1725
> URL: https://issues.apache.org/jira/browse/AVRO-1725
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.7
> Reporter: Nikita Makeev
> Fix For: 1.8.0
>
> Attachments: AVRO-1725.patch
>
> EnumSchema class in org.apache.avro.Schema has the following code:
>   for (String symbol : symbols)
>     if (ordinals.put(validateName(symbol), i++) != null)
> which validates enum symbols using validateName() which makes impossible to use symbols that are not conforming to standard for real names.
> That prohibits using of symbols like "" (empty string) or anything starting with number which does not seem to be intended.
> I guess this place requires either some another type of validation or no validation at all. Can provide a patch for both cases.
[jira] [Commented] (AVRO-1781) Schema.parse is not thread safe
[ https://issues.apache.org/jira/browse/AVRO-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092283#comment-15092283 ]

Sean Busbey commented on AVRO-1781:
-----------------------------------

2 main approaches I see:
* externalize the cache and store one per Schema.Parser instance, so you get caching if you reuse and the app can decide on parallelism trade-offs
* switch to using a R/W lock based cache so that it is threadsafe.

> Schema.parse is not thread safe
> -------------------------------
>
> Key: AVRO-1781
> URL: https://issues.apache.org/jira/browse/AVRO-1781
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.8.0
> Reporter: Sean Busbey
> Priority: Blocker
> Fix For: 1.8.0
>
> Post AVRO-1497, Schema.parse calls {{LogicalTypes.fromSchemaIgnoreInvalid}} on any schema that is expressed as a JSON object (anything except bare primitives).
> That static method relies on a static cache based on WeakIdentityHashMap (WIHM).
> WIHM clearly states that it isn't threadsafe [ref|https://github.com/apache/avro/blob/branch-1.8/lang/java/avro/src/main/java/org/apache/avro/util/WeakIdentityHashMap.java#L42]
> {code}
>  *
>  * Note that this implementation is not synchronized.
>  *
>  */
> public class WeakIdentityHashMap<K, V> implements Map<K, V> {
> {code}
> All of the Schema.Parser instances use that same static Schema.parse method.
> The end result is that as-is it's only safe to have a single thread parsing schemas in a given JVM.
[jira] [Created] (AVRO-1781) Schema.
Sean Busbey created AVRO-1781:
------------------------------

Summary: Schema.
Key: AVRO-1781
URL: https://issues.apache.org/jira/browse/AVRO-1781
Project: Avro
Issue Type: Bug
Components: java
Affects Versions: 1.8.0
Reporter: Sean Busbey
Priority: Blocker
Fix For: 1.8.0

Post AVRO-1497, Schema.parse calls {{LogicalTypes.fromSchemaIgnoreInvalid}} on any schema that is expressed as a JSON object (anything except bare primitives).
That static method relies on a static cache based on WeakIdentityHashMap (WIHM).
WIHM clearly states that it isn't threadsafe [ref|https://github.com/apache/avro/blob/branch-1.8/lang/java/avro/src/main/java/org/apache/avro/util/WeakIdentityHashMap.java#L42]
{code}
 *
 * Note that this implementation is not synchronized.
 *
 */
public class WeakIdentityHashMap<K, V> implements Map<K, V> {
{code}
All of the Schema.Parser instances use that same static Schema.parse method.
The end result is that as-is it's only safe to have a single thread parsing schemas in a given JVM.
[jira] [Updated] (AVRO-1781) Schema.parse is not thread safe
[ https://issues.apache.org/jira/browse/AVRO-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Busbey updated AVRO-1781:
------------------------------
    Summary: Schema.parse is not thread safe  (was: Schema.)

> Schema.parse is not thread safe
> -------------------------------
>
> Key: AVRO-1781
> URL: https://issues.apache.org/jira/browse/AVRO-1781
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.8.0
> Reporter: Sean Busbey
> Priority: Blocker
> Fix For: 1.8.0
>
> Post AVRO-1497, Schema.parse calls {{LogicalTypes.fromSchemaIgnoreInvalid}} on any schema that is expressed as a JSON object (anything except bare primitives).
> That static method relies on a static cache based on WeakIdentityHashMap (WIHM).
> WIHM clearly states that it isn't threadsafe [ref|https://github.com/apache/avro/blob/branch-1.8/lang/java/avro/src/main/java/org/apache/avro/util/WeakIdentityHashMap.java#L42]
> {code}
>  *
>  * Note that this implementation is not synchronized.
>  *
>  */
> public class WeakIdentityHashMap<K, V> implements Map<K, V> {
> {code}
> All of the Schema.Parser instances use that same static Schema.parse method.
> The end result is that as-is it's only safe to have a single thread parsing schemas in a given JVM.
[jira] [Created] (AVRO-1779) Avro docs convenience artifact missing LICENSE/NOTICE
Sean Busbey created AVRO-1779:
------------------------------

Summary: Avro docs convenience artifact missing LICENSE/NOTICE
Key: AVRO-1779
URL: https://issues.apache.org/jira/browse/AVRO-1779
Project: Avro
Issue Type: Bug
Components: doc
Affects Versions: 1.8.0
Reporter: Sean Busbey
Priority: Blocker
Fix For: 1.8.0

for releases we generate a convenience artifact with our docs. at present, this tarball is missing our needed LICENSE/NOTICE files
[jira] [Commented] (AVRO-1725) Enum schema exhibits same restriction to enum symbols as to names
[ https://issues.apache.org/jira/browse/AVRO-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092431#comment-15092431 ]

Ryan Blue commented on AVRO-1725:
---------------------------------

+1 for that patch, [~martinkl]. Thanks!

> Enum schema exhibits same restriction to enum symbols as to names
> -----------------------------------------------------------------
>
> Key: AVRO-1725
> URL: https://issues.apache.org/jira/browse/AVRO-1725
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.7
> Reporter: Nikita Makeev
> Attachments: AVRO-1725.patch
>
> EnumSchema class in org.apache.avro.Schema has the following code:
>   for (String symbol : symbols)
>     if (ordinals.put(validateName(symbol), i++) != null)
> which validates enum symbols using validateName() which makes impossible to use symbols that are not conforming to standard for real names.
> That prohibits using of symbols like "" (empty string) or anything starting with number which does not seem to be intended.
> I guess this place requires either some another type of validation or no validation at all. Can provide a patch for both cases.
[jira] [Commented] (AVRO-1781) Schema.parse is not thread safe
[ https://issues.apache.org/jira/browse/AVRO-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092454#comment-15092454 ]

Ryan Blue commented on AVRO-1781:
---------------------------------

I think this is the same as AVRO-1773. Another option is to use a better weak hash map from guava that supports concurrency. That's the solution recommended for AVRO-1760 as well. I have reservations about adding a dependency on Guava since the suggestion is currently to shade it, but it would avoid many issues like this one.

> Schema.parse is not thread safe
> -------------------------------
>
> Key: AVRO-1781
> URL: https://issues.apache.org/jira/browse/AVRO-1781
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.8.0
> Reporter: Sean Busbey
> Priority: Blocker
> Fix For: 1.8.0
>
> Post AVRO-1497, Schema.parse calls {{LogicalTypes.fromSchemaIgnoreInvalid}} on any schema that is expressed as a JSON object (anything except bare primitives).
> That static method relies on a static cache based on WeakIdentityHashMap (WIHM).
> WIHM clearly states that it isn't threadsafe [ref|https://github.com/apache/avro/blob/branch-1.8/lang/java/avro/src/main/java/org/apache/avro/util/WeakIdentityHashMap.java#L42]
> {code}
>  *
>  * Note that this implementation is not synchronized.
>  *
>  */
> public class WeakIdentityHashMap<K, V> implements Map<K, V> {
> {code}
> All of the Schema.Parser instances use that same static Schema.parse method.
> The end result is that as-is it's only safe to have a single thread parsing schemas in a given JVM.