[jira] [Commented] (AVRO-1783) Gracefully handle strings with wrong character encoding

2016-01-11 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092834#comment-15092834
 ] 

Sean Busbey commented on AVRO-1783:
---

Good digging! Also, I should update my JRuby version. ;)

> Gracefully handle strings with wrong character encoding
> ---
>
> Key: AVRO-1783
> URL: https://issues.apache.org/jira/browse/AVRO-1783
> Project: Avro
> Issue Type: Bug
> Components: ruby
> Affects Versions: 1.7.7
> Reporter: Martin Kleppmann
>
> In the [vote thread for Avro 
> 1.8.0-rc2|http://mail-archives.apache.org/mod_mbox/avro-dev/201601.mbox/%3CCAGHyZ6K-oe35%2BOYROK6MSwrHxfPHvjmqhJAfRJL2dzexYw6YSw%40mail.gmail.com%3E],
>  [~busbey] noticed that [phunt's 
> avro-rpc-quickstart|https://github.com/phunt/avro-rpc-quickstart] fails:
> {code}
> busbey$ ruby sample_ipc_client.rb avro_user pat Hello_World
> Avro::IO::AvroTypeError: The datum "\x89\xA9\xD1\xFF@NUm\xEA\x9A\xFB\xDAx\xF5Zq" is not an example of schema {"type":"fixed","name":"MD5","namespace":"org.apache.avro.ipc","size":16}
>   write_data at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:543
>   write_record at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:610
>   each at org/jruby/RubyArray.java:1613
>   write_record at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:609
>   write_data at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:561
>   write at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/io.rb:538
>   write_handshake_request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:136
>   request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:105
>   request at /Users/busbey/.rvm/gems/jruby-1.7.3/gems/avro-1.8.0/lib/avro/ipc.rb:117
>   (root) at sample_ipc_client.rb:49
> {code}
> I tried reproducing the error, and it is quite strange. avro-rpc-quickstart 
> works fine for me in Ruby (MRI) 2.2 and 2.1, and in JRuby 1.7.23. However, 
> [~busbey] was using JRuby 1.7.3 (as visible from the path names above), and 
> in this particular version of JRuby I was able to reproduce the issue.
> It seems that in some circumstances (but not always, bizarrely), JRuby 1.7.3 
> returns a UTF-8 encoded string from {{Digest::MD5.digest}}, rather than a 
> binary-encoded string. {{Schema.validate}} checks that the string is suitable 
> for writing as datum for a {{fixed}} type by calling {{#size}}. In this case, 
> although the MD5 digest of the schema is a 16-byte string, if you interpret 
> it as a UTF-8 encoded string, it consists of only 13 characters (i.e. some 
> sequences are interpreted as multibyte characters).
> Rather than trying to divine why JRuby is being weird here, I think this is 
> an opportunity to fix Avro's handling of strings to make it robust against 
> unexpected encodings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-739) Add Date/Time data types

2016-01-11 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092737#comment-15092737
 ] 

Ryan Blue commented on AVRO-739:


[~ctaggart], see AVRO-1684 and [PR #37|https://github.com/apache/avro/pull/37]. 
That adds date, time_millis, and timestamp_millis to the specific object model in Java.

> Add Date/Time data types
> 
>
> Key: AVRO-739
> URL: https://issues.apache.org/jira/browse/AVRO-739
> Project: Avro
> Issue Type: New Feature
> Components: spec
> Reporter: Jeff Hammerbacher
> Assignee: Dmitry Kovalev
> Fix For: 1.8.0
>
> Attachments: AVRO-739-datetime-spec.xml.patch, 
> AVRO-739-datetime-spec.xml.patch, AVRO-739-update-spec.diff, AVRO-739.patch
>
>






[jira] [Created] (AVRO-1782) Test failures in Ruby 2.1/2.2

2016-01-11 Thread Martin Kleppmann (JIRA)
Martin Kleppmann created AVRO-1782:
--

 Summary: Test failures in Ruby 2.1/2.2
 Key: AVRO-1782
 URL: https://issues.apache.org/jira/browse/AVRO-1782
 Project: Avro
  Issue Type: Bug
  Components: ruby
Reporter: Martin Kleppmann


When running the Avro Ruby implementation's test suite in Ruby 2.1 or 2.2, I 
get several test failures. The distinct errors are:

{code}
NameError: uninitialized constant Avro::SchemaNormalization::JSON
avro/lang/ruby/lib/avro/schema_normalization.rb:28:in `to_parsing_form'
{code}

and

{code}
TestSchemaNormalization#test_shared_dataset:
NameError: uninitialized constant CaseFinder::StringScanner
/Users/martin/Applications/avro/lang/ruby/test/case_finder.rb:30:in 
`initialize'
{code}





[jira] [Updated] (AVRO-1782) Test failures in Ruby 2.1/2.2

2016-01-11 Thread Martin Kleppmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Kleppmann updated AVRO-1782:
---
Attachment: AVRO-1782.patch

Attached a patch to fix all the test failures:

* Explicitly require strscan to get StringScanner (seems like something else 
implicitly required it in prior versions, so we got away without requiring it 
ourselves)
* Change SchemaNormalization to use MultiJson instead of JSON, bringing it in 
line with Avro::Schema.

> Test failures in Ruby 2.1/2.2
> -
>
> Key: AVRO-1782
> URL: https://issues.apache.org/jira/browse/AVRO-1782
> Project: Avro
> Issue Type: Bug
> Components: ruby
> Reporter: Martin Kleppmann
> Attachments: AVRO-1782.patch
>
>
> When running the Avro Ruby implementation's test suite in Ruby 2.1 or 2.2, I 
> get several test failures. The distinct errors are:
> {code}
> NameError: uninitialized constant Avro::SchemaNormalization::JSON
> avro/lang/ruby/lib/avro/schema_normalization.rb:28:in `to_parsing_form'
> {code}
> and
> {code}
> TestSchemaNormalization#test_shared_dataset:
> NameError: uninitialized constant CaseFinder::StringScanner
> /Users/martin/Applications/avro/lang/ruby/test/case_finder.rb:30:in 
> `initialize'
> {code}





[jira] [Commented] (AVRO-1725) Enum schema exhibits same restriction to enum symbols as to names

2016-01-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092723#comment-15092723
 ] 

Hudson commented on AVRO-1725:
--

SUCCESS: Integrated in AvroJava #565 (See 
[https://builds.apache.org/job/AvroJava/565/])
AVRO-1725. Docs: clarify restrictions on enum symbols. (martinkl: rev 1724129)
* trunk/CHANGES.txt
* trunk/doc/src/content/xdocs/spec.xml


> Enum schema exhibits same restriction to enum symbols as to names
> -
>
> Key: AVRO-1725
> URL: https://issues.apache.org/jira/browse/AVRO-1725
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.7
> Reporter: Nikita Makeev
> Fix For: 1.8.0
>
> Attachments: AVRO-1725.patch
>
>
> EnumSchema class in org.apache.avro.Schema has the following code:
> for (String symbol : symbols)
> if (ordinals.put(validateName(symbol), i++) != null)
> which validates enum symbols using validateName(), making it impossible to 
> use symbols that do not conform to the rules for names. 
> That prohibits symbols like "" (empty string) or anything starting with a 
> number, which does not seem to be intended.
> I guess this place requires either a different kind of validation or no 
> validation at all. I can provide a patch for either case.
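
To make the restriction described above concrete, here is a minimal sketch of the name rule from the Avro specification: a name starts with a letter or underscore and continues with letters, digits, or underscores. {{NameRuleSketch}} and {{isValidName}} are hypothetical names chosen for illustration; this is not the actual {{validateName()}} code in {{org.apache.avro.Schema}}.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the name rule; not Avro's validateName().
final class NameRuleSketch {
  // Assumption: the rule is [A-Za-z_] followed by any number of [A-Za-z0-9_].
  private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

  static boolean isValidName(String candidate) {
    return candidate != null && NAME.matcher(candidate).matches();
  }

  public static void main(String[] args) {
    // The empty string and a symbol starting with a digit are both rejected,
    // which is the behaviour the reporter ran into for enum symbols.
    for (String symbol : new String[] {"RED", "_private", "", "1ST"}) {
      System.out.println("\"" + symbol + "\" -> " + isValidName(symbol));
    }
  }
}
```

Applying a rule like this to enum symbols is what rules out the empty string and symbols with a leading digit.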





[jira] [Updated] (AVRO-1782) Test failures in Ruby 2.1/2.2

2016-01-11 Thread Martin Kleppmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Kleppmann updated AVRO-1782:
---
Affects Version/s: 1.7.7
   Status: Patch Available  (was: Open)

> Test failures in Ruby 2.1/2.2
> -
>
> Key: AVRO-1782
> URL: https://issues.apache.org/jira/browse/AVRO-1782
> Project: Avro
> Issue Type: Bug
> Components: ruby
> Affects Versions: 1.7.7
> Reporter: Martin Kleppmann
> Attachments: AVRO-1782.patch
>
>
> When running the Avro Ruby implementation's test suite in Ruby 2.1 or 2.2, I 
> get several test failures. The distinct errors are:
> {code}
> NameError: uninitialized constant Avro::SchemaNormalization::JSON
> avro/lang/ruby/lib/avro/schema_normalization.rb:28:in `to_parsing_form'
> {code}
> and
> {code}
> TestSchemaNormalization#test_shared_dataset:
> NameError: uninitialized constant CaseFinder::StringScanner
> /Users/martin/Applications/avro/lang/ruby/test/case_finder.rb:30:in 
> `initialize'
> {code}





[jira] [Commented] (AVRO-739) Add Date/Time data types

2016-01-11 Thread Cameron Taggart (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092731#comment-15092731
 ] 

Cameron Taggart commented on AVRO-739:
--

Reviewing the docs 
(https://github.com/apache/avro/blob/branch-1.8/doc/src/content/xdocs/spec.xml#L1412-L1472),
 are `date`, `time-millis`, `time-micros`, `timestamp-millis`, 
`timestamp-micros`, and `duration` going to be added to Avro IDL too 
(https://avro.apache.org/docs/current/idl.html)?

> Add Date/Time data types
> 
>
> Key: AVRO-739
> URL: https://issues.apache.org/jira/browse/AVRO-739
> Project: Avro
> Issue Type: New Feature
> Components: spec
> Reporter: Jeff Hammerbacher
> Assignee: Dmitry Kovalev
> Fix For: 1.8.0
>
> Attachments: AVRO-739-datetime-spec.xml.patch, 
> AVRO-739-datetime-spec.xml.patch, AVRO-739-update-spec.diff, AVRO-739.patch
>
>






[jira] [Commented] (AVRO-1725) Enum schema exhibits same restriction to enum symbols as to names

2016-01-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092671#comment-15092671
 ] 

ASF subversion and git services commented on AVRO-1725:
---

Commit 1724129 from [~martinkl] in branch 'avro/trunk'
[ https://svn.apache.org/r1724129 ]

AVRO-1725. Docs: clarify restrictions on enum symbols.

> Enum schema exhibits same restriction to enum symbols as to names
> -
>
> Key: AVRO-1725
> URL: https://issues.apache.org/jira/browse/AVRO-1725
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.7
> Reporter: Nikita Makeev
> Fix For: 1.8.0
>
> Attachments: AVRO-1725.patch
>
>
> EnumSchema class in org.apache.avro.Schema has the following code:
> for (String symbol : symbols)
> if (ordinals.put(validateName(symbol), i++) != null)
> which validates enum symbols using validateName(), making it impossible to 
> use symbols that do not conform to the rules for names. 
> That prohibits symbols like "" (empty string) or anything starting with a 
> number, which does not seem to be intended.
> I guess this place requires either a different kind of validation or no 
> validation at all. I can provide a patch for either case.





[jira] [Updated] (AVRO-1725) Enum schema exhibits same restriction to enum symbols as to names

2016-01-11 Thread Martin Kleppmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Kleppmann updated AVRO-1725:
---
   Resolution: Fixed
Fix Version/s: 1.8.0
   Status: Resolved  (was: Patch Available)

I've committed the patch to trunk and branch-1.8.

> Enum schema exhibits same restriction to enum symbols as to names
> -
>
> Key: AVRO-1725
> URL: https://issues.apache.org/jira/browse/AVRO-1725
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.7
> Reporter: Nikita Makeev
> Fix For: 1.8.0
>
> Attachments: AVRO-1725.patch
>
>
> EnumSchema class in org.apache.avro.Schema has the following code:
> for (String symbol : symbols)
> if (ordinals.put(validateName(symbol), i++) != null)
> which validates enum symbols using validateName(), making it impossible to 
> use symbols that do not conform to the rules for names. 
> That prohibits symbols like "" (empty string) or anything starting with a 
> number, which does not seem to be intended.
> I guess this place requires either a different kind of validation or no 
> validation at all. I can provide a patch for either case.





[jira] [Commented] (AVRO-1781) Schema.parse is not thread safe

2016-01-11 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092283#comment-15092283
 ] 

Sean Busbey commented on AVRO-1781:
---

I see two main approaches:

* Externalize the cache and store one per Schema.Parser instance, so you get 
caching if you reuse a parser and the application can decide on parallelism 
trade-offs.
* Switch to a cache guarded by a read/write lock so that it is thread-safe.
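
A rough sketch of the read/write-lock approach, for discussion only. {{RwLockedCache}} and {{getOrCompute}} are hypothetical names, not Avro APIs, and a plain {{HashMap}} stands in for the real cache. Lookups share the read lock; a miss takes the write lock and re-checks before computing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Hypothetical helper for illustration only; not part of Avro.
final class RwLockedCache<K, V> {
  // Assumption: a plain HashMap stands in for the real cache. A map that
  // purges stale entries during lookups (java.util.WeakHashMap does this)
  // mutates on read and would need the write lock for get() as well.
  private final Map<K, V> delegate = new HashMap<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  V getOrCompute(K key, Function<? super K, ? extends V> loader) {
    // Fast path: many readers may hold the read lock at the same time.
    lock.readLock().lock();
    try {
      V cached = delegate.get(key);
      if (cached != null) {
        return cached;
      }
    } finally {
      lock.readLock().unlock();
    }
    // Slow path: the write lock is exclusive. Re-check first, because another
    // thread may have stored the value between the two lock acquisitions.
    lock.writeLock().lock();
    try {
      V cached = delegate.get(key);
      if (cached == null) {
        cached = loader.apply(key);
        delegate.put(key, cached);
      }
      return cached;
    } finally {
      lock.writeLock().unlock();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    RwLockedCache<String, Integer> cache = new RwLockedCache<>();
    Runnable task = () -> cache.getOrCompute("schema-json", String::length);
    Thread first = new Thread(task);
    Thread second = new Thread(task);
    first.start();
    second.start();
    first.join();
    second.join();
    System.out.println(cache.getOrCompute("schema-json", s -> -1));
  }
}
```

The first approach would instead make a cache like this an instance field of each parser, so that parsers used on separate threads never share state.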

> Schema.parse is not thread safe
> ---
>
> Key: AVRO-1781
> URL: https://issues.apache.org/jira/browse/AVRO-1781
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.8.0
> Reporter: Sean Busbey
> Priority: Blocker
> Fix For: 1.8.0
>
>
> Post AVRO-1497, Schema.parse calls {{LogicalTypes.fromSchemaIgnoreInvalid}} 
> on any schema that is expressed as a JSON object (anything except bare 
> primitives).
> That static method relies on a static cache based on WeakIdentityHashMap 
> (WIHM).
> WIHM clearly states that it isn't threadsafe 
> [ref|https://github.com/apache/avro/blob/branch-1.8/lang/java/avro/src/main/java/org/apache/avro/util/WeakIdentityHashMap.java#L42]
> {code}
>  * 
>  * Note that this implementation is not synchronized.
>  * 
>  */
> public class WeakIdentityHashMap<K, V> implements Map<K, V> {
> {code}
> All of the Schema.Parser instances use that same static Schema.parse method.
> The end result is that as-is it's only safe to have a single thread parsing 
> schemas in a given JVM.





[jira] [Created] (AVRO-1781) Schema.

2016-01-11 Thread Sean Busbey (JIRA)
Sean Busbey created AVRO-1781:
-

 Summary: Schema.
 Key: AVRO-1781
 URL: https://issues.apache.org/jira/browse/AVRO-1781
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.8.0
Reporter: Sean Busbey
Priority: Blocker
 Fix For: 1.8.0


Post AVRO-1497, Schema.parse calls {{LogicalTypes.fromSchemaIgnoreInvalid}} on 
any schema that is expressed as a JSON object (anything except bare primitives).

That static method relies on a static cache based on WeakIdentityHashMap (WIHM).

WIHM clearly states that it isn't threadsafe 
[ref|https://github.com/apache/avro/blob/branch-1.8/lang/java/avro/src/main/java/org/apache/avro/util/WeakIdentityHashMap.java#L42]

{code}
 * 
 * Note that this implementation is not synchronized.
 * 
 */
public class WeakIdentityHashMap<K, V> implements Map<K, V> {
{code}

All of the Schema.Parser instances use that same static Schema.parse method.

The end result is that as-is it's only safe to have a single thread parsing 
schemas in a given JVM.





[jira] [Updated] (AVRO-1781) Schema.parse is not thread safe

2016-01-11 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated AVRO-1781:
--
Summary: Schema.parse is not thread safe  (was: Schema.)

> Schema.parse is not thread safe
> ---
>
> Key: AVRO-1781
> URL: https://issues.apache.org/jira/browse/AVRO-1781
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.8.0
> Reporter: Sean Busbey
> Priority: Blocker
> Fix For: 1.8.0
>
>
> Post AVRO-1497, Schema.parse calls {{LogicalTypes.fromSchemaIgnoreInvalid}} 
> on any schema that is expressed as a JSON object (anything except bare 
> primitives).
> That static method relies on a static cache based on WeakIdentityHashMap 
> (WIHM).
> WIHM clearly states that it isn't threadsafe 
> [ref|https://github.com/apache/avro/blob/branch-1.8/lang/java/avro/src/main/java/org/apache/avro/util/WeakIdentityHashMap.java#L42]
> {code}
>  * 
>  * Note that this implementation is not synchronized.
>  * 
>  */
> public class WeakIdentityHashMap<K, V> implements Map<K, V> {
> {code}
> All of the Schema.Parser instances use that same static Schema.parse method.
> The end result is that as-is it's only safe to have a single thread parsing 
> schemas in a given JVM.





[jira] [Created] (AVRO-1779) Avro docs convenience artifact missing LICENSE/NOTICE

2016-01-11 Thread Sean Busbey (JIRA)
Sean Busbey created AVRO-1779:
-

 Summary: Avro docs convenience artifact missing LICENSE/NOTICE
 Key: AVRO-1779
 URL: https://issues.apache.org/jira/browse/AVRO-1779
 Project: Avro
  Issue Type: Bug
  Components: doc
Affects Versions: 1.8.0
Reporter: Sean Busbey
Priority: Blocker
 Fix For: 1.8.0


For releases we generate a convenience artifact with our docs. At present, this 
tarball is missing the LICENSE/NOTICE files it needs.





[jira] [Commented] (AVRO-1725) Enum schema exhibits same restriction to enum symbols as to names

2016-01-11 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092431#comment-15092431
 ] 

Ryan Blue commented on AVRO-1725:
-

+1 for that patch, [~martinkl]. Thanks!

> Enum schema exhibits same restriction to enum symbols as to names
> -
>
> Key: AVRO-1725
> URL: https://issues.apache.org/jira/browse/AVRO-1725
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.7.7
> Reporter: Nikita Makeev
> Attachments: AVRO-1725.patch
>
>
> EnumSchema class in org.apache.avro.Schema has the following code:
> for (String symbol : symbols)
> if (ordinals.put(validateName(symbol), i++) != null)
> which validates enum symbols using validateName(), making it impossible to 
> use symbols that do not conform to the rules for names. 
> That prohibits symbols like "" (empty string) or anything starting with a 
> number, which does not seem to be intended.
> I guess this place requires either a different kind of validation or no 
> validation at all. I can provide a patch for either case.





[jira] [Commented] (AVRO-1781) Schema.parse is not thread safe

2016-01-11 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092454#comment-15092454
 ] 

Ryan Blue commented on AVRO-1781:
-

I think this is the same as AVRO-1773. Another option is to use a better weak 
hash map from Guava that supports concurrency. That's the solution recommended 
for AVRO-1760 as well.

I have reservations about adding a dependency on Guava since the suggestion is 
currently to shade it, but it would avoid many issues like this one.
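
For comparison, the general idea of a weak-keyed cache that tolerates concurrent access can be sketched with the JDK alone. This swaps in {{Collections.synchronizedMap(new WeakHashMap<>())}}, which is a different technique from the Guava map mentioned above: it compares keys with {{equals()}} rather than by identity, and it serializes every access behind a single lock. {{SynchronizedWeakCacheSketch}}, {{lookup}}, and the cached string values are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical illustration; neither Avro's nor Guava's implementation.
final class SynchronizedWeakCacheSketch {
  // Keys are held weakly, so an entry can disappear once its key is no longer
  // strongly reachable. The synchronized wrapper guards each individual call.
  private static final Map<Object, String> CACHE =
      Collections.synchronizedMap(new WeakHashMap<Object, String>());

  static String lookup(Object key) {
    // computeIfAbsent on the synchronized wrapper runs under the wrapper's lock,
    // so the check and the insert happen as one step.
    return CACHE.computeIfAbsent(key, k -> "cached-" + System.identityHashCode(k));
  }

  public static void main(String[] args) throws InterruptedException {
    Object schema = new Object(); // strong reference keeps the entry alive
    Thread[] workers = new Thread[4];
    for (int i = 0; i < workers.length; i++) {
      workers[i] = new Thread(() -> lookup(schema));
      workers[i].start();
    }
    for (Thread worker : workers) {
      worker.join();
    }
    System.out.println(lookup(schema));
  }
}
```

The single lock keeps this simple but limits parallelism, which is the trade-off a concurrent map implementation is meant to avoid.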

> Schema.parse is not thread safe
> ---
>
> Key: AVRO-1781
> URL: https://issues.apache.org/jira/browse/AVRO-1781
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.8.0
> Reporter: Sean Busbey
> Priority: Blocker
> Fix For: 1.8.0
>
>
> Post AVRO-1497, Schema.parse calls {{LogicalTypes.fromSchemaIgnoreInvalid}} 
> on any schema that is expressed as a JSON object (anything except bare 
> primitives).
> That static method relies on a static cache based on WeakIdentityHashMap 
> (WIHM).
> WIHM clearly states that it isn't threadsafe 
> [ref|https://github.com/apache/avro/blob/branch-1.8/lang/java/avro/src/main/java/org/apache/avro/util/WeakIdentityHashMap.java#L42]
> {code}
>  * 
>  * Note that this implementation is not synchronized.
>  * 
>  */
> public class WeakIdentityHashMap<K, V> implements Map<K, V> {
> {code}
> All of the Schema.Parser instances use that same static Schema.parse method.
> The end result is that as-is it's only safe to have a single thread parsing 
> schemas in a given JVM.


