[jira] [Commented] (AVRO-1533) permit promotions between string and bytes

2014-07-02 Thread Martin Kleppmann (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049741#comment-14049741
 ] 

Martin Kleppmann commented on AVRO-1533:


Only just saw this, sorry for the delay. I am wondering what this means for 
validation of schema compatibility, e.g. AVRO-1315. If a bytes field is 
changed to string, and thus runtime errors are possible due to invalid UTF-8, 
should the schemas be considered compatible? If people are going to read 
"validation succeeded" as "guaranteed no runtime errors", this is potentially 
an issue.

 permit promotions between string and bytes
 --

 Key: AVRO-1533
 URL: https://issues.apache.org/jira/browse/AVRO-1533
 Project: Avro
  Issue Type: New Feature
  Components: java
Reporter: Doug Cutting
Assignee: Doug Cutting
 Fix For: 1.7.7

 Attachments: AVRO-1533.patch, AVRO-1533.patch


 Avro strings are a subset of bytes, so promoting from string to bytes is 
 lossless and should be possible.  Promotion from bytes to strings may cause 
 problems, as not all byte strings are valid UTF8, but it also might be useful.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (AVRO-1536) Remove monkeypatching of Enumerable

2014-07-02 Thread Martin Kleppmann (JIRA)
Martin Kleppmann created AVRO-1536:
--

 Summary: Remove monkeypatching of Enumerable
 Key: AVRO-1536
 URL: https://issues.apache.org/jira/browse/AVRO-1536
 Project: Avro
  Issue Type: Improvement
  Components: ruby
Affects Versions: 1.7.6
Reporter: Martin Kleppmann
 Fix For: 1.7.7


The Avro Ruby gem adds a method {{collect_hash}} to the core module 
{{Enumerable}}. It's bad form for a library to extend core modules like this, 
and it's also unnecessary (stdlib methods can do the job perfectly well).
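
To illustrate the stdlib route, here is a minimal sketch. It assumes {{collect_hash}} builds a Hash from the key/value pairs returned by a block; that is an assumption about the helper, not something taken from the patch, and the {{fields}} data is made up for the example.

```ruby
# Hypothetical input: field definitions such as a schema parser might hold.
fields = [
  { 'name' => 'guid',       'type' => 'string' },
  { 'name' => 'created_at', 'type' => 'long' }
]

# Assumed monkeypatched style (not run here):
#   fields.collect_hash { |f| [f['name'], f['type']] }

# Same result with core methods only: Hash[] accepts an array of pairs.
by_name = Hash[fields.map { |f| [f['name'], f['type']] }]

# each_with_object is an alternative that skips the intermediate array.
by_name_alt = fields.each_with_object({}) { |f, h| h[f['name']] = f['type'] }

p by_name
```

Either form keeps the behaviour local to the call site and leaves {{Enumerable}} untouched.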





[jira] [Updated] (AVRO-1536) Remove monkeypatching of Enumerable

2014-07-02 Thread Martin Kleppmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Kleppmann updated AVRO-1536:
---

Attachment: AVRO-1536.patch

Attached a patch, extracted from [~wvanbergen]'s patch for AVRO-1499. Willem, 
could you please check whether this looks right?



[jira] [Updated] (AVRO-1499) Ruby 2+ Writes Invalid avro files using the avro gem

2014-07-02 Thread Martin Kleppmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Kleppmann updated AVRO-1499:
---

Attachment: AVRO-1499-3.patch

Thanks for your patch, [~wvanbergen]. I've attached a new patch which merges 
our changes together:

- Removing the monkeypatching of Enumerable is a good idea, but it's a separate 
issue, so I've split it out into AVRO-1536.
- Given that we've just removed a monkeypatch on a core module, I'm not so keen 
to add a new one for String#bytesize. I've changed it to check for the presence 
of String#bytesize at the point where it's needed.
- I've retained my change to set the encoding of the buffer to BINARY. That by 
itself is actually sufficient, because String#size returns the number of bytes 
if the encoding is set to binary. However, I've also kept it using #bytesize to 
make clear what's going on.
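
The last two points can be sketched as follows. This is an illustration only, not code from the patch; the helper name byte_length is hypothetical.

```ruby
text = "h\u00e9llo"                  # 5 characters, 6 bytes in UTF-8

char_count = text.size               # counts characters for a UTF-8 string
byte_count = text.bytesize           # always counts bytes

# Tagging a buffer as BINARY makes #size count bytes as well.
buffer = text.dup.force_encoding('BINARY')
binary_size = buffer.size

# Hypothetical helper: check for String#bytesize at the point of use
# instead of patching String on Rubies that lack the method.
def byte_length(str)
  str.respond_to?(:bytesize) ? str.bytesize : str.size
end

puts [char_count, byte_count, binary_size, byte_length(text)].inspect
```

A length prefix written from char_count would be one byte short for this string, which is the kind of mismatch that corrupts a binary file.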

I've tested it in a broad range of Ruby versions. I'll commit this soon unless 
there are objections.

 Ruby 2+ Writes Invalid avro files using the avro gem
 

 Key: AVRO-1499
 URL: https://issues.apache.org/jira/browse/AVRO-1499
 Project: Avro
  Issue Type: Bug
  Components: ruby
Affects Versions: 1.7.5
Reporter: Michael Ries
Assignee: Martin Kleppmann
  Labels: ruby
 Fix For: 1.7.7

 Attachments: AVRO-1499-2.patch, AVRO-1499-3.patch, AVRO-1499.patch


 The rubygem writes corrupted avro files under ruby 2.0.0 and ruby 2.1.1. It 
 appears to work correctly under jruby-1.7.10 and ruby 1.9.3.
 Here is a reproducible:
 ```ruby
 require 'avro'

 # Sample records matching the Member schema below.
 data = [
   {"guid"=>"144045de-eb44-dd1b-d9af-6c8b5d41a96e",
    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"My Awesome Bank",
    "created_at"=>1390617818, "updated_at"=>1398180288, "deleted_at"=>nil},
   {"guid"=>"51e06057-14d2-7527-81fa-b07dba0a263b",
    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"Student Loans R' Us",
    "created_at"=>1386178342, "updated_at"=>1398180286, "deleted_at"=>nil},
   {"guid"=>"b4d1d99f-4351-d0e7-221c-a3fae08716bc",
    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"My Awesome Bank",
    "created_at"=>1390617026, "updated_at"=>1398180288, "deleted_at"=>nil},
   {"guid"=>"084638fa-a78d-bbdd-e075-7c9c957a9b46",
    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"My Awesome Bank",
    "created_at"=>1390617138, "updated_at"=>1398180288, "deleted_at"=>nil},
   {"guid"=>"79287c76-4e8f-0a21-7569-a2bcdc2b2f4d",
    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"My Awesome Bank",
    "created_at"=>1390617135, "updated_at"=>1398180288, "deleted_at"=>nil},
   {"guid"=>"3bcc26b2-7d3b-6c4d-cb27-4eb1574b3c20",
    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"Cayman Islands Bank",
    "created_at"=>1386902345, "updated_at"=>1398180288, "deleted_at"=>nil},
   {"guid"=>"75e1e56c-7611-4030-d002-afa2af70e5a1",
    "user_guid"=>"0cd41235-5c14-eae9-00ed-c6eb11dd9119", "name"=>"My Awesome Bank",
    "created_at"=>1390617427, "updated_at"=>1398180288, "deleted_at"=>nil},
 ]

 # Avro schema for the records, as a JSON string.
 member_schema = <<-SCHEMA
 {"namespace": "md.data_logs",
  "type": "record",
  "name": "Member",
  "fields": [
   {"name": "guid", "type": "string"},
   {"name": "user_guid", "type": "string"},
   {"name": "name", "type": ["string","null"]},
   {"name": "created_at", "type":"long"},
   {"name": "updated_at", "type":"long"},
   {"name": "deleted_at", "type":["long","null"]}
  ]
 }
 SCHEMA
 filepath = "./members.avro"
 File.unlink(filepath) if File.exist?(filepath)

 # Write every record to a new data file.
 Avro::DataFile.open(filepath, "w", member_schema) do |dw|
   data.each do |entry|
     dw << entry
   end
 end

 # Read the records back from the same file.
 entries = []
 Avro::DataFile.open(filepath, "r") do |reader|
   reader.each do |entry|
     entries << entry
   end
 end

 puts "Here is the data I wrote into the file:"
 data.each{|e| p e }
 print "\n\n\n\n"

 puts "Here is the data I read from the file:"
 entries.each{|e| p e }
 ```
 Under ruby 2+ it fails with the message "undefined method 'unpack' for 
 nil:NilClass (NoMethodError)". I have also tested that the rubygem can 
 correctly read avro files written by the java client, but the java client 
 fails to read files written by the ruby client, so the issue is definitely in 
 how the rubygem is trying to write the binary file.





Re: Ruby gem fork - contribute back?

2014-07-02 Thread Sean Busbey
On Wed, Jul 2, 2014 at 4:36 AM, Martin Kleppmann mar...@kleppmann.com
wrote:



 FWIW, Ruby isn't the only language with a tricky setup. I spent ages
 trying to get the Avro tests for PHP to work, for example. As discussed on
 another thread [1], I think a Docker container might be a good way of
 building a baseline configuration on which everyone can easily test changes
 and make release candidates. Any help with this would be most welcome.


Martin,

Is there already a ticket to track this part of the effort?

-Sean


[jira] [Commented] (AVRO-1516) Unit test failure in Ruby 2.0 and above

2014-07-02 Thread Willem van Bergen (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050045#comment-14050045
 ] 

Willem van Bergen commented on AVRO-1516:
-

The problem is `union[double, int]` types. Integers get encoded as doubles, 
because double is the first matching type in the union. When the value is read 
back, it comes back as a double instead of an int. In more recent versions of 
Ruby, the equality operator returns false when comparing the int with the double.
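
The first-match behaviour can be sketched with stand-in validators. Nothing below is the gem's `Schema.validate`; the `lenient` and `strict` tables and `first_matching_branch` are hypothetical, and the strict variant is only one possible direction for a fix, not necessarily what the linked commit does.

```ruby
# Stand-in validators keyed by Avro type name (hypothetical).
lenient = {
  'double' => lambda { |d| d.is_a?(Numeric) },  # accepts integers too
  'int'    => lambda { |d| d.is_a?(Integer) }
}
# Stricter variant: only a Float counts as a double.
strict = lenient.merge('double' => lambda { |d| d.is_a?(Float) })

# Return the first union branch whose validator accepts the datum.
def first_matching_branch(union, datum, validators)
  union.find { |type| validators[type].call(datum) }
end

union = ['double', 'int']
lenient_choice = first_matching_branch(union, 42, lenient)
strict_choice  = first_matching_branch(union, 42, strict)
puts "lenient: #{lenient_choice}, strict: #{strict_choice}"
```

With the lenient table an Integer lands in the double branch; with the strict one it falls through to int.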

The fix requires changing the `schema#validate` method, something like this: 
https://github.com/wvanbergen/tros/commit/e5941173c37553b417663ae0ef4e6b4d9265c65b

I can work on a patch when I get back from vacation, at the end of this month.

 Unit test failure in Ruby 2.0 and above
 ---

 Key: AVRO-1516
 URL: https://issues.apache.org/jira/browse/AVRO-1516
 Project: Avro
  Issue Type: Test
  Components: ruby
Affects Versions: 1.7.6
Reporter: Martin Kleppmann

 The following unit test fails when run with Ruby 2.0 and above:
 {noformat}
 $ bundle exec rake test
 /Users/mkleppma/.rubies/ruby-2.0.0-p195/bin/ruby -Ilib:ext:bin:test 
 -I/Users/mkleppma/.gem/ruby/2.0.0/gems/rake-10.3.1/lib 
 /Users/mkleppma/.gem/ruby/2.0.0/gems/rake-10.3.1/lib/rake/rake_test_loader.rb
  test/test_datafile.rb test/test_help.rb test/test_io.rb 
 test/test_protocol.rb test/test_schema.rb test/test_socket_transport.rb
 Run options:
 # Running tests:
 [30/41] TestIO#test_union = 0.00 s
   1) Failure:
 test_union(TestIO) 
 [/Users/mkleppma/Applications/avro/lang/ruby/test/test_io.rb:339]:
 -3372032630846393039 expected but was
 -3.372032630846393e+18.
 Finished tests in 0.346139s, 118.4495 tests/s, 2207.2058 assertions/s.
 41 tests, 764 assertions, 1 failures, 0 errors, 0 skips
 ruby -v: ruby 2.0.0p195 (2013-05-14 revision 40734) [x86_64-darwin12.3.0]
 rake aborted!
 Command failed with status (1): [ruby -Ilib:ext:bin:test 
 -I/Users/mkleppma/.gem/ruby/2.0.0/gems/rake-10.3.1/lib 
 /Users/mkleppma/.gem/ruby/2.0.0/gems/rake-10.3.1/lib/rake/rake_test_loader.rb
  test/test_datafile.rb test/test_help.rb test/test_io.rb 
 test/test_protocol.rb test/test_schema.rb test/test_socket_transport.rb 
 ]
 /Users/mkleppma/.gem/ruby/2.0.0/gems/echoe-4.6.5/lib/echoe.rb:749:in `block 
 in define_tasks'
 Tasks: TOP => test_inner
 (See full trace by running task with --trace)
 {noformat}
 Brief investigation suggests that this isn't a bug in Avro, but just a badly 
 written test. The test is comparing -3372032630846393039 and 
 -3372032630846393000.0, which Ruby 1.9 and below consider to be equal, but 
 Ruby 2.0 and above consider to be non-equal.
 Our tests shouldn't be relying on such edge cases of type coercion.
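
The comparison at issue can be reproduced in isolation. This sketch uses the integer from the failing assertion; the variable names are made up, and the final check is one hedged way for a test to avoid relying on coercion, not the project's chosen fix.

```ruby
expected  = -3372032630846393039   # integer from the failing assertion
as_double = expected.to_f          # the value after a trip through a double

# A double carries a 53-bit significand, so the low digits are lost.
round_trip = as_double.to_i

loose_equal = (expected == as_double)   # false on Ruby 2.0 and above
exact_equal = (round_trip == expected)  # false: precision was lost

# Comparing class and value together does not depend on coercion rules.
same = expected.class == as_double.class && expected == as_double

puts [loose_equal, exact_equal, same].inspect
```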





[jira] [Commented] (AVRO-1499) Ruby 2+ Writes Invalid avro files using the avro gem

2014-07-02 Thread Willem van Bergen (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050073#comment-14050073
 ] 

Willem van Bergen commented on AVRO-1499:
-

W.r.t. monkey patching String: I agree with you in general, but in this case I 
think it is preferable to doing a `respond_to?` check in the write method.

- It basically backports a method in a way that is completely compatible with 
Ruby 1.9+, and it only does so if it's not available. 
- This makes it a lot easier to drop 1.8 support later. Just one file of 
backports to delete, instead of having to go through the entire source code to 
find occurrences.
- Performance: only one respond_to check when the library is loaded, instead of 
a check on every write.
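
A load-time backport along these lines might look like the sketch below. It is an illustration of the pattern, not the contents of either patch, and it assumes that on a Ruby without String#bytesize, String#size already counts bytes.

```ruby
# Hypothetical backports file, evaluated once when the library loads.
unless String.method_defined?(:bytesize)
  String.class_eval do
    # Assumption: where #bytesize is missing, #size already counts bytes.
    alias_method :bytesize, :size
  end
end

# Callers can then use #bytesize unconditionally.
payload = "caf\u00e9"
length_prefix = payload.bytesize
puts length_prefix
```

On any Ruby that already defines the method the guard is false and String is left alone.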

I have no real strong feelings about it, so feel free to ignore this :)




[jira] [Commented] (AVRO-1536) Remove monkeypatching of Enumerable

2014-07-02 Thread Willem van Bergen (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050222#comment-14050222
 ] 

Willem van Bergen commented on AVRO-1536:
-

Looks good!

 Remove monkeypatching of Enumerable
 ---

 Key: AVRO-1536
 URL: https://issues.apache.org/jira/browse/AVRO-1536
 Project: Avro
  Issue Type: Improvement
  Components: ruby
Affects Versions: 1.7.6
Reporter: Martin Kleppmann
Assignee: Martin Kleppmann
 Fix For: 1.7.7

 Attachments: AVRO-1536.patch




Circular references and non-string map-keys patch

2014-07-02 Thread S G
Hi Avro Developers,

I have submitted a patch for Circular references and non-string map-keys
support in Avro at
https://issues.apache.org/jira/browse/AVRO-695

Can someone guide me on how to get this patch reviewed and committed?

Thanks for helping me out,
Sachin


[jira] [Commented] (AVRO-1533) permit promotions between string and bytes

2014-07-02 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050739#comment-14050739
 ] 

Doug Cutting commented on AVRO-1533:


It won't generate runtime errors for invalid UTF-8, but instead replaces 
erroneous sequences with the replacement character � (U+FFFD):

http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#String(byte[],%20java.nio.charset.Charset)

I think this can be considered a compatible change, since it won't break existing 
applications.  Today attempts to switch a field from bytes to string would 
fail.  I suppose an application could currently rely on such failures, but I 
consider that unlikely enough that I'm willing to ignore it.  Do others 
disagree?

We could:
 # revert this change entirely, declaring it incompatible
 # revert just the change to the specification, so that Avro Java is more 
lenient in what conversions it permits than the specification (following 
Postel's law)
 # file issues to update the AVRO-1315 schema validation to permit such 
conversions
 - also file issues for C, C++ and C# to update their schema resolution to 
support these conversions

Thoughts?
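
For readers who want to see what lenient decoding means in practice, here is a Ruby analogue. It is not the Avro Java code and makes no claim about the patch; it only shows the same kind of substitution using String#scrub (Ruby 2.1 and later).

```ruby
raw = "abc\xFFdef".b                       # 0xFF never occurs in valid UTF-8

as_text = raw.dup.force_encoding('UTF-8')  # reinterpret the bytes, no validation
valid_before = as_text.valid_encoding?     # false

# String#scrub replaces each invalid sequence; for UTF-8 the default
# replacement is U+FFFD.
decoded = as_text.scrub

# The substitution is lossy: the original bytes cannot be recovered.
lossless = (decoded.bytes == raw.bytes)    # false

puts decoded
```

No error is raised, but a reader that later converts the string back to bytes gets different data than was written, which is the trade-off being weighed above.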



[jira] [Commented] (AVRO-1533) permit promotions between string and bytes

2014-07-02 Thread graham sanderson (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050759#comment-14050759
 ] 

graham sanderson commented on AVRO-1533:


I had a quick look at the patch, and there are a few s.getBytes() and new 
String(byte[]) calls. Since the intention seems to be to assume that bytes are 
interchangeable with UTF-8 encoded strings, the charset should probably be 
explicit, as it is in Utf8, no?
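
The concern is about Java's platform-default charset, but the pitfall is easy to show in any language. The Ruby sketch below is an analogue only: the same text yields different bytes depending on which charset is named, so a conversion that leaves the charset implicit is not portable.

```ruby
text = "caf\u00e9"

# Same characters, different bytes, depending on the charset named.
utf8_bytes   = text.encode(Encoding::UTF_8).bytes       # [99, 97, 102, 195, 169]
latin1_bytes = text.encode(Encoding::ISO_8859_1).bytes  # [99, 97, 102, 233]

# Decoding round-trips only when it names the charset that encoded.
decoded = utf8_bytes.pack('C*').force_encoding(Encoding::UTF_8)

puts utf8_bytes.inspect
puts latin1_bytes.inspect
```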



[jira] [Commented] (AVRO-695) Cycle Reference Support

2014-07-02 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050789#comment-14050789
 ] 

Doug Cutting commented on AVRO-695:
---

For some reason I cannot apply the patch file you've generated, so it's hard 
for me to analyze it in detail.  What tool are you using to generate this patch?

I'd prefer we use an explicit type rather than overload string for this 
purpose.  A union with a record like:

{code}
{"type":"record","name":"org.apache.avro.CircularRef", "fields":[{"name":"ref", 
"type":"int"}]}
{code}

We don't ever have to create or define such a record.  Rather, a CustomEncoding 
can be used to directly resolve such references at read time.  (If circular 
references are not enabled then a GenericRecord would be read.)  No string 
prefixing, etc. would then be required.

Rather than modifying ReflectData to support this, might we instead create a 
subclass of ReflectData that supports circular references?

Non-string map key support in reflection should be addressed in a separate 
issue.
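
To make the read-time idea concrete, here is a language-neutral sketch written in Ruby. It does not use any Avro API: decoded records are modelled as plain Hashes, a back-reference as a Hash with the single key "ref", and the int is assumed to be the record's position in read order. All of these are assumptions made for illustration.

```ruby
# Replace every {"ref" => n} placeholder with the n-th record that was read.
def resolve_refs(node, seen = [])
  return node unless node.is_a?(Hash)
  return seen.fetch(node['ref']) if node.keys == ['ref']

  seen << node   # register before descending so a cycle can point back here
  node.keys.each { |k| node[k] = resolve_refs(node[k], seen) }
  node
end

# Decoded form of a parent whose child refers back to record 0 (the parent).
parent = {
  'parentName' => 'p',
  'child' => { 'childName' => 'c', 'parent' => { 'ref' => 0 } }
}
resolve_refs(parent)
puts parent['child']['parent'].equal?(parent)
```

Because the reference is a distinct record type in the union, no string prefix is needed to tell a reference apart from ordinary data.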

 Cycle Reference Support
 ---

 Key: AVRO-695
 URL: https://issues.apache.org/jira/browse/AVRO-695
 Project: Avro
  Issue Type: New Feature
  Components: spec
Affects Versions: 1.7.6
Reporter: Moustapha Cherri
 Attachments: avro-1.4.1-cycle.patch.gz, avro-1.4.1-cycle.patch.gz, 
 avro_circular_references.zip, avro_circular_refs_2014_06_14.zip, 
 circular_refs_and_nonstring_map_keys_2014_06_25.zip

   Original Estimate: 672h
  Remaining Estimate: 672h

 This is a proposed implementation to add cycle reference support to Avro. It 
 basically introduces a new type named Cycle. A Cycle contains a string 
 representing the path to the other reference.
 For example, suppose we have an object of type Message that has a member named 
 previous, also of type Message, and we have this hierarchy:
 message
   previous : message2
 message2
   previous : message2
 When serializing, the cycle path for message2.previous will be previous.
 The implementation depends on ANTLR to evaluate those cycles at read time in 
 order to resolve them. I used ANTLR 3.2. This dependency is not mandatory; I 
 just used ANTLR to speed things up. I kept the ANTLR-generated code in this 
 implementation, though it should instead be generated during the build. I only 
 updated the Java code.
 I did not do full unit testing, but you can find the avrotest.Main class, which 
 can be used as a preliminary test.
 Please do not hesitate to contact me for further clarification if this seems 
 interesting.
 Best regards,
 Moustapha Cherri





[jira] [Updated] (AVRO-680) Allow for non-string keys

2014-07-02 Thread Sachin Goyal (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sachin Goyal updated AVRO-680:
--

Affects Version/s: 1.7.7
   1.7.6
   Status: Patch Available  (was: Open)

For a map with non-string keys in Java such as:
{code}
Map<EmployeeId, EmployeeInfo>
{code},

the map is converted to an array using:
*GenericDatumWriter.java*
{code}
map.entrySet()
{code}

Corresponding schema change is done in *ReflectData.java*

Diff created using _diff -ru_
Patch can be applied using _patch -i non_string_map_keys.patch_
Unit tests included.
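
The map-to-array conversion can be pictured with the sketch below. It is written in Ruby for brevity and is not the patch's Java code; the entry field names "key" and "value" and the composite array keys standing in for EmployeeId are assumptions.

```ruby
# A map whose keys are not strings (composite keys stand in for EmployeeId).
employees = {
  ['eng', 7]  => { 'name' => 'Ada' },
  ['ops', 12] => { 'name' => 'Lin' }
}

# Write side: flatten the map into an array of key/value records.
def to_entries(map)
  map.map { |key, value| { 'key' => key, 'value' => value } }
end

# Read side: rebuild the map from the array of records.
def from_entries(entries)
  Hash[entries.map { |e| [e['key'], e['value']] }]
end

entries  = to_entries(employees)
restored = from_entries(entries)
puts restored == employees
```

Each entry is an ordinary record, so the key can be of any type the schema language can describe.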

 Allow for non-string keys
 -

 Key: AVRO-680
 URL: https://issues.apache.org/jira/browse/AVRO-680
 Project: Avro
  Issue Type: Improvement
Affects Versions: 1.7.6, 1.7.7
Reporter: Jeremy Hanna
 Attachments: non_string_map_keys.zip


 Based on an email thread back in April, Doug Cutting proposed a possible 
 solution for having non-string keys:
 Stu Hood wrote:
  I can understand the reasoning behind AVRO-9, but now I need to look for an 
  alternative to a 'map' that will allow me to store an association of bytes 
  keys to values.
 A map of Foo has the same binary format as an array of records, each
 with a string field and a Foo field.  So an application can use an array
 schema similar to this to represent map-like structures with, e.g.,
 non-string keys.
 Perhaps we could establish standard properties that indicate that a
 given array of records should be represented in a map-like way if
 possible?  E.g.,:
 {"type": "array", "isMap": true, "items": {"type":"record", ...}}
 Doug





[jira] [Updated] (AVRO-680) Allow for non-string keys

2014-07-02 Thread Sachin Goyal (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sachin Goyal updated AVRO-680:
--

Attachment: non_string_map_keys.zip



[jira] [Commented] (AVRO-695) Cycle Reference Support

2014-07-02 Thread Sachin Goyal (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051012#comment-14051012
 ] 

Sachin Goyal commented on AVRO-695:
---

Thanks Doug.

I created the patch using *diff -r* and it can be patched using *patch -i 
patch-file*

For non-string map-keys, I have submitted a separate patch at 
https://issues.apache.org/jira/browse/AVRO-680

Could you please explain your comment regarding circular references in more 
detail?
I will change my patch accordingly.

For example, here is a circular reference schema generated using Avro:
{code:javascript}
{
  "type" : "record",
  "name" : "SimpleParent",
  "namespace" : "org.apache.avro.generic",
  "fields" : [ {
    "name" : "parentName",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "child",
    "type" : [ "null", {
      "type" : "record",
      "name" : "SimpleChild",
      "fields" : [ {
        "name" : "childName",
        "type" : [ "null", "string" ],
        "default" : null
      }, {
        "name" : "parent",
        "type" : [ "null", "SimpleParent" ],
        "default" : null
      } ]
    }, "string" ],
    "default" : null
  } ]
}
{code}

The current code converts it to the following:
{code:javascript}
{
  "type" : "record",
  "name" : "SimpleParent",
  "namespace" : "org.apache.avro.generic",
  "fields" : [ {
    "name" : "__crefId",
    "type" : "string"
  }, {
    "name" : "parentName",
    "type" : [ "null", "string" ],
    "default" : null
  }, {
    "name" : "child",
    "type" : [ "null", {
      "type" : "record",
      "name" : "SimpleChild",
      "fields" : [ {
        "name" : "__crefId",
        "type" : "string"
      }, {
        "name" : "childName",
        "type" : [ "null", "string" ],
        "default" : null
      }, {
        "name" : "parent",
        "type" : [ "null", "SimpleParent", "string" ],
        "default" : null
      } ],
      "circularRefIdPrefix" : "__crefId"
    }, "string" ],
    "default" : null
  } ],
  "circularRefIdPrefix" : "__crefId"
}
{code}

Can you please apply your comments above to this example?
It will help me in understanding how it would be different from the above 
solution.

As I understand it, a circular reference can occur in any record-typed 
element.
So the CustomEncoder approach would need to write such an element sometimes as 
a record and sometimes as a string/int.
Can a CustomEncoder pass control to the regular Avro Encoder to write a record 
normally?
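
To make the question concrete, the write side I have in mind looks roughly like the sketch below. It is plain Ruby rather than Avro code: records are Hashes, a revisited record is written as a {"ref" => index} placeholder, and the index is assumed to be the order of first visit. The name encode_refs is hypothetical.

```ruby
# Emit each record in full on first visit and as {"ref" => index} afterwards.
def encode_refs(node, seen = {})
  return node unless node.is_a?(Hash)

  id = seen[node.object_id]
  return { 'ref' => id } if id       # already written: emit a reference

  seen[node.object_id] = seen.size   # first visit: assign the next index
  out = {}
  node.each { |k, v| out[k] = encode_refs(v, seen) }
  out
end

# An in-memory parent and child that point at each other.
parent = { 'parentName' => 'p' }
child  = { 'childName' => 'c', 'parent' => parent }
parent['child'] = child

encoded = encode_refs(parent)
p encoded
```

So the same field is written as a full record on first visit and as a small reference afterwards, which is why the encoder needs a way to fall back to normal record writing.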
