[jira] [Commented] (AVRO-1318) Python schema should store fingerprints

2013-07-03 Thread Jeremy Kahn (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699249#comment-13699249
 ] 

Jeremy Kahn commented on AVRO-1318:
---

I prefer the approach you ([~cutting]) suggest -- especially the object aspects 
of it, and especially if the objects can be derived from 
{{collections.Sequence}} and {{collections.Mapping}} so that existing accessors 
can keep working the same way.

Unfortunately, I don't have any free cycles for this, though I'd be happy to
contribute later in July. I don't know whether this should block the 1.7.5
release, though.



 Python schema should store fingerprints
 ---

 Key: AVRO-1318
 URL: https://issues.apache.org/jira/browse/AVRO-1318
 Project: Avro
  Issue Type: Bug
  Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
  Labels: features
 Attachments: AVRO-1318.patch


 Python schema objects need to produce a simple representation that 
 demonstrates their field identity.   {avro.schema.Schema} objects need to 
 provide a {fingerprint} member field to enable quick checking of schema 
 matching (even when the schema has other, possibly changed decoration).
 Based on a patch pulled from [~laserson]'s proposed changes to make a 
 collection of C-typing hints.  These changes will be backwards-compatible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (AVRO-1318) Python schema should store fingerprints

2013-07-02 Thread Jeremy Kahn (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698498#comment-13698498
 ] 

Jeremy Kahn commented on AVRO-1318:
---

[~laserson], glad you're game to contribute!

- what cleanup do you think is needed?
- would it be possible to use your work _without_ requiring a change to the 
read/write API? (could the old API be preserved in terms of your new one?)



[jira] [Commented] (AVRO-1343) Python: validate too permissive on records with extra fields

2013-07-01 Thread Jeremy Kahn (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13697022#comment-13697022
 ] 

Jeremy Kahn commented on AVRO-1343:
---

It causes problems for unioned data in Python, because Python moves to generic 
data and then introspects the data with {{validate}} to determine which union 
member to use to re-encode the data.

Suppose I start with a schema:
{code}{"type": "record", "name": "superset",
  "fields": [ {"name": "foo", "type": "int"},
              {"name": "bar", "type": "string"} ] }
{code}
If I encode these two lines with a schema of _only_ {{superset}} objects:
{code}
  {"foo": 99, "bar": "banana"}
  {"foo": -98, "bar": "peaches"}{code}
the data is entirely recoverable. But if I rewrite that datafile with a
schema supporting a union of {{superset}} and {{subset}}:
{code}
[{"type": "record", "name": "superset",
  "fields": [ {"name": "foo", "type": "int"},
              {"name": "bar", "type": "string"} ] },
 {"type": "record", "name": "subset",
  "fields": [ {"name": "foo", "type": "int"} ] }
]{code}
the data will be re-encoded as {{subset}} objects, silently discarding the
{{bar}} field.

This behavior seems fundamentally backwards-breaking _as unpatched_, but here's
a way we could rewrite it to affect only union member selection: I could
rewrite the patch to pass an optional {{strict}} argument (default {{False}})
to {{validate}}, and then use {{strict=True}} when doing union member
selection.  This would, I believe, still allow extra fields for simple
records, but reject records with extra fields when determining the correct
union member.

Of course, someone might still be expecting to put things into Python unions 
with extra fields and depending on the schema to discard these, but I think 
anyone with that expected behavior would have encountered this bug already.
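
To make the {{strict}} proposal concrete, here is a minimal sketch in plain Python. It is not the {{avro.io.validate}} implementation: {{validate_record}}, {{select_union_member}}, and the {{PRIMITIVES}} table are hypothetical names, schemas are plain dicts, and only a few primitive types are handled. The "last match wins" loop follows the selection behavior described in the issue text.

```python
# Hypothetical, simplified stand-in for avro.io.validate (records only).
PRIMITIVES = {"int": int, "long": int, "string": str}


def validate_record(schema, datum, strict=False):
    """Return True if datum fits the record schema.

    With strict=True, keys that the schema does not declare make the
    datum invalid; with strict=False they are silently tolerated.
    """
    if not isinstance(datum, dict):
        return False
    fields = {f["name"]: f["type"] for f in schema["fields"]}
    for name, type_name in fields.items():
        if not isinstance(datum.get(name), PRIMITIVES[type_name]):
            return False
    if strict and set(datum) - set(fields):
        return False
    return True


def select_union_member(union, datum, strict=True):
    """Pick the last union member that validates (last match wins)."""
    chosen = None
    for member in union:
        if validate_record(member, datum, strict=strict):
            chosen = member
    return chosen


superset = {"type": "record", "name": "superset",
            "fields": [{"name": "foo", "type": "int"},
                       {"name": "bar", "type": "string"}]}
subset = {"type": "record", "name": "subset",
          "fields": [{"name": "foo", "type": "int"}]}
datum = {"foo": 99, "bar": "banana"}

permissive_choice = select_union_member([superset, subset], datum, strict=False)
strict_choice = select_union_member([superset, subset], datum, strict=True)
```

In this sketch the permissive selection lands on {{subset}} (dropping {{bar}}), while the strict selection lands on {{superset}}; plain {{validate_record(subset, datum)}} still accepts the extra field, so the non-union behavior is unchanged.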

 Python: validate too permissive on records with extra fields
 

 Key: AVRO-1343
 URL: https://issues.apache.org/jira/browse/AVRO-1343
 Project: Avro
  Issue Type: Bug
  Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
 Fix For: 1.7.5

 Attachments: AVRO-1343-tests.patch, AVRO-1343-validate.patch


 Python's validator silently accepts (generic) records with extra fields and
 considers them valid.
 For example, {{io.validate}} silently considers that the schema:
 {noformat}{"type": "record",
  "name": "Test",
  "fields": [{"name": "f", "type": "long"}]}
 {noformat}
 should accept records like:
 {noformat}{'f': 5, 'extra_field': 'abc'}{noformat}
 but this is problematic.
 This is *especially* problematic for encoding unions, because internally the
 Python serializer uses {{validate}} to find the appropriate schema with which
 to encode a given object.
 In the current implementation, union schema selection picks the *last* schema
 for which {{validate(schema, obj)}} returns {{True}}.  If {{validate}} isn't
 picky, this encoding will frequently guess wrong.
 I will attach two patches: one to the tests and one to the {{validate}}
 function.



[jira] [Commented] (AVRO-1318) Python schema should store fingerprints

2013-07-01 Thread Jeremy Kahn (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13697032#comment-13697032
 ] 

Jeremy Kahn commented on AVRO-1318:
---

The purpose is roughly the same, if I understand correctly. This fingerprint
notion is copied from [~laserson]'s [perf
branch|https://github.com/laserson/avro/tree/perf] to avoid recomputation of
evolution decisions ("to cache encoder and decoder objects", quoting the
spec).

This delta implements most of the Parsing Canonical Form part of the spec, if I
understand correctly, but it should be reviewed in light of that, for sure.

I've found Uri's work on this useful to support Cython extensions, but adapting 
the Python decoder and encoder to cache those encoders and decoders is a pretty 
big change. I thought this one bit should be safe enough to include without 
requiring a 1.8.0 bump, so I pushed it forward as a proposal.
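
As a rough illustration of why a stored fingerprint helps with caching, here is a minimal sketch of a decoder cache keyed by (writer, reader) fingerprints. Everything in it is hypothetical: {{FakeSchema}} stands in for a schema object with the proposed {{fingerprint}} member, and {{build_decoder}} is a placeholder factory, not anything from the perf branch or the Avro library.

```python
from collections import namedtuple

# Stand-in for a schema object exposing the proposed `fingerprint` member.
FakeSchema = namedtuple("FakeSchema", ["fingerprint", "doc"])


class DecoderCache:
    """Memoize decoder objects by (writer, reader) fingerprint pair."""

    def __init__(self, build_decoder):
        # build_decoder is a hypothetical factory: (writer, reader) -> decoder
        self._build = build_decoder
        self._cache = {}
        self.builds = 0  # number of times the factory actually ran

    def get(self, writer, reader):
        key = (writer.fingerprint, reader.fingerprint)
        if key not in self._cache:
            self.builds += 1
            self._cache[key] = self._build(writer, reader)
        return self._cache[key]


cache = DecoderCache(lambda w, r: ("decoder", w.fingerprint, r.fingerprint))

original = FakeSchema("abc123", "original doc")
redecorated = FakeSchema("abc123", "edited doc")  # same fields, new decoration

first = cache.get(original, original)
second = cache.get(redecorated, redecorated)
```

Because both schema objects report the same fingerprint, the second lookup reuses the decoder built for the first, even though the decoration differs.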




[jira] [Commented] (AVRO-1318) Python schema should store fingerprints

2013-07-01 Thread Jeremy Kahn (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13697076#comment-13697076
 ] 

Jeremy Kahn commented on AVRO-1318:
---

It is not an end-user use case. It's a useful performance win if you're
caching encoder and decoder objects, as Uri's changes do. I've written my own
extensions based heavily on this signature behavior.

I'd be happy to have [~laserson]'s perf branch added. [~cutting], perhaps you 
can go chat with him about this? He's a Clouderian, if I understand correctly.



[jira] [Work started] (AVRO-1343) Python: validate too permissive on records with extra fields

2013-05-31 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AVRO-1343 started by Jeremy Kahn.



[jira] [Created] (AVRO-1343) Python: validate too permissive on records with extra fields

2013-05-31 Thread Jeremy Kahn (JIRA)
Jeremy Kahn created AVRO-1343:
-

 Summary: Python: validate too permissive on records with extra 
fields
 Key: AVRO-1343
 URL: https://issues.apache.org/jira/browse/AVRO-1343
 Project: Avro
  Issue Type: Bug
  Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
 Fix For: 1.7.5


Python's validator silently accepts (generic) records with extra fields and
considers them valid.

For example, {{io.validate}} silently considers that the schema:
{noformat}{"type": "record",
 "name": "Test",
 "fields": [{"name": "f", "type": "long"}]}
{noformat}
should accept records like:
{noformat}{'f': 5, 'extra_field': 'abc'}{noformat}
but this is problematic.

This is *especially* problematic for encoding unions, because internally the
Python serializer uses {{validate}} to find the appropriate schema with which
to encode a given object.

In the current implementation, union schema selection picks the *last* schema
for which {{validate(schema, obj)}} returns {{True}}.  If {{validate}} isn't
picky, this encoding will frequently guess wrong.

I will attach two patches: one to the tests and one to the {{validate}}
function.





[jira] [Updated] (AVRO-1343) Python: validate too permissive on records with extra fields

2013-05-31 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn updated AVRO-1343:
--

Attachment: AVRO-1343-validate.patch
AVRO-1343-tests.patch



[jira] [Updated] (AVRO-1343) Python: validate too permissive on records with extra fields

2013-05-31 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn updated AVRO-1343:
--

Status: Patch Available  (was: In Progress)

I hope these patches can be accepted into 1.7.5.



[jira] [Created] (AVRO-1331) Java reader backwards-compatibility breakage

2013-05-15 Thread Jeremy Kahn (JIRA)
Jeremy Kahn created AVRO-1331:
-

 Summary: Java reader backwards-compatibility breakage
 Key: AVRO-1331
 URL: https://issues.apache.org/jira/browse/AVRO-1331
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.7.5
Reporter: Jeremy Kahn
 Attachments: stripped-snipped.avro, stripped-snipped.schema

In some cases, data encoded with Avro 1.7.4 is not readable with Avro
1.7.5-SNAPSHOT after AVRO-1295: the Java decoder is unable to discover the
root definitions.

Among the properties of (some) schemas that trigger this failure:

- an explicit empty string in the root namespace,
- use of other namespaces elsewhere in the schema, and
- a recursive reference to the root

A sample schema and a sample datafile with one example encoded with that schema 
are attached.  

This datafile cannot be read with Java deserializers (and I believe that the 
schema cannot be parsed by the Java schema parser).
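
For readers without the attachments, here is a small hypothetical schema that combines the three properties listed above, built as a Python dict and serialized to JSON. It is NOT the attached stripped-snipped.schema, the names {{Root}}, {{Payload}}, and {{com.example.other}} are invented for illustration, and it has not been confirmed to reproduce the failure.

```python
import json

# Hypothetical schema showing the three listed properties together.
# It is not the attached file and is not confirmed to trigger the bug.
schema = {
    "type": "record",
    "name": "Root",
    "namespace": "",  # explicit empty string as the root namespace
    "fields": [
        {
            "name": "payload",
            "type": {
                "type": "record",
                "name": "Payload",
                "namespace": "com.example.other",  # a different namespace
                "fields": [{"name": "id", "type": "int"}],
            },
        },
        {
            "name": "children",
            # recursive reference back to the root record by name
            "type": {"type": "array", "items": "Root"},
        },
    ],
}

# JSON text that could be handed to a schema parser for experimentation.
schema_text = json.dumps(schema, indent=2)
```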



[jira] [Updated] (AVRO-1331) Java reader backwards-compatibility breakage

2013-05-15 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn updated AVRO-1331:
--

Attachment: stripped-snipped.schema
stripped-snipped.avro

The sample stripped and snipped files trigger this misbehavior with versions
of 1.7.5-SNAPSHOT that include patch AVRO-1295.



[jira] [Commented] (AVRO-1318) Python schema should store fingerprints

2013-05-08 Thread Jeremy Kahn (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652062#comment-13652062
 ] 

Jeremy Kahn commented on AVRO-1318:
---

Nudging this issue to ask for review from a Pythonista and/or a committer. It'd
be great if AVRO-1318 and AVRO-1323 could be included in the Avro 1.7.5 release.



[jira] [Work started] (AVRO-1318) Python schema should store fingerprints

2013-05-06 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AVRO-1318 started by Jeremy Kahn.



[jira] [Created] (AVRO-1318) Python schema should store fingerprints

2013-05-06 Thread Jeremy Kahn (JIRA)
Jeremy Kahn created AVRO-1318:
-

 Summary: Python schema should store fingerprints
 Key: AVRO-1318
 URL: https://issues.apache.org/jira/browse/AVRO-1318
 Project: Avro
  Issue Type: Bug
  Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor


Python schema objects need to produce a simple representation that demonstrates 
their field identity.   {avro.schema.Schema} objects need to provide a 
{fingerprint} member field to enable quick checking of schema matching (even 
when the schema has other, possibly changed decoration).

Based on a patch pulled from [~laserson]'s proposed changes to make a 
collection of C-typing hints.  These changes will be backwards-compatible.
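
To illustrate what "matching even when decoration has changed" could look like, here is a minimal sketch that fingerprints a schema given as plain JSON-style dicts. It is an assumption-laden simplification, not the attached patch and not the spec's Parsing Canonical Form: the {{KEEP}} list, {{strip_decoration}}, and {{fingerprint}} are hypothetical names, and SHA-256 is just one reasonable hash choice.

```python
import hashlib
import json

# Attributes treated as identity-bearing in this sketch; anything else
# (doc, aliases, custom properties) is considered decoration and dropped.
KEEP = ("name", "namespace", "type", "fields",
        "symbols", "items", "values", "size")


def strip_decoration(node):
    """Recursively drop attributes that are not listed in KEEP."""
    if isinstance(node, dict):
        return {k: strip_decoration(node[k]) for k in KEEP if k in node}
    if isinstance(node, list):
        return [strip_decoration(n) for n in node]
    return node


def fingerprint(schema):
    """Hash a deterministic JSON rendering of the stripped schema."""
    canonical = json.dumps(strip_decoration(schema),
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


plain = {"type": "record", "name": "Test",
         "fields": [{"name": "f", "type": "long"}]}
decorated = {"type": "record", "name": "Test", "doc": "now documented",
             "fields": [{"name": "f", "type": "long", "doc": "a field"}]}
changed = {"type": "record", "name": "Test",
           "fields": [{"name": "f", "type": "int"}]}
```

Here {{fingerprint(plain)}} and {{fingerprint(decorated)}} agree, while {{fingerprint(changed)}} differs because a field type changed.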



[jira] [Created] (AVRO-1323) Python request schemas should report fullname

2013-05-06 Thread Jeremy Kahn (JIRA)
Jeremy Kahn created AVRO-1323:
-

 Summary: Python request schemas should report fullname
 Key: AVRO-1323
 URL: https://issues.apache.org/jira/browse/AVRO-1323
 Project: Avro
  Issue Type: Bug
  Components: python
Reporter: Jeremy Kahn
Priority: Minor


Avro request objects in the Python library are treated as a special kind of 
record schema without a name. But such objects should have a name -- if nothing 
else, they should have the same name as the message that they belong to.

Blocks AVRO-1318, in which fingerprints require every schema type -- including 
requests -- to report a fingerprint (usually their name).

It's an easy fix; I'll attach a patch in a few minutes.




[jira] [Assigned] (AVRO-1323) Python request schemas should report fullname

2013-05-06 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn reassigned AVRO-1323:
-

Assignee: Jeremy Kahn



[jira] [Work started] (AVRO-1323) Python request schemas should report fullname

2013-05-06 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AVRO-1323 started by Jeremy Kahn.



[jira] [Updated] (AVRO-1323) Python request schemas should report fullname

2013-05-06 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn updated AVRO-1323:
--

Attachment: AVRO-1323.patch



[jira] [Updated] (AVRO-1323) Python request schemas should report fullname

2013-05-06 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn updated AVRO-1323:
--

Status: Patch Available  (was: In Progress)

Patch work done [here|https://github.com/jkahn/avro/tree/AVRO-1323].



[jira] [Updated] (AVRO-1318) Python schema should store fingerprints

2013-05-06 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn updated AVRO-1318:
--

Attachment: AVRO-1318.patch

Patch from [here|https://github.com/jkahn/avro/tree/AVRO-1318].  A copy of 
changes from [~laserson] for fingerprinting files.



[jira] [Updated] (AVRO-1318) Python schema should store fingerprints

2013-05-06 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn updated AVRO-1318:
--

Status: Patch Available  (was: In Progress)

Tests pass with {{ant clean build}} for me.



[jira] [Commented] (AVRO-1318) Python schema should store fingerprints

2013-05-06 Thread Jeremy Kahn (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650219#comment-13650219
 ] 

Jeremy Kahn commented on AVRO-1318:
---

Oh, to be clear: when AVRO-1323 is included, then AVRO-1318 passes. The 
AVRO-1318 changes trigger the bug that AVRO-1323 addresses.

 Python schema should store fingerprints
 ---

 Key: AVRO-1318
 URL: https://issues.apache.org/jira/browse/AVRO-1318
 Project: Avro
  Issue Type: Bug
  Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
  Labels: features
 Attachments: AVRO-1318.patch


 Python schema objects need to produce a simple representation that 
 demonstrates their field identity.   {avro.schema.Schema} objects need to 
 provide a {fingerprint} member field to enable quick checking of schema 
 matching (even when the schema has other, possibly changed decoration).
 Based on a patch pulled from [~laserson]'s proposed changes to make a 
 collection of C-typing hints.  These changes will be backwards-compatible.



[jira] [Commented] (AVRO-1316) IDL code-generation generates too-long literals for very large schemas

2013-05-02 Thread Jeremy Kahn (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13648060#comment-13648060
 ] 

Jeremy Kahn commented on AVRO-1316:
---

Scott's right about reducing the character count to 2^14: UTF-8 characters may 
be up to four bytes each (though that is a [gross 
overestimate|http://stackoverflow.com/questions/9533258/what-is-the-maximum-number-of-bytes-for-a-utf-8-encoded-character]).
 I think the generated code would also be more readable in 2^14-character chunks.
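
A rough sketch of the chunking step only, written in Python for brevity; it is not the attached Java patch. The {{split_literal}} helper is hypothetical. Python slices by code point, which may differ from how the Java generator counts characters. How the pieces are rejoined in the generated Java is left out of the sketch.

```python
def split_literal(text, max_chars=2 ** 14):
    """Split text into consecutive pieces of at most max_chars characters."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


# Stand-in for a large schema JSON string; "\u00e9" needs two bytes in UTF-8.
schema_json = "\u00e9" * 40000
chunks = split_literal(schema_json)

# Encoded size of the largest chunk, to compare against the literal limit.
largest_chunk_bytes = max(len(chunk.encode("utf-8")) for chunk in chunks)
```

Joining the chunks back together reproduces the original string, so the split loses nothing.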

 IDL code-generation generates too-long literals for very large schemas
 --

 Key: AVRO-1316
 URL: https://issues.apache.org/jira/browse/AVRO-1316
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.7.5
Reporter: Jeremy Kahn
Priority: Minor
  Labels: patch
 Attachments: AVRO-1316.patch


 When I work from a very large IDL schema, the Java code generated includes a 
 schema JSON literal that exceeds the length of the maximum allowed literal 
 string ([65535 
 characters|http://stackoverflow.com/questions/8323082/size-of-initialisation-string-in-java]).
   
 This creates weird Maven errors like: {{[ERROR] ...FooProtocol.java:[13,89] 
 constant string too long}}.
 It might seem a little crazy, but a 64-kilobyte JSON protocol isn't 
 outrageous at all for some of the more involved data structures, especially 
 if we're including documentation strings etc.
 I believe the fix should be a bit more sensitivity to the length of the JSON 
 literal (and a willingness to split it into more than one literal, joined by 
 {{+}}), but I haven't figured out where that change needs to go. Has anyone 
 else encountered this problem?



[jira] [Created] (AVRO-1316) IDL code-generation generates too-long literals for very large schemas

2013-05-01 Thread Jeremy Kahn (JIRA)
Jeremy Kahn created AVRO-1316:
-

 Summary: IDL code-generation generates too-long literals for very 
large schemas
 Key: AVRO-1316
 URL: https://issues.apache.org/jira/browse/AVRO-1316
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Jeremy Kahn
Priority: Minor


When I work from a very large IDL schema, the Java code generated includes a 
schema JSON literal that exceeds the length of the maximum allowed literal 
string ([65535 
characters|http://stackoverflow.com/questions/8323082/size-of-initialisation-string-in-java]).
  

This creates weird Maven errors like: {{[ERROR] ...FooProtocol.java:[13,89] 
constant string too long}}.


It might seem a little crazy, but a 64-kilobyte JSON protocol isn't outrageous 
at all for some of the more involved data structures, especially if we're 
including documentation strings etc.

I believe the fix should be a bit more sensitivity to the length of the JSON 
literal (and a willingness to split it into more than one literal, joined by 
{{+}}), but I haven't figured out where that change needs to go. Has anyone 
else encountered this problem?



[jira] [Updated] (AVRO-1296) Python: schemas retrieved from protocol types ignore namespace

2013-04-24 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn updated AVRO-1296:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Philip Zeyliger merged the patch, and a followup patch that restored test 
functionality for Python 2.6.

 Python: schemas retrieved from protocol types ignore namespace
 --

 Key: AVRO-1296
 URL: https://issues.apache.org/jira/browse/AVRO-1296
 Project: Avro
  Issue Type: Bug
  Components: python
Affects Versions: 1.7.4
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
 Fix For: 1.7.5

 Attachments: AVRO-1296a.patch, AVRO-1296b.patch


 If I parse a protocol {{p}} using {{avro.protocol.parse}}, which defines 
 {{namespace: ns}} and then retrieve a child schema {{s}} from the 
 protocol's {{proto.types}} (or {{proto.types_dict}}), then {{s}} does not 
 have its namespace set (to {{ns}}), even if {{p}} has a namespace.
 This is particularly problematic if I'm using {{s}} to write out an avro file 
 intended to be read by a specific-type reader, because the file header will 
 claim to be objects of type {{s}} (not {{ns.s}}, as expected).
 I've attached two patches: one that makes sure that the {{namespace}} 
 property of protocol types is set to the default namespace of the protocol 
 when not otherwise set.
 The second patch ensures that the {{namespace}} is *not* rendered into JSON 
 when a default protocol specifies the right value already.



[jira] [Commented] (AVRO-1296) Python: schemas retrieved from protocol types ignore namespace

2013-04-22 Thread Jeremy Kahn (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638196#comment-13638196
 ] 

Jeremy Kahn commented on AVRO-1296:
---

Philip, have you received any objections?  Could you commit this to trunk?

 Python: schemas retrieved from protocol types ignore namespace
 --

 Key: AVRO-1296
 URL: https://issues.apache.org/jira/browse/AVRO-1296
 Project: Avro
  Issue Type: Bug
  Components: python
Affects Versions: 1.7.4
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
 Fix For: 1.7.5

 Attachments: AVRO-1296a.patch, AVRO-1296b.patch


 If I parse a protocol {{p}} using {{avro.protocol.parse}}, which defines 
 {{namespace: ns}} and then retrieve a child schema {{s}} from the 
 protocol's {{proto.types}} (or {{proto.types_dict}}), then {{s}} does not 
 have its namespace set (to {{ns}}), even if {{p}} has a namespace.
 This is particularly problematic if I'm using {{s}} to write out an avro file 
 intended to be read by a specific-type reader, because the file header will 
 claim to be objects of type {{s}} (not {{ns.s}}, as expected).
 I've attached two patches: one that makes sure that the {{namespace}} 
 property of protocol types is set to the default namespace of the protocol 
 when not otherwise set.
 The second patch ensures that the {{namespace}} is *not* rendered into JSON 
 when a default protocol specifies the right value already.



[jira] [Commented] (AVRO-1304) Python Avro match_schemas called redundantly

2013-04-22 Thread Jeremy Kahn (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638198#comment-13638198
 ] 

Jeremy Kahn commented on AVRO-1304:
---

Uri, what strategy are you using to try to fix this? Could we memoize the 
partner schema to short-circuit out of match_schemas (trading a small amount of 
memory for speed)?

I'm eager to improve the speed of the Python library, and a 20% speedup could 
shave days off my team's product delivery.

Contact me offline (jer...@trochee.net) if you'd like to share your profiling 
setup (I can try to implement related speedups).  
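
A minimal sketch of the memoization idea, assuming the match result depends only on the two schema objects and that they are not mutated in place afterwards. {{CachingMatcher}} and {{simple_match}} are hypothetical stand-ins, not the {{avro.io}} code.

```python
class CachingMatcher:
    """Remember the result for the last (writer, reader) pair that was checked."""

    def __init__(self, match_fn):
        self._match_fn = match_fn
        self._last_pair = None
        self._last_result = None
        self.calls = 0  # how often the underlying check actually ran

    def __call__(self, writers_schema, readers_schema):
        pair = self._last_pair
        # Identity comparison: cheap, and avoids re-walking both schemas.
        if (pair is not None and pair[0] is writers_schema
                and pair[1] is readers_schema):
            return self._last_result
        self.calls += 1
        self._last_result = self._match_fn(writers_schema, readers_schema)
        self._last_pair = (writers_schema, readers_schema)
        return self._last_result


def simple_match(writers_schema, readers_schema):
    """Stand-in for the real, more expensive schema comparison."""
    return writers_schema["type"] == readers_schema["type"]


matcher = CachingMatcher(simple_match)
writer = {"type": "string"}
reader = {"type": "string"}
other_reader = {"type": "int"}

results = [matcher(writer, reader) for _ in range(1000)]
calls_after_loop = matcher.calls
mismatch = matcher(writer, other_reader)
```

The cost is one extra pair of references per reader; a schema mutated in place after the first check would not be noticed.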

 Python Avro match_schemas called redundantly
 

 Key: AVRO-1304
 URL: https://issues.apache.org/jira/browse/AVRO-1304
 Project: Avro
  Issue Type: Bug
  Components: python
Affects Versions: 1.7.4
Reporter: Uri Laserson

 DatumReader.match_schemas(writers_schema, readers_schema) is called on every 
 single read from the DatumReader.  However, for almost every read, the 
 schemas used are the object members self.writers_schema and 
 self.readers_schema.  match_schemas should be checked only once in this case, 
 and only when the object members are modified.  This takes up 20% of my parse 
 time upon profiling.



[jira] [Commented] (AVRO-1296) Python: schemas retrieved from protocol types ignore namespace

2013-04-22 Thread Jeremy Kahn (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638407#comment-13638407
 ] 

Jeremy Kahn commented on AVRO-1296:
---

Looks like the Ubuntu 9.10 buildbot complains about this test patch. Updated 
test code is included here:

https://github.com/jkahn/avro/commit/9724cd0e17f338db6a12ebc1fce5132cdf934bc7 
{noformat}
@@ -379,7 +379,7 @@ def test_inner_namespace_not_rendered(self):
     self.assertEqual('com.acme.Greeting', proto.types[0].fullname)
     self.assertEqual('Greeting', proto.types[0].name)
     # but there shouldn't be 'namespace' rendered to json on the inner type
-    self.assertNotIn('namespace', proto.to_json()['types'][0])
+    self.assertFalse('namespace' in proto.to_json()['types'][0])
 
   def test_valid_cast_to_string_after_parse(self):
 
{noformat}

 Python: schemas retrieved from protocol types ignore namespace
 --

 Key: AVRO-1296
 URL: https://issues.apache.org/jira/browse/AVRO-1296
 Project: Avro
  Issue Type: Bug
  Components: python
Affects Versions: 1.7.4
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
 Fix For: 1.7.5

 Attachments: AVRO-1296a.patch, AVRO-1296b.patch


 If I parse a protocol {{p}} using {{avro.protocol.parse}}, which defines 
 {{namespace: ns}} and then retrieve a child schema {{s}} from the 
 protocol's {{proto.types}} (or {{proto.types_dict}}), then {{s}} does not 
 have its namespace set (to {{ns}}), even if {{p}} has a namespace.
 This is particularly problematic if I'm using {{s}} to write out an avro file 
 intended to be read by a specific-type reader, because the file header will 
 claim to be objects of type {{s}} (not {{ns.s}}, as expected).
 I've attached two patches: one that makes sure that the {{namespace}} 
 property of protocol types is set to the default namespace of the protocol 
 when not otherwise set.
 The second patch ensures that the {{namespace}} is *not* rendered into JSON 
 when a default protocol specifies the right value already.



[jira] [Created] (AVRO-1303) Python avro library does not support aliasing for schema evolution

2013-04-19 Thread Jeremy Kahn (JIRA)
Jeremy Kahn created AVRO-1303:
-

 Summary: Python avro library does not support aliasing for schema 
evolution
 Key: AVRO-1303
 URL: https://issues.apache.org/jira/browse/AVRO-1303
 Project: Avro
  Issue Type: Bug
  Components: python
Affects Versions: 1.7.4
Reporter: Jeremy Kahn


As discussed [on the mailing 
list|http://mail-archives.apache.org/mod_mbox/avro-user/201304.mbox/%3CCALEq1Z-ncmjLjvCCLeEgm%2BQvMmPAg5%2B0pVW%3De1N-%3DxtQcMApPw%40mail.gmail.com%3E],
 the Python {{avro}} libraries don't support aliases (the string {{alias}} is 
found nowhere in the Python source code).

We should update the Python library to accept aliases in matching schemas for:

 * field names
 * named types
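
A small sketch of what alias-aware field matching could look like, assuming (as I read the spec) that aliases are declared on the reader's schema and name the writer's older field names. {{find_writer_field}} is a hypothetical helper working on plain JSON dicts, not an existing {{avro}} function.

```python
def find_writer_field(reader_field, writer_fields):
    """Return the writer field matching reader_field by name or alias, else None."""
    candidates = [reader_field["name"]] + list(reader_field.get("aliases", []))
    by_name = {field["name"]: field for field in writer_fields}
    for candidate in candidates:
        if candidate in by_name:
            return by_name[candidate]
    return None


writer_fields = [{"name": "fname", "type": "string"}]
renamed = {"name": "first_name", "type": "string", "aliases": ["fname"]}
unrelated = {"name": "age", "type": "int"}

matched = find_writer_field(renamed, writer_fields)
missing = find_writer_field(unrelated, writer_fields)
```

The reader's own name is tried first, so a direct match always wins over an alias.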



[jira] [Commented] (AVRO-1304) Python Avro match_schemas called redundantly

2013-04-19 Thread Jeremy Kahn (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13637061#comment-13637061
 ] 

Jeremy Kahn commented on AVRO-1304:
---

This would be super useful to fix. Do you have a patch prepared?

 Python Avro match_schemas called redundantly
 

 Key: AVRO-1304
 URL: https://issues.apache.org/jira/browse/AVRO-1304
 Project: Avro
  Issue Type: Bug
  Components: python
Affects Versions: 1.7.4
Reporter: Uri Laserson

 DatumReader.match_schemas(writers_schema, readers_schema) is called on every 
 single read from the DatumReader.  However, for almost every read, the 
 schemas used are the object members self.writers_schema and 
 self.readers_schema.  match_schemas should be checked only once in this case, 
 and only when the object members are modified.  This takes up 20% of my parse 
 time upon profiling.



[jira] [Commented] (AVRO-1296) Python: schemas retrieved from protocol types ignore namespace

2013-04-16 Thread Jeremy Kahn (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13633308#comment-13633308
 ] 

Jeremy Kahn commented on AVRO-1296:
---

Commenting to nudge this issue.

Can somebody review these Python patches?  It's a small change but it fixes a 
fairly serious obstacle to using Avro files as a Java/Python interlingua for  
on-disk storage.

 Python: schemas retrieved from protocol types ignore namespace
 --

 Key: AVRO-1296
 URL: https://issues.apache.org/jira/browse/AVRO-1296
 Project: Avro
  Issue Type: Bug
  Components: python
Affects Versions: 1.7.4
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
 Fix For: 1.7.5

 Attachments: AVRO-1296a.patch, AVRO-1296b.patch


 If I parse a protocol {{p}} using {{avro.protocol.parse}}, which defines 
 {{namespace: ns}} and then retrieve a child schema {{s}} from the 
 protocol's {{proto.types}} (or {{proto.types_dict}}), then {{s}} does not 
 have its namespace set (to {{ns}}), even if {{p}} has a namespace.
 This is particularly problematic if I'm using {{s}} to write out an avro file 
 intended to be read by a specific-type reader, because the file header will 
 claim to be objects of type {{s}} (not {{ns.s}}, as expected).
 I've attached two patches: one that makes sure that the {{namespace}} 
 property of protocol types is set to the default namespace of the protocol 
 when not otherwise set.
 The second patch ensures that the {{namespace}} is *not* rendered into JSON 
 when a default protocol specifies the right value already.



[jira] [Created] (AVRO-1296) Python: schemas retrieved from protocol types ignore namespace

2013-04-10 Thread Jeremy Kahn (JIRA)
Jeremy Kahn created AVRO-1296:
-

 Summary: Python: schemas retrieved from protocol types ignore 
namespace
 Key: AVRO-1296
 URL: https://issues.apache.org/jira/browse/AVRO-1296
 Project: Avro
  Issue Type: Bug
  Components: python
Affects Versions: 1.7.4
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
 Fix For: 1.7.5


If I use {{avro.protocol.parse}} to parse a protocol {{p}} that defines 
{{namespace: ns}}, and then retrieve a child schema {{s}} from the 
protocol's {{proto.types}} (or {{proto.types_dict}}), then {{s}} does not have 
its namespace set to {{ns}}, even though {{p}} has a namespace.

This is particularly problematic if I'm using {{s}} to write out an avro file 
intended to be read by a specific-type reader, because the file header will 
claim to be objects of type {{s}} (not {{ns.s}}, as expected).

I've attached two patches: one that makes sure that the {{namespace}} property 
of protocol types is set to the default namespace of the protocol when not 
otherwise set.

The second patch ensures that the {{namespace}} is *not* rendered into JSON 
when a default protocol specifies the right value already.
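
The intended behavior of the two patches can be sketched on plain JSON dicts. This is an illustration under my own naming, not the patch code: {{fullname}} and {{render_type}} are hypothetical helpers, and the protocol is a made-up example.

```python
def fullname(type_json, default_namespace=None):
    """Return the namespace-qualified name of a named type."""
    name = type_json["name"]
    if "." in name:  # already fully qualified
        return name
    namespace = type_json.get("namespace") or default_namespace
    return "%s.%s" % (namespace, name) if namespace else name


def render_type(type_json, default_namespace=None):
    """Copy of type_json without a 'namespace' that merely repeats the default."""
    rendered = dict(type_json)
    if rendered.get("namespace") == default_namespace:
        rendered.pop("namespace", None)
    return rendered


protocol = {
    "protocol": "HelloWorld",
    "namespace": "com.acme",
    "types": [{"type": "record", "name": "Greeting",
               "fields": [{"name": "message", "type": "string"}]}],
}
greeting = protocol["types"][0]

# First patch: the type inherits the protocol's namespace.
inherited = fullname(greeting, protocol["namespace"])

# Second patch: a namespace equal to the protocol default is not re-rendered.
explicit = dict(greeting, namespace="com.acme")
rendered = render_type(explicit, protocol["namespace"])
```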



[jira] [Work started] (AVRO-1296) Python: schemas retrieved from protocol types ignore namespace

2013-04-10 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AVRO-1296 started by Jeremy Kahn.

 Python: schemas retrieved from protocol types ignore namespace
 --

 Key: AVRO-1296
 URL: https://issues.apache.org/jira/browse/AVRO-1296
 Project: Avro
  Issue Type: Bug
  Components: python
Affects Versions: 1.7.4
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
 Fix For: 1.7.5


 If I parse a protocol {{p}} using {{avro.protocol.parse}}, which defines 
 {{namespace: ns}} and then retrieve a child schema {{s}} from the 
 protocol's {{proto.types}} (or {{proto.types_dict}}), then {{s}} does not 
 have its namespace set (to {{ns}}), even if {{p}} has a namespace.
 This is particularly problematic if I'm using {{s}} to write out an avro file 
 intended to be read by a specific-type reader, because the file header will 
 claim to be objects of type {{s}} (not {{ns.s}}, as expected).
 I've attached two patches: one that makes sure that the {{namespace}} 
 property of protocol types is set to the default namespace of the protocol 
 when not otherwise set.
 The second patch ensures that the {{namespace}} is *not* rendered into JSON 
 when a default protocol specifies the right value already.



[jira] [Updated] (AVRO-1296) Python: schemas retrieved from protocol types ignore namespace

2013-04-10 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn updated AVRO-1296:
--

Attachment: AVRO-1296b.patch
AVRO-1296a.patch

AVRO-1296a and AVRO-1296b are the two patches mentioned in the OP.

 Python: schemas retrieved from protocol types ignore namespace
 --

 Key: AVRO-1296
 URL: https://issues.apache.org/jira/browse/AVRO-1296
 Project: Avro
  Issue Type: Bug
  Components: python
Affects Versions: 1.7.4
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
 Fix For: 1.7.5

 Attachments: AVRO-1296a.patch, AVRO-1296b.patch


 If I parse a protocol {{p}} using {{avro.protocol.parse}}, which defines 
 {{namespace: ns}} and then retrieve a child schema {{s}} from the 
 protocol's {{proto.types}} (or {{proto.types_dict}}), then {{s}} does not 
 have its namespace set (to {{ns}}), even if {{p}} has a namespace.
 This is particularly problematic if I'm using {{s}} to write out an avro file 
 intended to be read by a specific-type reader, because the file header will 
 claim to be objects of type {{s}} (not {{ns.s}}, as expected).
 I've attached two patches: one that makes sure that the {{namespace}} 
 property of protocol types is set to the default namespace of the protocol 
 when not otherwise set.
 The second patch ensures that the {{namespace}} is *not* rendered into JSON 
 when a default protocol specifies the right value already.



[jira] [Updated] (AVRO-1296) Python: schemas retrieved from protocol types ignore namespace

2013-04-10 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn updated AVRO-1296:
--

Status: Patch Available  (was: In Progress)

Python tests pass {{ant clean build test}} after each of these patches is 
applied. Each patch includes new tests that fail before and succeed after.

 Python: schemas retrieved from protocol types ignore namespace
 --

 Key: AVRO-1296
 URL: https://issues.apache.org/jira/browse/AVRO-1296
 Project: Avro
  Issue Type: Bug
  Components: python
Affects Versions: 1.7.4
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
 Fix For: 1.7.5

 Attachments: AVRO-1296a.patch, AVRO-1296b.patch


 If I parse a protocol {{p}} using {{avro.protocol.parse}}, which defines 
 {{namespace: ns}} and then retrieve a child schema {{s}} from the 
 protocol's {{proto.types}} (or {{proto.types_dict}}), then {{s}} does not 
 have its namespace set (to {{ns}}), even if {{p}} has a namespace.
 This is particularly problematic if I'm using {{s}} to write out an avro file 
 intended to be read by a specific-type reader, because the file header will 
 claim to be objects of type {{s}} (not {{ns.s}}, as expected).
 I've attached two patches: one that makes sure that the {{namespace}} 
 property of protocol types is set to the default namespace of the protocol 
 when not otherwise set.
 The second patch ensures that the {{namespace}} is *not* rendered into JSON 
 when a default protocol specifies the right value already.



[jira] [Created] (AVRO-1291) Python library missing strict JSON encode/decode

2013-04-09 Thread Jeremy Kahn (JIRA)
Jeremy Kahn created AVRO-1291:
-

 Summary: Python library missing strict JSON encode/decode
 Key: AVRO-1291
 URL: https://issues.apache.org/jira/browse/AVRO-1291
 Project: Avro
  Issue Type: Bug
  Components: python
Reporter: Jeremy Kahn


The Python Avro libraries don't actually have a proper JSON decoder or encoder, 
because they don't handle the [type-hinting for 
unions|http://avro.apache.org/docs/current/spec.html#json_encoding] properly.

The Python {{avro.io}} library should provide a pair of 
{{StrictJsonEncoder}} and {{StrictJsonDecoder}} classes that correctly include (and 
decode) the type hints when the schema expects a union.

Jonathan Coveney [raised this 
concern|http://mail-archives.apache.org/mod_mbox/avro-user/201304.mbox/%3CCAKne9Z6nkYXwb4QzPr4qNyH1o7TnL1674MspgnHuKMuD2imguQ%40mail.gmail.com%3E]
 on the Avro User mailing list.
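
For reference, a minimal sketch of the union type hint as I read the JSON encoding section of the spec linked above: null is written as a bare JSON null, and any other branch is wrapped in a single-key object named after the branch type. The two helper functions are hypothetical and do not exist in {{avro.io}}.

```python
import json


def encode_union_value(branch_name, value):
    """Wrap a union member as {branch_name: value}; null stays a bare null."""
    if branch_name == "null":
        return None
    return {branch_name: value}


def decode_union_value(encoded):
    """Inverse of encode_union_value: return (branch_name, value)."""
    if encoded is None:
        return ("null", None)
    # Exactly one key is expected; anything else raises ValueError.
    (branch_name, value), = encoded.items()
    return (branch_name, value)


# A value for the union ["null", "string"].
hinted = json.dumps(encode_union_value("string", "a"))
bare_null = json.dumps(encode_union_value("null", None))
round_trip = decode_union_value(json.loads(hinted))
```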



[jira] [Created] (AVRO-1289) Python: Schema objects should polymorphically interact with data-walker interface

2013-04-04 Thread Jeremy Kahn (JIRA)
Jeremy Kahn created AVRO-1289:
-

 Summary: Python: Schema objects should polymorphically interact 
with data-walker interface
 Key: AVRO-1289
 URL: https://issues.apache.org/jira/browse/AVRO-1289
 Project: Avro
  Issue Type: Improvement
  Components: python
Affects Versions: 1.7.5
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
 Fix For: 1.8.0


Python {{avro.schema}} objects should be able to call back to a general 
data-and-schema parallel walker (a validator would be one such walker; a 
default-filler could be another). 

There should be an {{avro.walker}} interface that owns the parallel state (a 
datum-reader/deserializer, a datum-writer/serializer, a validator, or a 
default-filler -- see AVRO-1265). Schema polymorphism would allow us to 
eliminate the large (and highly redundant) function-dispatch methods in 
{{avro.io}} by making the {{avro.schema.Schema}} subclass responsible for 
calling back to the {{avro.walker}} object.

Assigning this to v1.8.0 because it may be difficult to duplicate *every* 
behavior of 1.7.* with the same function signatures, especially where this 
refactor may eliminate entire classes.

This refactoring ought to make it easier to improve or extend objects that meet 
this {{walker}} interface -- validators and serializers might be able to store 
more state about their position within a record, for example, to yield more 
informative error messages upon mismatch (as requested by Jonathan Coveney on 
the user mailing list).
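
A rough sketch of the callback shape described above, with made-up class and method names ({{walk}}, {{on_primitive}}, {{on_record}}, {{PathValidator}}); no {{avro.walker}} module exists yet. Each schema class decides which walker method to call, and the walker owns the traversal state, here the path used in error messages.

```python
class PrimitiveSchema:
    def __init__(self, type_name):
        self.type_name = type_name

    def walk(self, walker, datum):
        return walker.on_primitive(self, datum)


class RecordSchema:
    def __init__(self, name, fields):
        self.name = name
        self.fields = fields  # list of (field_name, schema) pairs

    def walk(self, walker, datum):
        return walker.on_record(self, datum)


class PathValidator:
    """A walker that reports where in the record a mismatch occurred."""
    PYTHON_TYPES = {"string": str, "long": int}

    def __init__(self):
        self.path = []

    def on_primitive(self, schema, datum):
        if not isinstance(datum, self.PYTHON_TYPES[schema.type_name]):
            raise ValueError("%s: expected %s, got %r"
                             % (".".join(self.path), schema.type_name, datum))
        return True

    def on_record(self, schema, datum):
        if not isinstance(datum, dict):
            raise ValueError("%s: expected record %s, got %r"
                             % (".".join(self.path), schema.name, datum))
        for field_name, field_schema in schema.fields:
            self.path.append(field_name)
            try:
                field_schema.walk(self, datum.get(field_name))
            finally:
                self.path.pop()
        return True


user_schema = RecordSchema("User", [
    ("name", PrimitiveSchema("string")),
    ("address", RecordSchema("Address", [("zip", PrimitiveSchema("long"))])),
])

valid = user_schema.walk(PathValidator(), {"name": "Ada", "address": {"zip": 98101}})
try:
    user_schema.walk(PathValidator(), {"name": "Ada", "address": {"zip": "98101"}})
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

A serializer or default-filler would implement the same two methods and reuse the schema classes unchanged.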



[jira] [Commented] (AVRO-1286) Python script avro cat should be able to read from stdin

2013-04-02 Thread Jeremy Kahn (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620615#comment-13620615
 ] 

Jeremy Kahn commented on AVRO-1286:
---

The biggest headache here is that the Python Avro data file library requires that 
the file be seekable, and standard input is not seekable.

I think this is a bug or a misfeature in the python library and probably 
deserves a ticket of its own.
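
One possible workaround, sketched for Python 3 and not taken from any patch: copy a non-seekable stream into memory before handing it to a reader that needs {{seek()}}. {{as_seekable}} is a hypothetical helper and {{FakePipe}} only simulates a pipe. Buffering the whole input gives up true streaming.

```python
import io


def as_seekable(stream):
    """Return stream if it supports seek(), else an in-memory copy that does."""
    if stream.seekable():
        return stream
    return io.BytesIO(stream.read())


class FakePipe(io.BytesIO):
    """Simulates a pipe: readable, but reports itself as non-seekable."""

    def seekable(self):
        return False


pipe = FakePipe(b"Obj\x01 example bytes")
wrapped = as_seekable(pipe)
regular = io.BytesIO(b"already seekable")
```

In a script this would be called as {{as_seekable(sys.stdin.buffer)}}.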

 Python script avro cat should be able to read from stdin
 

 Key: AVRO-1286
 URL: https://issues.apache.org/jira/browse/AVRO-1286
 Project: Avro
  Issue Type: Bug
  Components: python
Reporter: Uri Laserson
Priority: Minor

 Currently, you have to specify a target file on the command line.  But it 
 would be nice to be able to stream data through avro cat.



[jira] [Updated] (AVRO-1284) Python: validation should be a method of Schema objects

2013-03-30 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn updated AVRO-1284:
--

Labels: patch  (was: )

 Python: validation should be a method of Schema objects
 ---

 Key: AVRO-1284
 URL: https://issues.apache.org/jira/browse/AVRO-1284
 Project: Avro
  Issue Type: Improvement
  Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
  Labels: patch
 Fix For: 1.7.5

 Attachments: validation-as-method-backwards-compatible.patch, 
 validation-as-method.patch


 In Python, validation of a datum by the schema was done in 
 {{avro.io.validate}} function.
 The {{avro.io.validate}} function is a complex, recursively-called switch 
 statement.
 Instead of calling a two-argument {{avro.io.validate}} with a Schema object 
 and a datum, it is easier to understand and extend if they are one-argument 
 methods on the schema.
 I (Jeremy) have written a patch that implements {{validate}} methods on 
 Schema objects. This patch will form the prerequisite for AVRO-1265 (see 
 easier to extend above).



[jira] [Work stopped] (AVRO-1284) Python: validation should be a method of Schema objects

2013-03-30 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AVRO-1284 stopped by Jeremy Kahn.

 Python: validation should be a method of Schema objects
 ---

 Key: AVRO-1284
 URL: https://issues.apache.org/jira/browse/AVRO-1284
 Project: Avro
  Issue Type: Improvement
  Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
 Fix For: 1.7.5

 Attachments: validation-as-method-backwards-compatible.patch, 
 validation-as-method.patch


 In Python, validation of a datum by the schema was done in 
 {{avro.io.validate}} function.
 The {{avro.io.validate}} function is a complex, recursively-called switch 
 statement.
 Instead of calling a two-argument {{avro.io.validate}} with a Schema object 
 and a datum, it is easier to understand and extend if they are one-argument 
 methods on the schema.
 I (Jeremy) have written a patch that implements {{validate}} methods on 
 Schema objects. This patch will form the prerequisite for AVRO-1265 (see 
 easier to extend above).



[jira] [Updated] (AVRO-1284) Python: validation should be a method of Schema objects

2013-03-30 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn updated AVRO-1284:
--

Status: Patch Available  (was: Open)

Seems to be a working fix. Tests pass.

 Python: validation should be a method of Schema objects
 ---

 Key: AVRO-1284
 URL: https://issues.apache.org/jira/browse/AVRO-1284
 Project: Avro
  Issue Type: Improvement
  Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
  Labels: patch
 Fix For: 1.7.5

 Attachments: validation-as-method-backwards-compatible.patch, 
 validation-as-method.patch


 In Python, validation of a datum by the schema was done in 
 {{avro.io.validate}} function.
 The {{avro.io.validate}} function is a complex, recursively-called switch 
 statement.
 Instead of calling a two-argument {{avro.io.validate}} with a Schema object 
 and a datum, it is easier to understand and extend if they are one-argument 
 methods on the schema.
 I (Jeremy) have written a patch that implements {{validate}} methods on 
 Schema objects. This patch will form the prerequisite for AVRO-1265 (see 
 easier to extend above).



[jira] [Updated] (AVRO-1265) Python: schema objects should support builder() default-filling behavior

2013-03-30 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn updated AVRO-1265:
--

Attachment: avro-1265b-tests.patch
avro-1265a-build-defaults.patch

Implement default-build behavior on schemas and update the tests to do (rather 
cursory) testing of this behavior.

 Python: schema objects should support builder() default-filling behavior
 

 Key: AVRO-1265
 URL: https://issues.apache.org/jira/browse/AVRO-1265
 Project: Avro
  Issue Type: Improvement
  Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
  Labels: features
 Fix For: 1.7.5

 Attachments: avro-1265a-build-defaults.patch, avro-1265b-tests.patch


 There seems to be no easy way to use the Avro libraries in Python (where I
 feel most qualified to comment) to encode generic data that omits fields
 with default values and have it transmitted as well-formed Avro binary.
 If you fill in the missing default values yourself, the Python libraries
 transmit the data correctly.
 I'd be happy to add methods to the avro.schema.RecordSchema objects (in the
 Python libraries) that recursively fill in defaults for missing member fields
 of a record (which probably means extending the other schema classes as
 well).
 For backwards compatibility (and probably to avoid unnecessary data
 traversal), clients should explicitly ask the schema to fill in defaults
 before transmission when they want to set only the non-default values.



[jira] [Created] (AVRO-1284) Python: validation should be a method of Schema objects

2013-03-25 Thread Jeremy Kahn (JIRA)
Jeremy Kahn created AVRO-1284:
-

 Summary: Python: validation should be a method of Schema objects
 Key: AVRO-1284
 URL: https://issues.apache.org/jira/browse/AVRO-1284
 Project: Avro
  Issue Type: Improvement
  Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
 Fix For: 1.7.5


In Python, validation of a datum against a schema is currently done by the
{{avro.io.validate}} function.
The {{avro.io.validate}} function is a complex, recursively-called switch
statement.

Instead of calling a two-argument {{avro.io.validate}} with a Schema object and
a datum, validation is easier to understand and extend as a one-argument
{{validate}} method on each schema class.

I (Jeremy) have written a patch that implements {{validate}} methods on Schema
objects. This patch is a prerequisite for AVRO-1265 (see "easier to
extend" above).
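
As a rough illustration of the proposal (not the contents of the attached
patch), validation as a one-argument method might look like the sketch below.
The class names mirror the schema types, but these are hypothetical toy classes
rather than the real {{avro.schema}} classes, and the record is assumed to be a
plain dict mapping field names to schemas.

```python
# Hypothetical toy classes for illustration; not the real avro.schema classes.
class Schema:
    def validate(self, datum):
        """Return True if datum conforms to this schema."""
        raise NotImplementedError


class IntSchema(Schema):
    def validate(self, datum):
        # bool is a subclass of int in Python, so reject it explicitly.
        return isinstance(datum, int) and not isinstance(datum, bool)


class StringSchema(Schema):
    def validate(self, datum):
        return isinstance(datum, str)


class ArraySchema(Schema):
    def __init__(self, items):
        self.items = items  # schema of each element

    def validate(self, datum):
        # Recursion happens through the element schema's own method.
        return isinstance(datum, list) and all(
            self.items.validate(element) for element in datum)


class RecordSchema(Schema):
    def __init__(self, fields):
        self.fields = fields  # assumed shape: dict of field name -> Schema

    def validate(self, datum):
        return isinstance(datum, dict) and all(
            name in datum and schema.validate(datum[name])
            for name, schema in self.fields.items())


record = RecordSchema({"id": IntSchema(), "tags": ArraySchema(StringSchema())})
is_valid = record.validate({"id": 7, "tags": ["a", "b"]})
```

Each schema type owns its own rule, so supporting a new type means adding one
class with one method instead of adding a branch to a central switch.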





[jira] [Work started] (AVRO-1284) Python: validation should be a method of Schema objects

2013-03-25 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AVRO-1284 started by Jeremy Kahn.

 Python: validation should be a method of Schema objects
 ---

 Key: AVRO-1284
 URL: https://issues.apache.org/jira/browse/AVRO-1284
 Project: Avro
  Issue Type: Improvement
  Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
 Fix For: 1.7.5


 In Python, validation of a datum against a schema is currently done by the
 {{avro.io.validate}} function.
 The {{avro.io.validate}} function is a complex, recursively-called switch
 statement.
 Instead of calling a two-argument {{avro.io.validate}} with a Schema object
 and a datum, validation is easier to understand and extend as a one-argument
 {{validate}} method on each schema class.
 I (Jeremy) have written a patch that implements {{validate}} methods on
 Schema objects. This patch is a prerequisite for AVRO-1265 (see
 "easier to extend" above).



[jira] [Updated] (AVRO-1284) Python: validation should be a method of Schema objects

2013-03-25 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn updated AVRO-1284:
--

Attachment: validation-as-method.patch

 Python: validation should be a method of Schema objects
 ---

 Key: AVRO-1284
 URL: https://issues.apache.org/jira/browse/AVRO-1284
 Project: Avro
  Issue Type: Improvement
  Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
 Fix For: 1.7.5

 Attachments: validation-as-method.patch


 In Python, validation of a datum against a schema is currently done by the
 {{avro.io.validate}} function.
 The {{avro.io.validate}} function is a complex, recursively-called switch
 statement.
 Instead of calling a two-argument {{avro.io.validate}} with a Schema object
 and a datum, validation is easier to understand and extend as a one-argument
 {{validate}} method on each schema class.
 I (Jeremy) have written a patch that implements {{validate}} methods on
 Schema objects. This patch is a prerequisite for AVRO-1265 (see
 "easier to extend" above).



[jira] [Updated] (AVRO-1284) Python: validation should be a method of Schema objects

2013-03-25 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn updated AVRO-1284:
--

Attachment: validation-as-method-backwards-compatible.patch

The {{validation-as-method-backwards-compatible}} patch keeps the
{{avro.io.validate}} function working by having it delegate to the new
{{validate}} method, in case users are calling {{avro.io.validate}} directly.

Prefer this patch to the simpler {{validation-as-method}} patch.
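
A minimal sketch of that kind of delegation, assuming a toy {{BooleanSchema}}
class (hypothetical, not the real {{avro.schema}} class) and a module-level
{{validate}} function standing in for {{avro.io.validate}}:

```python
# Hypothetical stand-ins; the real patch may be structured differently.
class Schema:
    def validate(self, datum):
        raise NotImplementedError


class BooleanSchema(Schema):
    def validate(self, datum):
        return isinstance(datum, bool)


def validate(expected_schema, datum):
    """Old two-argument entry point, kept as a thin shim.

    Existing callers keep the same call shape; the actual logic now
    lives in the schema's one-argument method.
    """
    return expected_schema.validate(datum)


old_style_result = validate(BooleanSchema(), True)
new_style_result = BooleanSchema().validate(True)
```

Both call styles give the same answer, so existing callers need no changes
while new code can use the method directly.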

 Python: validation should be a method of Schema objects
 ---

 Key: AVRO-1284
 URL: https://issues.apache.org/jira/browse/AVRO-1284
 Project: Avro
  Issue Type: Improvement
  Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
 Fix For: 1.7.5

 Attachments: validation-as-method-backwards-compatible.patch, 
 validation-as-method.patch


 In Python, validation of a datum against a schema is currently done by the
 {{avro.io.validate}} function.
 The {{avro.io.validate}} function is a complex, recursively-called switch
 statement.
 Instead of calling a two-argument {{avro.io.validate}} with a Schema object
 and a datum, validation is easier to understand and extend as a one-argument
 {{validate}} method on each schema class.
 I (Jeremy) have written a patch that implements {{validate}} methods on
 Schema objects. This patch is a prerequisite for AVRO-1265 (see
 "easier to extend" above).



[jira] [Work started] (AVRO-1265) Python: schema objects should support builder() default-filling behavior

2013-03-25 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AVRO-1265 started by Jeremy Kahn.

 Python: schema objects should support builder() default-filling behavior
 ---

 Key: AVRO-1265
 URL: https://issues.apache.org/jira/browse/AVRO-1265
 Project: Avro
  Issue Type: Improvement
  Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
  Labels: features
 Fix For: 1.7.5


 There seems to be no easy way to use the Avro libraries in Python (where I
 feel most qualified to comment) to encode generic data that omits fields
 with default values and have it transmitted as well-formed Avro binary.
 If you fill in the missing default values yourself, the Python libraries
 transmit the data correctly.
 I'd be happy to add methods to the avro.schema.RecordSchema objects (in the
 Python libraries) that recursively fill in defaults for missing member fields
 of a record (which probably means extending the other schema classes as
 well).
 For backwards compatibility (and probably to avoid unnecessary data
 traversal), clients should explicitly ask the schema to fill in defaults
 before transmission when they want to set only the non-default values.



[jira] [Created] (AVRO-1265) Python: schema objects should support builder() default-filling behavior

2013-02-28 Thread Jeremy Kahn (JIRA)
Jeremy Kahn created AVRO-1265:
-

 Summary: Python: schema objects should support builder() 
default-filling behavior
 Key: AVRO-1265
 URL: https://issues.apache.org/jira/browse/AVRO-1265
 Project: Avro
  Issue Type: Improvement
  Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
 Fix For: 1.7.5


There seems to be no easy way to use the Avro libraries in Python (where I
feel most qualified to comment) to encode generic data that omits fields
with default values and have it transmitted as well-formed Avro binary.

If you fill in the missing default values yourself, the Python libraries
transmit the data correctly.

I'd be happy to add methods to the avro.schema.RecordSchema objects (in the
Python libraries) that recursively fill in defaults for missing member fields
of a record (which probably means extending the other schema classes as
well).

For backwards compatibility (and probably to avoid unnecessary data traversal),
clients should explicitly ask the schema to fill in defaults before
transmission when they want to set only the non-default values.
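
To make the idea concrete, here is a small sketch of recursive default-filling.
Everything in it is an assumption for illustration: the {{Field}},
{{PrimitiveSchema}} and {{RecordSchema}} classes are toy stand-ins, and the
method name {{fill_defaults}} is hypothetical rather than the name used in any
patch.

```python
import copy

# Sentinel meaning "no default declared" (None can be a legitimate default).
_MISSING = object()


class Field:
    def __init__(self, name, schema, default=_MISSING):
        self.name = name
        self.schema = schema
        self.default = default


class PrimitiveSchema:
    def fill_defaults(self, datum):
        # Primitives have no member fields, so there is nothing to fill in.
        return datum


class RecordSchema:
    def __init__(self, fields):
        self.fields = fields  # list of Field

    def fill_defaults(self, datum):
        """Return a new dict with missing fields replaced by their defaults."""
        filled = {}
        for field in self.fields:
            if field.name in datum:
                value = datum[field.name]
            elif field.default is not _MISSING:
                # Copy so callers cannot mutate the schema's default value.
                value = copy.deepcopy(field.default)
            else:
                raise ValueError(
                    "no value and no default for field %r" % field.name)
            # Recurse so nested records get their defaults as well.
            filled[field.name] = field.schema.fill_defaults(value)
        return filled


position = RecordSchema([
    Field("x", PrimitiveSchema(), default=0),
    Field("y", PrimitiveSchema()),
])
shape = RecordSchema([
    Field("name", PrimitiveSchema()),
    Field("pos", position),
])
complete = shape.fill_defaults({"name": "a", "pos": {"y": 5}})
```

The caller invokes {{fill_defaults}} explicitly before writing, so code that
already supplies complete records pays no extra traversal cost.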




[jira] [Commented] (AVRO-1265) Python: schema objects should support builder() default-filling behavior

2013-02-28 Thread Jeremy Kahn (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589764#comment-13589764
 ] 

Jeremy Kahn commented on AVRO-1265:
---

See [this 
thread|http://mail-archives.apache.org/mod_mbox/avro-user/201302.mbox/%3cca+i_aek0-rofp5fmwte7at0jyzhrvsq9nmjubvovrkbex6m...@mail.gmail.com%3E]
 on the mailing list.

 Python: schema objects should support builder() default-filling behavior
 ---

 Key: AVRO-1265
 URL: https://issues.apache.org/jira/browse/AVRO-1265
 Project: Avro
  Issue Type: Improvement
  Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
  Labels: features
 Fix For: 1.7.5


 There seems to be no easy way to use the Avro libraries in Python (where I
 feel most qualified to comment) to encode generic data that omits fields
 with default values and have it transmitted as well-formed Avro binary.
 If you fill in the missing default values yourself, the Python libraries
 transmit the data correctly.
 I'd be happy to add methods to the avro.schema.RecordSchema objects (in the
 Python libraries) that recursively fill in defaults for missing member fields
 of a record (which probably means extending the other schema classes as
 well).
 For backwards compatibility (and probably to avoid unnecessary data
 traversal), clients should explicitly ask the schema to fill in defaults
 before transmission when they want to set only the non-default values.



[jira] [Commented] (AVRO-1265) Python: schema objects should support builder() default-filling behavior

2013-02-28 Thread Jeremy Kahn (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589970#comment-13589970
 ] 

Jeremy Kahn commented on AVRO-1265:
---

Here's [where development is 
happening|https://github.com/jkahn/avro/tree/feature/fill-defaults] for this 
ticket.  I need to add tests, and won't propose a patch until I have them. I'm 
hoping to find time to write the tests tomorrow or early next week.

 Python: schema objects should support builder() default-filling behavior
 ---

 Key: AVRO-1265
 URL: https://issues.apache.org/jira/browse/AVRO-1265
 Project: Avro
  Issue Type: Improvement
  Components: python
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
  Labels: features
 Fix For: 1.7.5


 There seems to be no easy way to use the Avro libraries in Python (where I
 feel most qualified to comment) to encode generic data that omits fields
 with default values and have it transmitted as well-formed Avro binary.
 If you fill in the missing default values yourself, the Python libraries
 transmit the data correctly.
 I'd be happy to add methods to the avro.schema.RecordSchema objects (in the
 Python libraries) that recursively fill in defaults for missing member fields
 of a record (which probably means extending the other schema classes as
 well).
 For backwards compatibility (and probably to avoid unnecessary data
 traversal), clients should explicitly ask the schema to fill in defaults
 before transmission when they want to set only the non-default values.



[jira] [Created] (AVRO-1255) Python schema (message, protocol) to_json names argument should be optional

2013-02-14 Thread Jeremy Kahn (JIRA)
Jeremy Kahn created AVRO-1255:
-

 Summary: Python schema (message, protocol) to_json names argument 
should be optional
 Key: AVRO-1255
 URL: https://issues.apache.org/jira/browse/AVRO-1255
 Project: Avro
  Issue Type: Improvement
  Components: python
Affects Versions: 1.7.3
Reporter: Jeremy Kahn
Priority: Minor


The {{avro.protocol.Protocol}}, {{avro.protocol.Message}}, and various classes 
in {{avro.schema}} all support a {{to_json}} method which renders the data in 
Python generics (easily renderable to json).

These methods all take a required {{names}} argument (of type 
{{avro.schema.Names}} which stores state representing what types have already 
been rendered.

For debugging -- and for other uses of the schema -- it is helpful if the 
{{names}} argument is optional.  When it is not provided, each method should 
construct an empty {{schema.Names}} object internally. {{to_json}} thus can be 
invoked without argument to get the relevant rendering of the current schema in 
isolation.

Patch to be attached.
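
For illustration only (this is not the attached patch), an optional {{names}}
argument could be handled as in the sketch below. The {{Names}} and
{{RecordSchema}} classes here are simplified, hypothetical stand-ins for the
ones in {{avro.schema}}; field types are assumed to be either a primitive type
name or another record.

```python
class Names:
    """Toy stand-in: remembers which named types were already rendered."""

    def __init__(self):
        self.seen = set()


class RecordSchema:
    def __init__(self, name, fields):
        self.name = name
        self.fields = fields  # list of (field name, type name or RecordSchema)

    def to_json(self, names=None):
        # Build a fresh tracker per call when the caller does not pass one.
        # A default of Names() in the signature would be shared across calls.
        if names is None:
            names = Names()
        if self.name in names.seen:
            # Already rendered in full; refer to the type by name only.
            return self.name
        names.seen.add(self.name)
        rendered_fields = []
        for field_name, field_type in self.fields:
            if isinstance(field_type, RecordSchema):
                field_type = field_type.to_json(names)
            rendered_fields.append({"name": field_name, "type": field_type})
        return {"type": "record", "name": self.name, "fields": rendered_fields}


point = RecordSchema("Point", [("x", "int"), ("y", "int")])
line = RecordSchema("Line", [("start", point), ("end", point)])
rendered = line.to_json()  # no names argument needed
```

In {{rendered}}, the first use of {{Point}} is written out in full and the
second is a reference by name, and calling {{line.to_json()}} again gives the
same result because each call starts from an empty tracker.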



[jira] [Updated] (AVRO-1255) Python schema (message, protocol) to_json names argument should be optional

2013-02-14 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn updated AVRO-1255:
--

Attachment: avro-1255.patch

 Python schema (message, protocol) to_json names argument should be optional
 ---

 Key: AVRO-1255
 URL: https://issues.apache.org/jira/browse/AVRO-1255
 Project: Avro
  Issue Type: Improvement
  Components: python
Affects Versions: 1.7.3
Reporter: Jeremy Kahn
Priority: Minor
  Labels: patch
 Attachments: avro-1255.patch


 The {{avro.protocol.Protocol}}, {{avro.protocol.Message}}, and various 
 classes in {{avro.schema}} all support a {{to_json}} method which renders the 
 data in Python generics (easily renderable to json).
 These methods all take a required {{names}} argument (of type 
 {{avro.schema.Names}} which stores state representing what types have already 
 been rendered.
 For debugging -- and for other uses of the schema -- it is helpful if the 
 {{names}} argument is optional.  When it is not provided, each method should 
 construct an empty {{schema.Names}} object internally. {{to_json}} thus can 
 be invoked without argument to get the relevant rendering of the current 
 schema in isolation.
 Patch to be attached.



[jira] [Commented] (AVRO-1255) Python schema (message, protocol) to_json names argument should be optional

2013-02-14 Thread Jeremy Kahn (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578606#comment-13578606
 ] 

Jeremy Kahn commented on AVRO-1255:
---

{{cd lang/py && ant build test}} passes all the tests with this patch applied,
AFAICT.

 Python schema (message, protocol) to_json names argument should be optional
 ---

 Key: AVRO-1255
 URL: https://issues.apache.org/jira/browse/AVRO-1255
 Project: Avro
  Issue Type: Improvement
  Components: python
Affects Versions: 1.7.3
Reporter: Jeremy Kahn
Priority: Minor
  Labels: patch
 Attachments: avro-1255.patch


 The {{avro.protocol.Protocol}}, {{avro.protocol.Message}}, and various 
 classes in {{avro.schema}} all support a {{to_json}} method which renders the 
 data in Python generics (easily renderable to json).
 These methods all take a required {{names}} argument (of type 
 {{avro.schema.Names}} which stores state representing what types have already 
 been rendered.
 For debugging -- and for other uses of the schema -- it is helpful if the 
 {{names}} argument is optional.  When it is not provided, each method should 
 construct an empty {{schema.Names}} object internally. {{to_json}} thus can 
 be invoked without argument to get the relevant rendering of the current 
 schema in isolation.
 Patch to be attached.



[jira] [Updated] (AVRO-1255) Python schema (message, protocol) to_json names argument should be optional

2013-02-14 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn updated AVRO-1255:
--

Description: 
The {{avro.protocol.Protocol}}, {{avro.protocol.Message}}, and various classes 
in {{avro.schema}} all support a {{to_json}} method which renders the data in 
Python generics (easily renderable to json).

These methods all take a required {{names}} argument (of type 
{{avro.schema.Names}}) which stores state representing what types have already 
been rendered.

For debugging -- and for other uses of the schema -- it is helpful if the 
{{names}} argument is optional.  When it is not provided, each method should 
construct an empty {{schema.Names}} object internally. {{to_json}} thus can be 
invoked without argument to get the relevant rendering of the current schema in 
isolation.

Patch to be attached.

  was:
The {{avro.protocol.Protocol}}, {{avro.protocol.Message}}, and various classes 
in {{avro.schema}} all support a {{to_json}} method which renders the data in 
Python generics (easily renderable to json).

These methods all take a required {{names}} argument (of type 
{{avro.schema.Names}} which stores state representing what types have already 
been rendered.

For debugging -- and for other uses of the schema -- it is helpful if the 
{{names}} argument is optional.  When it is not provided, each method should 
construct an empty {{schema.Names}} object internally. {{to_json}} thus can be 
invoked without argument to get the relevant rendering of the current schema in 
isolation.

Patch to be attached.


 Python schema (message, protocol) to_json names argument should be optional
 ---

 Key: AVRO-1255
 URL: https://issues.apache.org/jira/browse/AVRO-1255
 Project: Avro
  Issue Type: Improvement
  Components: python
Affects Versions: 1.7.3
Reporter: Jeremy Kahn
Priority: Minor
  Labels: patch
 Attachments: avro-1255.patch


 The {{avro.protocol.Protocol}}, {{avro.protocol.Message}}, and various 
 classes in {{avro.schema}} all support a {{to_json}} method which renders the 
 data in Python generics (easily renderable to json).
 These methods all take a required {{names}} argument (of type 
 {{avro.schema.Names}}) which stores state representing what types have 
 already been rendered.
 For debugging -- and for other uses of the schema -- it is helpful if the 
 {{names}} argument is optional.  When it is not provided, each method should 
 construct an empty {{schema.Names}} object internally. {{to_json}} thus can 
 be invoked without argument to get the relevant rendering of the current 
 schema in isolation.
 Patch to be attached.



[jira] [Commented] (AVRO-1255) Python schema (message, protocol) to_json names argument should be optional

2013-02-14 Thread Jeremy Kahn (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578702#comment-13578702
 ] 

Jeremy Kahn commented on AVRO-1255:
---

I will add some tests that generate the generic schema without the names
parameter (exercising this new behavior), and send a second patch unifying the
changes.

 Python schema (message, protocol) to_json names argument should be optional
 ---

 Key: AVRO-1255
 URL: https://issues.apache.org/jira/browse/AVRO-1255
 Project: Avro
  Issue Type: Improvement
  Components: python
Affects Versions: 1.7.3
Reporter: Jeremy Kahn
Priority: Minor
  Labels: patch
 Attachments: avro-1255.patch


 The {{avro.protocol.Protocol}}, {{avro.protocol.Message}}, and various 
 classes in {{avro.schema}} all support a {{to_json}} method which renders the 
 data in Python generics (easily renderable to json).
 These methods all take a required {{names}} argument (of type 
 {{avro.schema.Names}}) which stores state representing what types have 
 already been rendered.
 For debugging -- and for other uses of the schema -- it is helpful if the 
 {{names}} argument is optional.  When it is not provided, each method should 
 construct an empty {{schema.Names}} object internally. {{to_json}} thus can 
 be invoked without argument to get the relevant rendering of the current 
 schema in isolation.
 Patch to be attached.



[jira] [Updated] (AVRO-1255) Python schema (message, protocol) to_json names argument should be optional

2013-02-14 Thread Jeremy Kahn (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Kahn updated AVRO-1255:
--

Attachment: avro-1255-b.patch

 Python schema (message, protocol) to_json names argument should be optional
 ---

 Key: AVRO-1255
 URL: https://issues.apache.org/jira/browse/AVRO-1255
 Project: Avro
  Issue Type: Improvement
  Components: python
Affects Versions: 1.7.3
Reporter: Jeremy Kahn
Priority: Minor
  Labels: patch
 Attachments: avro-1255-b.patch, avro-1255.patch


 The {{avro.protocol.Protocol}}, {{avro.protocol.Message}}, and various 
 classes in {{avro.schema}} all support a {{to_json}} method which renders the 
 data in Python generics (easily renderable to json).
 These methods all take a required {{names}} argument (of type 
 {{avro.schema.Names}}) which stores state representing what types have 
 already been rendered.
 For debugging -- and for other uses of the schema -- it is helpful if the 
 {{names}} argument is optional.  When it is not provided, each method should 
 construct an empty {{schema.Names}} object internally. {{to_json}} thus can 
 be invoked without argument to get the relevant rendering of the current 
 schema in isolation.
 Patch to be attached.



[jira] [Commented] (AVRO-1255) Python schema (message, protocol) to_json names argument should be optional

2013-02-14 Thread Jeremy Kahn (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13578729#comment-13578729
 ] 

Jeremy Kahn commented on AVRO-1255:
---

New patch added. Rather than adding new tests, I discovered that several
stringification functions (used throughout the tests) could be simplified by
calling {{to_json}} without arguments.

The new patch (1255-b) simplifies those stringification methods in just that
way, so the new behavior is well-exercised by the tests.
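
As a sketch of the kind of simplification meant here (using a hypothetical toy
class, not the code in the patch), a stringification method can delegate to an
argument-less {{to_json}} call:

```python
import json


class PrimitiveSchema:
    """Toy stand-in for a schema class with an optional names argument."""

    def __init__(self, type_name):
        self.type = type_name

    def to_json(self, names=None):
        # Primitive types carry no named-type state, so names is unused here.
        return self.type

    def __str__(self):
        # No need to construct and pass a names object just to print.
        return json.dumps(self.to_json())


text = str(PrimitiveSchema("string"))
```

Because every {{str()}} call in the tests then goes through {{to_json()}}
without arguments, the optional-argument path is exercised implicitly.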

 Python schema (message, protocol) to_json names argument should be optional
 ---

 Key: AVRO-1255
 URL: https://issues.apache.org/jira/browse/AVRO-1255
 Project: Avro
  Issue Type: Improvement
  Components: python
Affects Versions: 1.7.3
Reporter: Jeremy Kahn
Priority: Minor
  Labels: patch
 Attachments: avro-1255-b.patch, avro-1255.patch


 The {{avro.protocol.Protocol}}, {{avro.protocol.Message}}, and various 
 classes in {{avro.schema}} all support a {{to_json}} method which renders the 
 data in Python generics (easily renderable to json).
 These methods all take a required {{names}} argument (of type 
 {{avro.schema.Names}}) which stores state representing what types have 
 already been rendered.
 For debugging -- and for other uses of the schema -- it is helpful if the 
 {{names}} argument is optional.  When it is not provided, each method should 
 construct an empty {{schema.Names}} object internally. {{to_json}} thus can 
 be invoked without argument to get the relevant rendering of the current 
 schema in isolation.
 Patch to be attached.
