[jira] [Created] (AVRO-2355) Add compressionLevel to ZStandard compression

2019-03-20 Thread Scott Carey (JIRA)
Scott Carey created AVRO-2355:
-

 Summary: Add compressionLevel to ZStandard compression
 Key: AVRO-2355
 URL: https://issues.apache.org/jira/browse/AVRO-2355
 Project: Apache Avro
  Issue Type: New Feature
  Components: java
Reporter: Scott Carey
 Fix For: 1.9.0


ZStandard compression should not be released without support for compression 
level selection.

Its biggest advantage is the massive range over which you can select the 
compression level, all while keeping decompression throughput very high.
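Since zstd itself needs an external binding, here is a minimal JDK-only sketch of what a compression-level knob means, with deflate standing in for zstd (the class and method names below are hypothetical): higher levels trade CPU for smaller output, and zstd offers that trade-off over a much wider range while keeping decompression fast.

```java
import java.util.zip.Deflater;

// Illustration only: level selection with the JDK's deflate, since zstd
// requires an external binding.  A zstd level parameter would expose the
// same size-vs-CPU trade-off, over a much wider range of levels.
public class CompressionLevelDemo {
    static int compressedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        byte[] out = new byte[input.length + 128];
        int n = deflater.deflate(out); // buffer is large enough for one pass
        deflater.end();
        return n;
    }

    public static void main(String[] args) {
        byte[] data = "the quick brown fox jumps over the lazy dog. "
                .repeat(400).getBytes();
        int fast = compressedSize(data, Deflater.BEST_SPEED);        // level 1
        int small = compressedSize(data, Deflater.BEST_COMPRESSION); // level 9
        System.out.println("level 1: " + fast + " bytes, level 9: " + small + " bytes");
    }
}
```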

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AVRO-2273) Release 1.8.3

2019-03-20 Thread Scott Carey (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797606#comment-16797606
 ] 

Scott Carey commented on AVRO-2273:
---

It's impossible to use Flink (and Kafka Connect in some cases) with 1.8.2 and a 
SpecificRecord that has an Enum in it.

 

I'm running a custom version as a result (with a few other cherry-picked things 
from the master branch and built for java 8, so it can't be an official 
release).

> Release 1.8.3
> -
>
> Key: AVRO-2273
> URL: https://issues.apache.org/jira/browse/AVRO-2273
> Project: Apache Avro
>  Issue Type: Task
>  Components: release
>Reporter: Thiruvalluvan M. G.
>Priority: Major
> Fix For: 1.8.3
>
>
> This ticket is for releasing Avro 1.8.3 and discussing any topics related to 
> it.





[jira] [Updated] (AVRO-2162) Add Zstandard compression to avro file format

2018-03-22 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-2162:
--
Description: 
I'd like to add Zstandard compression for Avro. 

At compression level 1, it is almost as fast as Snappy at compression, with 
compression ratios more like gzip.  At higher levels of compression, it is more 
compact than gzip -9 with much lower CPU when compressing and roughly 3x faster 
decompression.

 

Adding it to Java is fairly easy.  We'll need to say something about it in the 
spec however, as an 'optional' codec.

 

  was:
I'd like to add Zstandard compression for Avro. 

It is almost as fast as Snappy at compression, with compression ratios more 
like gzip.  At higher levels of compression, it is more compact than gzip -9 
with much lower CPU when compressing and roughly 3x faster decompression.

 

Adding it to Java is fairly easy.  We'll need to say something about it in the 
spec however, as an 'optional' codec.

 


> Add Zstandard compression to avro file format
> -
>
> Key: AVRO-2162
> URL: https://issues.apache.org/jira/browse/AVRO-2162
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: Scott Carey
>Priority: Major
>
> I'd like to add Zstandard compression for Avro. 
> At compression level 1, it is almost as fast as Snappy at compression, with 
> compression ratios more like gzip.  At higher levels of compression, it is 
> more compact than gzip -9 with much lower CPU when compressing and roughly 3x 
> faster decompression.
>  
> Adding it to Java is fairly easy.  We'll need to say something about it in 
> the spec however, as an 'optional' codec.
>  





[jira] [Created] (AVRO-2162) Add Zstandard compression to avro file format

2018-03-22 Thread Scott Carey (JIRA)
Scott Carey created AVRO-2162:
-

 Summary: Add Zstandard compression to avro file format
 Key: AVRO-2162
 URL: https://issues.apache.org/jira/browse/AVRO-2162
 Project: Avro
  Issue Type: Improvement
  Components: java
Reporter: Scott Carey


I'd like to add Zstandard compression for Avro. 

It is almost as fast as Snappy at compression, with compression ratios more 
like gzip.  At higher levels of compression, it is more compact than gzip -9 
with much lower CPU when compressing and roughly 3x faster decompression.

 

Adding it to Java is fairly easy.  We'll need to say something about it in the 
spec however, as an 'optional' codec.
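For context on where the codec choice is recorded: an Avro container file names its codec in the header metadata under the avro.codec key. A header for a zstandard-compressed file would carry something like this (schema value elided; sketch only):

```json
{ "avro.schema": "...", "avro.codec": "zstandard" }
```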

 





[jira] [Commented] (AVRO-1124) RESTful service for holding schemas

2013-11-19 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13827415#comment-13827415
 ] 

Scott Carey commented on AVRO-1124:
---

All:
I apologize for the long delay.  

What we have used in production for about a year is very close to what has been 
in this ticket the whole time.  I have never considered it complete for a few 
reasons.

I have been close to done with this for some time now, but swamped by other 
responsibilities; what is currently in use has been good enough for now, but 
it won't be for long.   
The latest changes, however, would significantly impact some of the API with 
respect to how the schema repo manages validation and compatibility.  This 
would be significantly more flexible for interfacing with other systems.

It boils down to the following observation:

It appears that all notions of schema compatibility share a common form.  The 
previously discussed 'forwards compatible' or 'N + 1' compatibility schemes are 
all flavors of the same set of constraints. 

In any set of schemas you wish to consider for compatibility (a 'Subject' 
here), at any given time you have a subset of these schemas that you wish to be 
able to read with, and a subset you must be able to read from.  You may have 
some that you neither wish to read from nor write to, but whose id mapping 
must be kept.

The way to represent this is to have a read state and a write state per 
schema in the subject.
The read state has two possible values (naming help needed): reader, 
not_readable.
The write state has three possible values (naming help needed): writer, 
written, not_writable.

The constraint of the system is that all reader schemas can read all writer 
and written schemas, per subject.  A schema can transition either state, one 
at a time, leading to pair-wise testing of whether schema X can read schema Y:
 *  A schema transition from not_readable to reader succeeds only if it can 
read all schemas that are currently writer or written.
 *  A schema transition from reader to not_readable requires no pairwise 
schema validation, but some other pluggable validation may apply.
 *  A schema transition from not_writable to writer or written requires 
pairwise validation that the schema can be read by all current reader 
schemas.
 *  All other schema write-state transitions do not require pair-wise schema 
validation, but other pluggable validation may apply.

The write state has three possibilities because it is important to 
differentiate the case where new records of this type may still be written 
(writer) from one where no new records should be written, but the data store 
still holds values with the schema present (written). 
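A minimal sketch of this state model (the class and method names are hypothetical, the state names are the placeholders above, and the canRead predicate stands in for real Avro schema resolution; nothing here is an existing Avro API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

// Sketch of the per-subject read/write state model described above.
public class SubjectCompatibility {
    enum ReadState { READER, NOT_READABLE }
    enum WriteState { WRITER, WRITTEN, NOT_WRITABLE }

    static final class Entry {
        ReadState read = ReadState.NOT_READABLE;
        WriteState write = WriteState.NOT_WRITABLE;
    }

    private final Map<String, Entry> schemas = new HashMap<>();
    private final BiPredicate<String, String> canRead; // (reader, writer) -> ok

    SubjectCompatibility(BiPredicate<String, String> canRead) {
        this.canRead = canRead;
    }

    void register(String id) { schemas.put(id, new Entry()); }

    // not_readable -> reader: must be able to read every writer/written schema.
    boolean promoteToReader(String id) {
        for (Map.Entry<String, Entry> e : schemas.entrySet()) {
            if (e.getValue().write != WriteState.NOT_WRITABLE
                    && !canRead.test(id, e.getKey())) {
                return false; // pairwise check failed
            }
        }
        schemas.get(id).read = ReadState.READER;
        return true;
    }

    // not_writable -> writer (or written): every current reader must read it.
    boolean promoteToWriter(String id) {
        for (Map.Entry<String, Entry> e : schemas.entrySet()) {
            if (e.getValue().read == ReadState.READER
                    && !canRead.test(e.getKey(), id)) {
                return false;
            }
        }
        schemas.get(id).write = WriteState.WRITER;
        return true;
    }

    public static void main(String[] args) {
        // Toy rule: a reader can read any writer with a lower-or-equal "version".
        SubjectCompatibility s = new SubjectCompatibility(
                (r, w) -> Integer.parseInt(r) >= Integer.parseInt(w));
        s.register("1"); s.register("2"); s.register("3");
        System.out.println(s.promoteToReader("2")); // true: no writers yet
        System.out.println(s.promoteToWriter("1")); // true: reader "2" reads "1"
        System.out.println(s.promoteToWriter("3")); // false: reader "2" cannot read "3"
    }
}
```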

Every compatibility scheme can fit in the above: single reader, multiple 
writers; single writer, multiple readers; N±1 compatibility; full 
cross-compatibility.  The above is significantly more flexible than the early 
proposals on this topic, but will require changes to the REST interface.   
Loading data from the old into the new will be fairly simple, however; some 
curl commands and bash scripts will do it.
 



 RESTful service for holding schemas
 ---

 Key: AVRO-1124
 URL: https://issues.apache.org/jira/browse/AVRO-1124
 Project: Avro
  Issue Type: New Feature
Reporter: Jay Kreps
Assignee: Jay Kreps
 Attachments: AVRO-1124-can-read-with.patch, AVRO-1124-draft.patch, 
 AVRO-1124-validators-preliminary.patch, AVRO-1124.patch, AVRO-1124.patch


 Motivation: It is nice to be able to pass around data in serialized form but 
 still know the exact schema that was used to serialize it. The overhead of 
 storing the schema with each record is too high unless the individual records 
 are very large. There are workarounds for some common cases: in the case of 
 files a schema can be stored once with a file of many records amortizing the 
 per-record cost, and in the case of RPC the schema can be negotiated ahead of 
 time and used for many requests. For other uses, though it is nice to be able 
 to pass a reference to a given schema using a small id and allow this to be 
 looked up. Since only a small number of schemas are likely to be active for a 
 given data source, these can easily be cached, so the number of remote 
 lookups is very small (one per active schema version).
 Basically this would consist of two things:
 1. A simple REST service that stores and retrieves schemas
 2. Some helper java code for fetching and caching schemas for people using 
 the registry
 We have used something like this at LinkedIn for a few years now, and it 
 would be nice to standardize this 

[jira] [Commented] (AVRO-1126) Upgrade to Jackson 2+

2013-10-08 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789298#comment-13789298
 ] 

Scott Carey commented on AVRO-1126:
---

I would like to consider not using any JSON values for defaults in our API 
at all, but instead our own types for values.  Defaults are currently a 
performance issue in the decoder because we read the JSON value for every 
record.  

The SchemaBuilder API does not expose the use of Jackson types in its API.

 Upgrade to Jackson 2+
 -

 Key: AVRO-1126
 URL: https://issues.apache.org/jira/browse/AVRO-1126
 Project: Avro
  Issue Type: Task
  Components: java
Reporter: James Tyrrell
Priority: Critical
 Fix For: 1.8.0


 Quite annoyingly with Jackson 2+ the base package name has changed from 
 org.codehaus.jackson to com.fasterxml.jackson so in addition to changing the 
 dependencies from:
 {code:xml} 
 <dependency>
   <groupId>org.codehaus.jackson</groupId>
   <artifactId>jackson-core-asl</artifactId>
   <version>${jackson.version}</version>
 </dependency>
 <dependency>
   <groupId>org.codehaus.jackson</groupId>
   <artifactId>jackson-mapper-asl</artifactId>
   <version>${jackson.version}</version>
 </dependency>
 {code} 
 to:
 {code:xml} 
 <dependency>
   <groupId>com.fasterxml.jackson.core</groupId>
   <artifactId>jackson-core</artifactId>
   <version>${jackson.version}</version>
 </dependency>
 <dependency>
   <groupId>com.fasterxml.jackson.core</groupId>
   <artifactId>jackson-databind</artifactId>
   <version>${jackson.version}</version>
 </dependency>
 {code} 
 the base package in the code needs to be updated. More info can be found 
 [here|http://wiki.fasterxml.com/JacksonUpgradeFrom19To20], I am happy to do 
 the work just let me know what is preferable i.e. should I just attach a 
 patch to this issue?



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (AVRO-1126) Upgrade to Jackson 2+

2013-10-08 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789865#comment-13789865
 ] 

Scott Carey commented on AVRO-1126:
---

I propose that we have a canonical representation of schema default values 
that exposes no surface area of any third-party library, as part of the Schema 
API.   Users should not be required to explicitly link to Jackson to use our 
API, so that we can change implementation details such as which JSON library we 
use without breaking the API.   

We could choose the Generic representations for this, or make new ones.  It is 
all easy until you get to arrays, maps, and records.

Specific, reflect, generic, or future representations can all be different.  A 
new and improved schema resolution system would convert the default value to 
the target type once at schema resolution time instead of every record read.  
This sort of re-use would require immutable data representation or copying.
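A sketch of the "convert once at resolution time" idea (names here are hypothetical; the convert function stands in for the JSON decode, and as noted above, safe re-use assumes the converted value is immutable or copied):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch only: resolve a field's default once and reuse the converted value,
// rather than re-reading its JSON form on every record.
// Note: safe re-use requires the converted value to be immutable, or copied.
public class DefaultValueCache {
    private final Map<String, Object> resolved = new ConcurrentHashMap<>();

    Object defaultFor(String fieldName, String jsonDefault,
                      Function<String, Object> convert) {
        // computeIfAbsent runs convert at most once per field name.
        return resolved.computeIfAbsent(fieldName, k -> convert.apply(jsonDefault));
    }

    public static void main(String[] args) {
        DefaultValueCache cache = new DefaultValueCache();
        int[] conversions = {0};
        for (int i = 0; i < 5; i++) {
            cache.defaultFor("count", "42",
                    s -> { conversions[0]++; return Integer.parseInt(s); });
        }
        System.out.println(conversions[0]); // prints 1: converted once, reused 4 times
    }
}
```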

 Upgrade to Jackson 2+
 -

 Key: AVRO-1126
 URL: https://issues.apache.org/jira/browse/AVRO-1126
 Project: Avro
  Issue Type: Task
  Components: java
Reporter: James Tyrrell
Priority: Critical
 Fix For: 1.8.0


 Quite annoyingly with Jackson 2+ the base package name has changed from 
 org.codehaus.jackson to com.fasterxml.jackson so in addition to changing the 
 dependencies from:
 {code:xml} 
 <dependency>
   <groupId>org.codehaus.jackson</groupId>
   <artifactId>jackson-core-asl</artifactId>
   <version>${jackson.version}</version>
 </dependency>
 <dependency>
   <groupId>org.codehaus.jackson</groupId>
   <artifactId>jackson-mapper-asl</artifactId>
   <version>${jackson.version}</version>
 </dependency>
 {code} 
 to:
 {code:xml} 
 <dependency>
   <groupId>com.fasterxml.jackson.core</groupId>
   <artifactId>jackson-core</artifactId>
   <version>${jackson.version}</version>
 </dependency>
 <dependency>
   <groupId>com.fasterxml.jackson.core</groupId>
   <artifactId>jackson-databind</artifactId>
   <version>${jackson.version}</version>
 </dependency>
 {code} 
 the base package in the code needs to be updated. More info can be found 
 [here|http://wiki.fasterxml.com/JacksonUpgradeFrom19To20], I am happy to do 
 the work just let me know what is preferable i.e. should I just attach a 
 patch to this issue?





[jira] [Commented] (AVRO-739) Add Date/Time data types

2013-09-06 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760779#comment-13760779
 ] 

Scott Carey commented on AVRO-739:
--

{quote}
These seem like two different external representations of the same thing. A 
time plus a timezone can be losslessly converted to a UTC time. You do lose the 
original timezone, but dates and times are usually displayed in the timezone of 
the displayer, not where the time was originally noted.
{quote}

I completely agree for use cases where the time is being displayed to a user, 
but there are use cases where the loss of the original time zone is not 
acceptable.   One could log another field with the timezone identifier for 
these.   The use case for a UTC timestamp is more broadly applicable.  I do 
not think we need to implement the one that also persists timezone now, but I 
do think we need to make sure that if we did implement such a thing in the 
future, the names for these two things would be consistent.  If we name this 
'Datetime', we imply a relation to dates, which implies a relationship to 
timezones. 

With respect to the SQL variants, I see only two that represent a single point 
in time.  Three are either dates or times but not the combination (e.g. January 
7, 2100, representing a time with granularity of one day, or 5:01, a time 
of day, respectively).

The two SQL equivalents are TIMESTAMP and TIMESTAMP WITH TIMEZONE.   This 
proposal covers TIMESTAMP, roughly.  I am suggesting we reserve space for a 
future TIMESTAMP WITH TIMEZONE.   We could adopt the names for consistency.

timestamp and timestamptz

There is also the question of serialization in JSON form.  A long in binary 
form makes sense, but in JSON, an ISO8601 string might be more useful.
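A sketch of the two serializations in question, assuming epoch milliseconds for the binary long (the class and method names are hypothetical; Avro's actual JSON encoding for such a type is exactly what would need to be specified):

```java
import java.time.Instant;

// Sketch of the two encodings discussed: epoch millis as a long for binary,
// ISO-8601 text (UTC) for a JSON form.
public class TimestampForms {
    static String toIso8601(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString(); // ISO-8601, UTC
    }

    static long fromIso8601(String iso) {
        return Instant.parse(iso).toEpochMilli();
    }

    public static void main(String[] args) {
        long t = 1378500000000L;
        String iso = toIso8601(t);
        System.out.println(iso);
        System.out.println(fromIso8601(iso) == t); // prints true: lossless round trip
    }
}
```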

 Add Date/Time data types
 

 Key: AVRO-739
 URL: https://issues.apache.org/jira/browse/AVRO-739
 Project: Avro
  Issue Type: New Feature
  Components: spec
Reporter: Jeff Hammerbacher
 Attachments: AVRO-739.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (AVRO-1124) RESTful service for holding schemas

2013-08-16 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742596#comment-13742596
 ] 

Scott Carey commented on AVRO-1124:
---

Yes.  I have quite a bit of work outstanding on this to finish and submit for 
review.  But I'll be on vacation for 2 weeks.

Re: Incremental ids:

The ids don't have to be incremental; that choice is up to the repository 
implementation.  They can also be an arbitrary string.

Your repositories probably will not map directly to environments, but to the 
data.  If you are sharing data across environments, you will share repositories 
(or clone them) with the data.

 RESTful service for holding schemas
 ---

 Key: AVRO-1124
 URL: https://issues.apache.org/jira/browse/AVRO-1124
 Project: Avro
  Issue Type: New Feature
Reporter: Jay Kreps
Assignee: Jay Kreps
 Attachments: AVRO-1124-can-read-with.patch, AVRO-1124-draft.patch, 
 AVRO-1124.patch, AVRO-1124.patch, AVRO-1124-validators-preliminary.patch


 Motivation: It is nice to be able to pass around data in serialized form but 
 still know the exact schema that was used to serialize it. The overhead of 
 storing the schema with each record is too high unless the individual records 
 are very large. There are workarounds for some common cases: in the case of 
 files a schema can be stored once with a file of many records amortizing the 
 per-record cost, and in the case of RPC the schema can be negotiated ahead of 
 time and used for many requests. For other uses, though it is nice to be able 
 to pass a reference to a given schema using a small id and allow this to be 
 looked up. Since only a small number of schemas are likely to be active for a 
 given data source, these can easily be cached, so the number of remote 
 lookups is very small (one per active schema version).
 Basically this would consist of two things:
 1. A simple REST service that stores and retrieves schemas
 2. Some helper java code for fetching and caching schemas for people using 
 the registry
 We have used something like this at LinkedIn for a few years now, and it 
 would be nice to standardize this facility to be able to build up common 
 tooling around it. This proposal will be based on what we have, but we can 
 change it as ideas come up.
 The facilities this provides are super simple, basically you can register a 
 schema which gives back a unique id for it or you can query for a schema. 
 There is almost no code, and nothing very complex. The contract is that 
 before emitting/storing a record you must first publish its schema to the 
 registry or know that it has already been published (by checking your cache 
 of published schemas). When reading you check your cache and if you don't 
 find the id/schema pair there you query the registry to look it up. I will 
 explain some of the nuances in more detail below. 
 An added benefit of such a repository is that it makes a few other things 
 possible:
 1. A graphical browser of the various data types that are currently used and 
 all their previous forms.
 2. Automatic enforcement of compatibility rules. Data is always compatible in 
 the sense that the reader will always deserialize it (since they are using 
 the same schema as the writer) but this does not mean it is compatible with 
 the expectations of the reader. For example if an int field is changed to a 
 string that will almost certainly break anyone relying on that field. This 
 definition of compatibility can differ for different use cases and should 
 likely be pluggable.
 Here is a description of one of our uses of this facility at LinkedIn. We use 
 this to retain a schema with log data end-to-end from the producing app to 
 various real-time consumers as well as a set of resulting AvroFile in Hadoop. 
 This schema metadata can then be used to auto-create hive tables (or add new 
 fields to existing tables), or inferring pig fields, all without manual 
 intervention. One important definition of compatibility that is nice to 
 enforce is compatibility with historical data for a given table. Log data 
 is usually loaded in an append-only manner, so if someone changes an int 
 field in a particular data set to be a string, tools like pig or hive that 
 expect static columns will be unusable. Even using plain-vanilla map/reduce 
 processing data where columns and types change willy nilly is painful. 
 However the person emitting this kind of data may not know all the details of 
 compatible schema evolution. We use the schema repository to validate that 
 any change made to a schema don't violate the compatibility model, and reject 
 the update if it does. We do this check both at run time, and also as part of 
 the ant task that generates specific record code (as an early warning). 
 Some 

[jira] [Commented] (AVRO-1124) RESTful service for holding schemas

2013-08-16 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742628#comment-13742628
 ] 

Scott Carey commented on AVRO-1124:
---

It's pluggable, so there are many options.

We also have staging/prod/qa/dev environments.  There is a repo for each, but 
when qa gets its data snapshot from prod, we also clone the repo.  

For dev/staging, we have a kafka mirror that is exactly the production data.  
Both of these environments access the prod repo read-only.  In fact, even in 
production, most subjects are read-only to all applications.  Operations has to 
add new schemas for a release.  This is akin to operations executing SQL 
scripts to do DDL prior to a code push.  Having applications 'automagically' 
update SQL schemas or push Avro schemas can lead to accidents, unless the 
security model is implemented properly.



 RESTful service for holding schemas
 ---

 Key: AVRO-1124
 URL: https://issues.apache.org/jira/browse/AVRO-1124
 Project: Avro
  Issue Type: New Feature
Reporter: Jay Kreps
Assignee: Jay Kreps
 Attachments: AVRO-1124-can-read-with.patch, AVRO-1124-draft.patch, 
 AVRO-1124.patch, AVRO-1124.patch, AVRO-1124-validators-preliminary.patch


 Motivation: It is nice to be able to pass around data in serialized form but 
 still know the exact schema that was used to serialize it. The overhead of 
 storing the schema with each record is too high unless the individual records 
 are very large. There are workarounds for some common cases: in the case of 
 files a schema can be stored once with a file of many records amortizing the 
 per-record cost, and in the case of RPC the schema can be negotiated ahead of 
 time and used for many requests. For other uses, though it is nice to be able 
 to pass a reference to a given schema using a small id and allow this to be 
 looked up. Since only a small number of schemas are likely to be active for a 
 given data source, these can easily be cached, so the number of remote 
 lookups is very small (one per active schema version).
 Basically this would consist of two things:
 1. A simple REST service that stores and retrieves schemas
 2. Some helper java code for fetching and caching schemas for people using 
 the registry
 We have used something like this at LinkedIn for a few years now, and it 
 would be nice to standardize this facility to be able to build up common 
 tooling around it. This proposal will be based on what we have, but we can 
 change it as ideas come up.
 The facilities this provides are super simple, basically you can register a 
 schema which gives back a unique id for it or you can query for a schema. 
 There is almost no code, and nothing very complex. The contract is that 
 before emitting/storing a record you must first publish its schema to the 
 registry or know that it has already been published (by checking your cache 
 of published schemas). When reading you check your cache and if you don't 
 find the id/schema pair there you query the registry to look it up. I will 
 explain some of the nuances in more detail below. 
 An added benefit of such a repository is that it makes a few other things 
 possible:
 1. A graphical browser of the various data types that are currently used and 
 all their previous forms.
 2. Automatic enforcement of compatibility rules. Data is always compatible in 
 the sense that the reader will always deserialize it (since they are using 
 the same schema as the writer) but this does not mean it is compatible with 
 the expectations of the reader. For example if an int field is changed to a 
 string that will almost certainly break anyone relying on that field. This 
 definition of compatibility can differ for different use cases and should 
 likely be pluggable.
 Here is a description of one of our uses of this facility at LinkedIn. We use 
 this to retain a schema with log data end-to-end from the producing app to 
 various real-time consumers as well as a set of resulting AvroFile in Hadoop. 
 This schema metadata can then be used to auto-create hive tables (or add new 
 fields to existing tables), or inferring pig fields, all without manual 
 intervention. One important definition of compatibility that is nice to 
 enforce is compatibility with historical data for a given table. Log data 
 is usually loaded in an append-only manner, so if someone changes an int 
 field in a particular data set to be a string, tools like pig or hive that 
 expect static columns will be unusable. Even using plain-vanilla map/reduce 
 processing data where columns and types change willy nilly is painful. 
 However the person emitting this kind of data may not know all the details of 
 compatible schema evolution. We use the schema repository to validate that 
 any change made to 

[jira] [Comment Edited] (AVRO-1124) RESTful service for holding schemas

2013-08-16 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742628#comment-13742628
 ] 

Scott Carey edited comment on AVRO-1124 at 8/16/13 9:43 PM:


Schema id generation is pluggable, so there are many options.  The _only_ 
requirement is that within a subject ids are unique and correspond to unique 
schemas.

We also have staging/prod/qa/dev environments.  There is a repo for each, but 
when qa gets its data snapshot from prod, we also clone the repo.  

For dev/staging, we have a kafka mirror that is exactly the production data.  
Both of these environments access the prod repo read-only.  In fact, even in 
production, most subjects are read-only to all applications.  Operations has to 
add new schemas for a release.  This is akin to operations executing SQL 
scripts to do DDL prior to a code push.  Having applications 'automagically' 
update SQL schemas or push Avro schemas can lead to accidents, unless the 
security model is implemented properly.



  was (Author: scott_carey):
Its pluggable, so there are many options.

We also have staging/prod/qa/dev environments.  There is a repo for each, but 
when qa gets its data snapshot from prod, we also clone the repo.  

For dev/staging, we have a kafka mirror that is exactly the production data.  
Both of these environments access the prod repo read-only.  In fact, even in 
production, most subjects are read-only to all applications.  Operations has to 
add a new schemas for a release.  This is akin to operations executing SQL 
scripts to do DDL prior to a code push.  Having applicaitons 'automagically' 
update sql schemas or push avro schemas can lead to accidents, unless the 
security model is implemented properly.


  
 RESTful service for holding schemas
 ---

 Key: AVRO-1124
 URL: https://issues.apache.org/jira/browse/AVRO-1124
 Project: Avro
  Issue Type: New Feature
Reporter: Jay Kreps
Assignee: Jay Kreps
 Attachments: AVRO-1124-can-read-with.patch, AVRO-1124-draft.patch, 
 AVRO-1124.patch, AVRO-1124.patch, AVRO-1124-validators-preliminary.patch


 Motivation: It is nice to be able to pass around data in serialized form but 
 still know the exact schema that was used to serialize it. The overhead of 
 storing the schema with each record is too high unless the individual records 
 are very large. There are workarounds for some common cases: in the case of 
 files a schema can be stored once with a file of many records amortizing the 
 per-record cost, and in the case of RPC the schema can be negotiated ahead of 
 time and used for many requests. For other uses, though it is nice to be able 
 to pass a reference to a given schema using a small id and allow this to be 
 looked up. Since only a small number of schemas are likely to be active for a 
 given data source, these can easily be cached, so the number of remote 
 lookups is very small (one per active schema version).
 Basically this would consist of two things:
 1. A simple REST service that stores and retrieves schemas
 2. Some helper java code for fetching and caching schemas for people using 
 the registry
 We have used something like this at LinkedIn for a few years now, and it 
 would be nice to standardize this facility to be able to build up common 
 tooling around it. This proposal will be based on what we have, but we can 
 change it as ideas come up.
 The facilities this provides are super simple, basically you can register a 
 schema which gives back a unique id for it or you can query for a schema. 
 There is almost no code, and nothing very complex. The contract is that 
 before emitting/storing a record you must first publish its schema to the 
 registry or know that it has already been published (by checking your cache 
 of published schemas). When reading you check your cache and if you don't 
 find the id/schema pair there you query the registry to look it up. I will 
 explain some of the nuances in more detail below. 
 An added benefit of such a repository is that it makes a few other things 
 possible:
 1. A graphical browser of the various data types that are currently used and 
 all their previous forms.
 2. Automatic enforcement of compatibility rules. Data is always compatible in 
 the sense that the reader will always deserialize it (since they are using 
 the same schema as the writer) but this does not mean it is compatible with 
 the expectations of the reader. For example if an int field is changed to a 
 string that will almost certainly break anyone relying on that field. This 
 definition of compatibility can differ for different use cases and should 
 likely be pluggable.
 Here is a description of one of our uses of this facility at LinkedIn. We use 
 this to retain a 

[jira] [Commented] (AVRO-1126) Upgrade to Jackson 2+

2013-08-13 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737925#comment-13737925
 ] 

Scott Carey commented on AVRO-1126:
---


We should definitely clean up the exposure of Jackson in our API.  I propose 
deprecating its use in 1.8.x in favor of a replacement, and removing it in 
1.9.x.

Upgrading to 2.x from 1.x is a different issue.  I fail to see how upgrading to 
2.2 is urgent at all.  What new features do you propose that Avro needs to use 
internally?  If Avro no longer exposes Jackson in its API, it is a purely 
internal matter to Avro and does not affect users who might want to use Jackson 
2.x themselves, since Jackson 1.x and 2.x live in non-conflicting namespaces 
both in maven and java package names. 

 Upgrade to Jackson 2+
 -

 Key: AVRO-1126
 URL: https://issues.apache.org/jira/browse/AVRO-1126
 Project: Avro
  Issue Type: Task
  Components: java
Reporter: James Tyrrell
Priority: Critical
 Fix For: 1.8.0


 Quite annoyingly with Jackson 2+ the base package name has changed from 
 org.codehaus.jackson to com.fasterxml.jackson so in addition to changing the 
 dependencies from:
 {code:xml} 
 <dependency>
   <groupId>org.codehaus.jackson</groupId>
   <artifactId>jackson-core-asl</artifactId>
   <version>${jackson.version}</version>
 </dependency>
 <dependency>
   <groupId>org.codehaus.jackson</groupId>
   <artifactId>jackson-mapper-asl</artifactId>
   <version>${jackson.version}</version>
 </dependency>
 {code} 
 to:
 {code:xml} 
 <dependency>
   <groupId>com.fasterxml.jackson.core</groupId>
   <artifactId>jackson-core</artifactId>
   <version>${jackson.version}</version>
 </dependency>
 <dependency>
   <groupId>com.fasterxml.jackson.core</groupId>
   <artifactId>jackson-databind</artifactId>
   <version>${jackson.version}</version>
 </dependency>
 {code} 
 the base package in the code needs to be updated. More info can be found 
 [here|http://wiki.fasterxml.com/JacksonUpgradeFrom19To20], I am happy to do 
 the work just let me know what is preferable i.e. should I just attach a 
 patch to this issue?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (AVRO-1348) Improve Utf8 to String conversion

2013-08-13 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739076#comment-13739076
 ] 

Scott Carey commented on AVRO-1348:
---

About a year ago I experimented with all sorts of UTF8 to string optimizations, 
using state machines and other techniques in addition to those similar to this 
patch and only ever got minor (5%) improvements.  It was hard to beat 'new 
String(bytes, 0, length, UTF8)' safely.  A fully custom state machine utf8 
decoder was almost 10% faster.  
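The baseline that proved hard to beat is simply the JDK's own decoding constructor; a self-contained sketch (the buffer contents are made up for illustration):

```java
import java.nio.charset.StandardCharsets;

public class Utf8ToString {
    // The baseline: decode the first `length` bytes of the buffer as UTF-8
    // via the JDK constructor, which is already heavily optimized.
    static String toString(byte[] bytes, int length) {
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] buf = "héllo, avro".getBytes(StandardCharsets.UTF_8);
        // Decode the whole buffer back to a String.
        System.out.println(toString(buf, buf.length)); // prints "héllo, avro"
    }
}
```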

 Improve Utf8 to String conversion
 -

 Key: AVRO-1348
 URL: https://issues.apache.org/jira/browse/AVRO-1348
 Project: Avro
  Issue Type: Bug
Reporter: Mark Wagner
Assignee: Mohammad Kamrul Islam
 Attachments: AVRO1348v1.patch


 AVRO-1241 found that the existing method of creating Strings from Utf8 byte 
 arrays could be made faster. The same method is being used in the 
 Utf8.toString(), and could likely be sped up by doing the same thing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (AVRO-1144) Deadlock with FSInput and Hadoop NativeS3FileSystem.

2013-07-30 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1144:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed in revision 1508713.

 Deadlock with FSInput and Hadoop NativeS3FileSystem.
 

 Key: AVRO-1144
 URL: https://issues.apache.org/jira/browse/AVRO-1144
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.7.0
 Environment: Hadoop 1.0.3
Reporter: Shawn Smith
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1144.patch


 Deadlock can occur when using org.apache.avro.mapred.FsInput to read files 
 from S3 using the Hadoop NativeS3FileSystem and multiple threads.
 There are a lot of components involved, but the basic cause is pretty simple: 
 Apache Commons HttpClient can deadlock waiting for a free HTTP connection 
 when the number of threads downloading from S3 is greater than or equal to 
 the maximum allowed HTTP connections per host.
 I've filed this bug against Avro because the bug is easiest to fix in Avro.  
 Swap the order of the FileSystem.open() and FileSystem.getFileStatus() calls 
 in the FSInput constructor:
 {noformat}
 /** Construct given a path and a configuration. */
 public FsInput(Path path, Configuration conf) throws IOException {
   this.stream = path.getFileSystem(conf).open(path);
   this.len = path.getFileSystem(conf).getFileStatus(path).getLen();
 }
 {noformat}
 to
 {noformat}
 /** Construct given a path and a configuration. */
 public FsInput(Path path, Configuration conf) throws IOException {
   this.len = path.getFileSystem(conf).getFileStatus(path).getLen();
   this.stream = path.getFileSystem(conf).open(path);
 }
 {noformat}
 Here's what triggers the deadlock:
 * FSInput calls FileSystem.open() which calls Jets3t to connect to S3 and 
 open an HTTP connection for downloading content.  This acquires an HTTP 
 connection but does not release it.
 * FSInput calls FileSystem.getFileStatus() which calls Jets3t to connect to 
 S3 and perform a HEAD request to get object metadata.  This attempts to 
 acquire a second HTTP connection.
 * Jets3t uses Apache Commons HTTP Client which limits the number of 
 simultaneous HTTP connections to a given host.  Let's say this maximum is 4 
 (the default)...  If 4 threads all call the FSInput constructor concurrently, 
 the 4 FileSystem.open() calls can acquire all 4 available connections and the 
 FileSystem.getFileStatus() calls block forever waiting for a thread to 
 release an HTTP connection back to the connection pool.
 A simple way to reproduce this problem is to create 
 jets3t.properties in your classpath with httpclient.max-connections=1.  
 Then try to open a file using FSInput and the Native S3 file system (new 
 Path("s3n://bucket/path")).  It will hang indefinitely inside the FSInput 
 constructor.
 Swapping the order of the open() and getFileStatus() calls ensures that a 
 given thread using FSInput has at most one outstanding connection to S3 at a 
 time.  As a result, one thread should always be able to make progress, 
 avoiding deadlock.
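 The pool-exhaustion argument above can be sketched with a 
 java.util.concurrent.Semaphore standing in for the per-host HTTP connection 
 pool (the method names map loosely onto the FsInput calls; this is an 
 illustration, not the actual Jets3t code):

```java
import java.util.concurrent.Semaphore;

public class PoolOrdering {
    // Broken order: open() holds a connection for streaming, then
    // getFileStatus() needs a second one. With a full pool the second
    // acquire can never succeed (tryAcquire shows this without hanging).
    static boolean brokenOrder(Semaphore pool) throws InterruptedException {
        pool.acquire();                     // open(): connection held for streaming
        boolean headOk = pool.tryAcquire(); // getFileStatus(): HEAD request
        if (headOk) pool.release();
        pool.release();
        return headOk;
    }

    // Fixed order: the HEAD request acquires and releases first, then
    // open() takes its connection. Each thread holds at most one at a time.
    static boolean fixedOrder(Semaphore pool) throws InterruptedException {
        pool.acquire();                     // getFileStatus(): HEAD request
        pool.release();
        pool.acquire();                     // open(): connection held for streaming
        pool.release();
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        Semaphore onePerHost = new Semaphore(1); // httpclient.max-connections=1
        System.out.println(brokenOrder(onePerHost)); // prints "false"
        System.out.println(fixedOrder(onePerHost));  // prints "true"
    }
}
```

 With a blocking acquire instead of tryAcquire, the broken ordering is 
 exactly the indefinite hang described above.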
 Here's a sample stack trace of a deadlocked thread:
 {noformat}
 pool-10-thread-3 prio=5 tid=11026f800 nid=0x116a04000 in Object.wait() 
 [116a02000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 785892cc0 (a 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
   at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
   - locked 785892cc0 (a 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
   at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
   at 
 org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
   at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
   at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
   at 
 org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:357)
   at 
 org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestHead(RestS3Service.java:652)
   at 
 org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1556)
   at 
 org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectDetailsImpl(RestS3Service.java:1492)
   at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1793)
   at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1225)
   at 
 

[jira] [Commented] (AVRO-1144) Deadlock with FSInput and Hadoop NativeS3FileSystem.

2013-07-29 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722677#comment-13722677
 ] 

Scott Carey commented on AVRO-1144:
---

Looks reasonable to me.  With the change, all tests pass. I will commit this 
tomorrow if there are no objections, and provide the trivial patch now.

 Deadlock with FSInput and Hadoop NativeS3FileSystem.
 

 Key: AVRO-1144
 URL: https://issues.apache.org/jira/browse/AVRO-1144
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.7.0
 Environment: Hadoop 1.0.3
Reporter: Shawn Smith
 Attachments: AVRO-1144.patch


 Deadlock can occur when using org.apache.avro.mapred.FsInput to read files 
 from S3 using the Hadoop NativeS3FileSystem and multiple threads.
 There are a lot of components involved, but the basic cause is pretty simple: 
 Apache Commons HttpClient can deadlock waiting for a free HTTP connection 
 when the number of threads downloading from S3 is greater than or equal to 
 the maximum allowed HTTP connections per host.
 I've filed this bug against Avro because the bug is easiest to fix in Avro.  
 Swap the order of the FileSystem.open() and FileSystem.getFileStatus() calls 
 in the FSInput constructor:
 {noformat}
 /** Construct given a path and a configuration. */
 public FsInput(Path path, Configuration conf) throws IOException {
   this.stream = path.getFileSystem(conf).open(path);
   this.len = path.getFileSystem(conf).getFileStatus(path).getLen();
 }
 {noformat}
 to
 {noformat}
 /** Construct given a path and a configuration. */
 public FsInput(Path path, Configuration conf) throws IOException {
   this.len = path.getFileSystem(conf).getFileStatus(path).getLen();
   this.stream = path.getFileSystem(conf).open(path);
 }
 {noformat}
 Here's what triggers the deadlock:
 * FSInput calls FileSystem.open() which calls Jets3t to connect to S3 and 
 open an HTTP connection for downloading content.  This acquires an HTTP 
 connection but does not release it.
 * FSInput calls FileSystem.getFileStatus() which calls Jets3t to connect to 
 S3 and perform a HEAD request to get object metadata.  This attempts to 
 acquire a second HTTP connection.
 * Jets3t uses Apache Commons HTTP Client which limits the number of 
 simultaneous HTTP connections to a given host.  Let's say this maximum is 4 
 (the default)...  If 4 threads all call the FSInput constructor concurrently, 
 the 4 FileSystem.open() calls can acquire all 4 available connections and the 
 FileSystem.getFileStatus() calls block forever waiting for a thread to 
 release an HTTP connection back to the connection pool.
 A simple way to reproduce this problem is to create 
 jets3t.properties in your classpath with httpclient.max-connections=1.  
 Then try to open a file using FSInput and the Native S3 file system (new 
 Path("s3n://bucket/path")).  It will hang indefinitely inside the FSInput 
 constructor.
 Swapping the order of the open() and getFileStatus() calls ensures that a 
 given thread using FSInput has at most one outstanding connection to S3 at a 
 time.  As a result, one thread should always be able to make progress, 
 avoiding deadlock.
 Here's a sample stack trace of a deadlocked thread:
 {noformat}
 pool-10-thread-3 prio=5 tid=11026f800 nid=0x116a04000 in Object.wait() 
 [116a02000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 785892cc0 (a 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
   at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
   - locked 785892cc0 (a 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
   at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
   at 
 org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
   at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
   at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
   at 
 org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:357)
   at 
 org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestHead(RestS3Service.java:652)
   at 
 org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1556)
   at 
 org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectDetailsImpl(RestS3Service.java:1492)
   at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1793)
   at 

[jira] [Assigned] (AVRO-1144) Deadlock with FSInput and Hadoop NativeS3FileSystem.

2013-07-29 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey reassigned AVRO-1144:
-

Assignee: Scott Carey

 Deadlock with FSInput and Hadoop NativeS3FileSystem.
 

 Key: AVRO-1144
 URL: https://issues.apache.org/jira/browse/AVRO-1144
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.7.0
 Environment: Hadoop 1.0.3
Reporter: Shawn Smith
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1144.patch


 Deadlock can occur when using org.apache.avro.mapred.FsInput to read files 
 from S3 using the Hadoop NativeS3FileSystem and multiple threads.
 There are a lot of components involved, but the basic cause is pretty simple: 
 Apache Commons HttpClient can deadlock waiting for a free HTTP connection 
 when the number of threads downloading from S3 is greater than or equal to 
 the maximum allowed HTTP connections per host.
 I've filed this bug against Avro because the bug is easiest to fix in Avro.  
 Swap the order of the FileSystem.open() and FileSystem.getFileStatus() calls 
 in the FSInput constructor:
 {noformat}
 /** Construct given a path and a configuration. */
 public FsInput(Path path, Configuration conf) throws IOException {
   this.stream = path.getFileSystem(conf).open(path);
   this.len = path.getFileSystem(conf).getFileStatus(path).getLen();
 }
 {noformat}
 to
 {noformat}
 /** Construct given a path and a configuration. */
 public FsInput(Path path, Configuration conf) throws IOException {
   this.len = path.getFileSystem(conf).getFileStatus(path).getLen();
   this.stream = path.getFileSystem(conf).open(path);
 }
 {noformat}
 Here's what triggers the deadlock:
 * FSInput calls FileSystem.open() which calls Jets3t to connect to S3 and 
 open an HTTP connection for downloading content.  This acquires an HTTP 
 connection but does not release it.
 * FSInput calls FileSystem.getFileStatus() which calls Jets3t to connect to 
 S3 and perform a HEAD request to get object metadata.  This attempts to 
 acquire a second HTTP connection.
 * Jets3t uses Apache Commons HTTP Client which limits the number of 
 simultaneous HTTP connections to a given host.  Let's say this maximum is 4 
 (the default)...  If 4 threads all call the FSInput constructor concurrently, 
 the 4 FileSystem.open() calls can acquire all 4 available connections and the 
 FileSystem.getFileStatus() calls block forever waiting for a thread to 
 release an HTTP connection back to the connection pool.
 A simple way to reproduce this problem is to create 
 jets3t.properties in your classpath with httpclient.max-connections=1.  
 Then try to open a file using FSInput and the Native S3 file system (new 
 Path("s3n://bucket/path")).  It will hang indefinitely inside the FSInput 
 constructor.
 Swapping the order of the open() and getFileStatus() calls ensures that a 
 given thread using FSInput has at most one outstanding connection to S3 at a 
 time.  As a result, one thread should always be able to make progress, 
 avoiding deadlock.
 Here's a sample stack trace of a deadlocked thread:
 {noformat}
 pool-10-thread-3 prio=5 tid=11026f800 nid=0x116a04000 in Object.wait() 
 [116a02000]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   - waiting on 785892cc0 (a 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
   at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
   - locked 785892cc0 (a 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
   at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
   at 
 org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
   at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
   at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
   at 
 org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:357)
   at 
 org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestHead(RestS3Service.java:652)
   at 
 org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1556)
   at 
 org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectDetailsImpl(RestS3Service.java:1492)
   at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1793)
   at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1225)
   at 
 

[jira] [Updated] (AVRO-1325) Enhanced Schema Builder API

2013-07-28 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1325:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

This additionally required upgrading Jackson to 1.9.13 and some pom.xml changes 
to make builds work on Mac.

 Enhanced Schema Builder API
 ---

 Key: AVRO-1325
 URL: https://issues.apache.org/jira/browse/AVRO-1325
 Project: Avro
  Issue Type: Improvement
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1325.patch, AVRO-1325-preliminary.patch, 
 AVRO-1325-properties.patch, AVRO-1325-v2.patch, AVRO-1325-v3.patch, 
 AVRO-1325-v4.patch


 The schema builder from AVRO-1274 has a few key limitations.  I have proposed 
 changes to make before it is released and the public API is locked in.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (AVRO-1325) Enhanced Schema Builder API

2013-07-28 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722062#comment-13722062
 ] 

Scott Carey commented on AVRO-1325:
---

committed in revision 1507862.

 Enhanced Schema Builder API
 ---

 Key: AVRO-1325
 URL: https://issues.apache.org/jira/browse/AVRO-1325
 Project: Avro
  Issue Type: Improvement
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1325.patch, AVRO-1325-preliminary.patch, 
 AVRO-1325-properties.patch, AVRO-1325-v2.patch, AVRO-1325-v3.patch, 
 AVRO-1325-v4.patch


 The schema builder from AVRO-1274 has a few key limitations.  I have proposed 
 changes to make before it is released and the public API is locked in.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (AVRO-1325) Enhanced Schema Builder API

2013-07-16 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710349#comment-13710349
 ] 

Scott Carey commented on AVRO-1325:
---

I will commit this if there are no objections by this time tomorrow.

 Enhanced Schema Builder API
 ---

 Key: AVRO-1325
 URL: https://issues.apache.org/jira/browse/AVRO-1325
 Project: Avro
  Issue Type: Bug
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1325.patch, AVRO-1325-preliminary.patch, 
 AVRO-1325-properties.patch, AVRO-1325-v2.patch, AVRO-1325-v3.patch, 
 AVRO-1325-v4.patch


 The schema builder from AVRO-1274 has a few key limitations.  I have proposed 
 changes to make before it is released and the public API is locked in.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (AVRO-1349) Update site Javadoc to remove vulnerability

2013-06-20 Thread Scott Carey (JIRA)
Scott Carey created AVRO-1349:
-

 Summary: Update site Javadoc to remove vulnerability
 Key: AVRO-1349
 URL: https://issues.apache.org/jira/browse/AVRO-1349
 Project: Avro
  Issue Type: Bug
Reporter: Scott Carey
Priority: Critical


see http://www.kb.cert.org/vuls/id/225657

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (AVRO-1349) Update site Javadoc to remove vulnerability

2013-06-20 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey resolved AVRO-1349.
---

Resolution: Fixed

committed in revision 1495093.

Every instance of site/publish/docs/AVRO_VERSION/api/java/index.html

was modified.

 Update site Javadoc to remove vulnerability
 ---

 Key: AVRO-1349
 URL: https://issues.apache.org/jira/browse/AVRO-1349
 Project: Avro
  Issue Type: Bug
Reporter: Scott Carey
Priority: Critical

 see http://www.kb.cert.org/vuls/id/225657

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (AVRO-1213) Dependency on Jetty Servlet API in IPC

2013-06-08 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13678919#comment-13678919
 ] 

Scott Carey commented on AVRO-1213:
---

Netty also now has HTTP support, so we may be able to consolidate significantly 
and use it for both.

 Dependency on Jetty Servlet API in IPC
 --

 Key: AVRO-1213
 URL: https://issues.apache.org/jira/browse/AVRO-1213
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.7.2
Reporter: Sharmarke Aden
Priority: Minor

 The compile scoped dependency on jetty servlet-api in the IPC pom file can be 
 problematic if using Avro in a webapp environment. Would it be possible to 
 make this dependency either optional or provided? Or maybe Avro could be 
 modularized into sub-modules in such a way that desired features can be 
 assembled piecemeal?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (AVRO-1335) ResolvingDecoder should provide bidirectional compatibility between different version of schemas

2013-05-21 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663119#comment-13663119
 ] 

Scott Carey commented on AVRO-1335:
---

Thanks for the clarification.  Is it safe to summarize the issue as "C++ should 
support field default values"?  Or are there other things besides default values 
that are also preventing bidirectional schema evolution use cases?


I cannot provide a time-frame for this, the volunteers who build and maintain 
the C++ Avro code may have more information.
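For context, adding a default to the new field is what would make the evolution 
resolvable in both directions under Avro's schema-resolution rules, since the 
reader fills in any field missing from the writer's data from its default. A 
sketch of the changed field declaration (the empty-string default is 
illustrative):

```json
{"name": "Version2", "type": "string", "default": ""}
```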

 ResolvingDecoder should provide bidirectional compatibility between different 
 version of schemas
 

 Key: AVRO-1335
 URL: https://issues.apache.org/jira/browse/AVRO-1335
 Project: Avro
  Issue Type: Improvement
  Components: c++
Affects Versions: 1.7.4
Reporter: Bin Guo

 We found that ResolvingDecoder could not provide bidirectional compatibility 
 between different versions of schemas.
 Especially for records, for example:
 {code:title=First schema}
 {
   "type": "record",
   "name": "TestRecord",
   "fields": [
     {
       "name": "MyData",
       "type": {
         "type": "record",
         "name": "SubData",
         "fields": [
           { "name": "Version1", "type": "string" }
         ]
       }
     },
     { "name": "OtherData", "type": "string" }
   ]
 }
 {code}
 {code:title=Second schema}
 {
   "type": "record",
   "name": "TestRecord",
   "fields": [
     {
       "name": "MyData",
       "type": {
         "type": "record",
         "name": "SubData",
         "fields": [
           { "name": "Version1", "type": "string" },
           { "name": "Version2", "type": "string" }
         ]
       }
     },
     { "name": "OtherData", "type": "string" }
   ]
 }
 {code}
 Say, node A knows only the first schema and node B knows the second schema, 
 and the second schema has more fields. 
 Any data generated by node B can be resolved by the first schema because the 
 additional field is marked as skipped.
 But data generated by node A cannot be resolved by the second schema, and throws 
 an exception: *Don't know how to handle excess fields for reader.*
 This is because data is resolved exactly according to the auto-generated 
 codec_traits, which try to read the excess field.
 The problem is that we cannot simply ignore the excess field in the record, 
 since the data after the troublesome record also needs to be resolved.
 This problem has actually stuck us for a very long time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (AVRO-1311) Upgrade Snappy-Java dependency to support building on Mac + Java 7

2013-05-20 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1311:
--

Attachment: AVRO-1311.patch

snappy-java 1.0.5 is out, with fixes for Java 7 on Mac.
http://repo1.maven.org/maven2/org/xerial/snappy/snappy-java/1.0.5/

The attached patch ups the version from 1.0.4 to 1.0.5.

I will commit this soon if there are no objections.

 Upgrade Snappy-Java dependency to support building on Mac + Java 7
 --

 Key: AVRO-1311
 URL: https://issues.apache.org/jira/browse/AVRO-1311
 Project: Avro
  Issue Type: Bug
Affects Versions: 1.7.4
Reporter: Scott Carey
Assignee: Scott Carey
 Attachments: AVRO-1311.patch


 snappy-java 1.0.4 does not work with Mac + Java 7.  1.0.5-M4 is on Maven, but 
 it does not appear that there will be a final release of that.  1.1.0 is at 
 -M3 status, and is being developed now.  
 Both of these work locally for me; when the dust settles we need to pick one 
 before the next release.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (AVRO-1334) Java: update dependencies for 1.7.5

2013-05-20 Thread Scott Carey (JIRA)
Scott Carey created AVRO-1334:
-

 Summary: Java: update dependencies for 1.7.5
 Key: AVRO-1334
 URL: https://issues.apache.org/jira/browse/AVRO-1334
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5


A report for mvn versions:display-property-updates on trunk --

[INFO] The following version properties are referencing the newest available 
version:
[INFO]   ${jetty.version} . 6.1.26
[INFO]   ${javacc-plugin.version}  2.6
[INFO]   ${velocity.version} . 1.7
[INFO]   ${exec-plugin.version}  1.2.1
[INFO] The following version property updates are available:
[INFO]   ${jackson.version} .. 1.8.8 -> 1.9.11
[INFO]   ${source-plugin.version} . 2.1.2 -> 2.2.1
[INFO]   ${jar-plugin.version} .. 2.3.2 -> 2.4
[INFO]   ${snappy.version} . 1.0.5 -> 1.1.0-M3
[INFO]   ${checkstyle-plugin.version}  2.8 -> 2.10
[INFO]   ${hadoop1.version} .. 0.20.205.0 -> 1.1.2
[INFO]   ${commons-compress.version}  1.4.1 -> 1.5
[INFO]   ${plugin-plugin.version} . 2.9 -> 3.2
[INFO]   ${javadoc-plugin.version}  2.8 -> 2.9
[INFO]   ${compiler-plugin.version} . 2.3.2 -> 3.1
[INFO]   ${jopt-simple.version} ... 4.1 -> 4.4
[INFO]   ${surefire-plugin.version} ... 2.12 -> 2.14.1
[INFO]   ${paranamer.version} ... 2.3 -> 2.5.2
[INFO]   ${netty.version}  3.4.0.Final -> 4.0.0.Alpha8
[INFO]   ${slf4j.version} . 1.6.4 -> 1.7.5
[INFO]   ${shade-plugin.version} .. 1.5 -> 2.1
[INFO]   ${junit.version} ... 4.10 -> 4.11


Consider upgrades for these as well as the Apache parent and build plugins.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (AVRO-1334) Java: update dependencies for 1.7.5

2013-05-20 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1334:
--

Description: 
A report for mvn versions:display-property-updates on trunk --

{noformat}
[INFO] The following version properties are referencing the newest available 
version:
[INFO]   ${jetty.version} . 6.1.26
[INFO]   ${javacc-plugin.version}  2.6
[INFO]   ${velocity.version} . 1.7
[INFO]   ${exec-plugin.version}  1.2.1
[INFO] The following version property updates are available:
[INFO]   ${jackson.version} .. 1.8.8 -> 1.9.11
[INFO]   ${source-plugin.version} . 2.1.2 -> 2.2.1
[INFO]   ${jar-plugin.version} .. 2.3.2 -> 2.4
[INFO]   ${snappy.version} . 1.0.5 -> 1.1.0-M3
[INFO]   ${checkstyle-plugin.version}  2.8 -> 2.10
[INFO]   ${hadoop1.version} .. 0.20.205.0 -> 1.1.2
[INFO]   ${commons-compress.version}  1.4.1 -> 1.5
[INFO]   ${plugin-plugin.version} . 2.9 -> 3.2
[INFO]   ${javadoc-plugin.version}  2.8 -> 2.9
[INFO]   ${compiler-plugin.version} . 2.3.2 -> 3.1
[INFO]   ${jopt-simple.version} ... 4.1 -> 4.4
[INFO]   ${surefire-plugin.version} ... 2.12 -> 2.14.1
[INFO]   ${paranamer.version} ... 2.3 -> 2.5.2
[INFO]   ${netty.version}  3.4.0.Final -> 4.0.0.Alpha8
[INFO]   ${slf4j.version} . 1.6.4 -> 1.7.5
[INFO]   ${shade-plugin.version} .. 1.5 -> 2.1
[INFO]   ${junit.version} ... 4.10 -> 4.11
{noformat}

Consider upgrades for these as well as the Apache parent and build plugins.

  was:
A report for mvn versions:display-property-updates on trunk --

[INFO] The following version properties are referencing the newest available 
version:
[INFO]   ${jetty.version} . 6.1.26
[INFO]   ${javacc-plugin.version}  2.6
[INFO]   ${velocity.version} . 1.7
[INFO]   ${exec-plugin.version}  1.2.1
[INFO] The following version property updates are available:
[INFO]   ${jackson.version} .. 1.8.8 -> 1.9.11
[INFO]   ${source-plugin.version} . 2.1.2 -> 2.2.1
[INFO]   ${jar-plugin.version} .. 2.3.2 -> 2.4
[INFO]   ${snappy.version} . 1.0.5 -> 1.1.0-M3
[INFO]   ${checkstyle-plugin.version}  2.8 -> 2.10
[INFO]   ${hadoop1.version} .. 0.20.205.0 -> 1.1.2
[INFO]   ${commons-compress.version}  1.4.1 -> 1.5
[INFO]   ${plugin-plugin.version} . 2.9 -> 3.2
[INFO]   ${javadoc-plugin.version}  2.8 -> 2.9
[INFO]   ${compiler-plugin.version} . 2.3.2 -> 3.1
[INFO]   ${jopt-simple.version} ... 4.1 -> 4.4
[INFO]   ${surefire-plugin.version} ... 2.12 -> 2.14.1
[INFO]   ${paranamer.version} ... 2.3 -> 2.5.2
[INFO]   ${netty.version}  3.4.0.Final -> 4.0.0.Alpha8
[INFO]   ${slf4j.version} . 1.6.4 -> 1.7.5
[INFO]   ${shade-plugin.version} .. 1.5 -> 2.1
[INFO]   ${junit.version} ... 4.10 -> 4.11


Consider upgrades for these as well as the Apache parent and build plugins.


 Java: update dependencies for 1.7.5
 ---

 Key: AVRO-1334
 URL: https://issues.apache.org/jira/browse/AVRO-1334
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5


 A report for mvn versions:display-property-updates on trunk --
 {noformat}
 [INFO] The following version properties are referencing the newest available 
 version:
 [INFO]   ${jetty.version} . 6.1.26
 [INFO]   ${javacc-plugin.version}  2.6
 [INFO]   ${velocity.version} . 1.7
 [INFO]   ${exec-plugin.version}  1.2.1
 [INFO] The following version property updates are 

[jira] [Updated] (AVRO-1334) Java: update dependencies for 1.7.5

2013-05-20 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1334:
--

Description: 
A report for mvn versions:display-property-updates on trunk --

{noformat}
[INFO] The following version properties are referencing the newest available 
version:
[INFO]   ${jetty.version} . 6.1.26
[INFO]   ${javacc-plugin.version}  2.6
[INFO]   ${velocity.version} . 1.7
[INFO]   ${exec-plugin.version}  1.2.1
[INFO] The following version property updates are available:
[INFO]   ${jackson.version} .. 1.8.8 - 1.9.11
[INFO]   ${source-plugin.version} . 2.1.2 - 2.2.1
[INFO]   ${jar-plugin.version} .. 2.3.2 - 2.4
[INFO]   ${snappy.version} . 1.0.5 - 1.1.0-M3
[INFO]   ${checkstyle-plugin.version}  2.8 - 2.10
[INFO]   ${hadoop1.version} .. 0.20.205.0 - 1.1.2
[INFO]   ${commons-compress.version}  1.4.1 - 1.5
[INFO]   ${plugin-plugin.version} . 2.9 - 3.2
[INFO]   ${javadoc-plugin.version}  2.8 - 2.9
[INFO]   ${compiler-plugin.version} . 2.3.2 - 3.1
[INFO]   ${jopt-simple.version} ... 4.1 - 4.4
[INFO]   ${surefire-plugin.version} ... 2.12 - 2.14.1
[INFO]   ${paranamer.version} ... 2.3 - 2.5.2
[INFO]   ${netty.version}  3.4.0.Final - 4.0.0.Alpha8
[INFO]   ${slf4j.version} . 1.6.4 - 1.7.5
[INFO]   ${shade-plugin.version} .. 1.5 - 2.1
[INFO]   ${junit.version} ... 4.10 - 4.11
{noformat}

Consider upgrades for these as well as the Apache parent and build plugins.

  was:
A report for mvn versions:display-property-updates on trunk --

{noformat}
[INFO] The following version properties are referencing the newest available 
version:
[INFO]   ${jetty.version} . 6.1.26
[INFO]   ${javacc-plugin.version}  2.6
[INFO]   ${velocity.version} . 1.7
[INFO]   ${exec-plugin.version}  1.2.1
[INFO] The following version property updates are available:
[INFO]   ${jackson.version} .. 1.8.8 - 1.9.11
[INFO]   ${source-plugin.version} . 2.1.2 - 2.2.1
[INFO]   ${jar-plugin.version} .. 2.3.2 - 2.4
[INFO]   ${snappy.version} . 1.0.5 - 1.1.0-M3
[INFO]   ${checkstyle-plugin.version}  2.8 - 2.10
[INFO]   ${hadoop1.version} .. 0.20.205.0 - 1.1.2
[INFO]   ${commons-compress.version}  1.4.1 - 1.5
[INFO]   ${plugin-plugin.version} . 2.9 - 3.2
[INFO]   ${javadoc-plugin.version}  2.8 - 2.9
[INFO]   ${compiler-plugin.version} . 2.3.2 - 3.1
[INFO]   ${jopt-simple.version} ... 4.1 - 4.4
[INFO]   ${surefire-plugin.version} ... 2.12 - 2.14.1
[INFO]   ${paranamer.version} ... 2.3 - 2.5.2
[INFO]   ${netty.version}  3.4.0.Final - 4.0.0.Alpha8
[INFO]   ${slf4j.version} . 1.6.4 - 1.7.5
[INFO]   ${shade-plugin.version} .. 1.5 - 2.1
[INFO]   ${junit.version} ... 4.10 - 4.11
 {noformat}

Consider upgrades for these as well as the Apache parent and build plugins.


 Java: update dependencies for 1.7.5
 ---

 Key: AVRO-1334
 URL: https://issues.apache.org/jira/browse/AVRO-1334
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5


 A report for mvn versions:display-property-updates on trunk --
 {noformat}
 [INFO] The following version properties are referencing the newest available 
 version:
 [INFO]   ${jetty.version} . 6.1.26
 [INFO]   ${javacc-plugin.version}  2.6
 [INFO]   ${velocity.version} . 1.7
 [INFO]   ${exec-plugin.version}  1.2.1
 [INFO] The following version 

[jira] [Commented] (AVRO-1334) Java: update dependencies for 1.7.5

2013-05-20 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662552#comment-13662552
 ] 

Scott Carey commented on AVRO-1334:
---

Jackson requires an upgrade because there is currently a bug in Avro due to it 
(I ran into it in AVRO-1325; unit tests there fail without Jackson 1.9.12).

Hadoop1:  I am not sure what the best version here is -- 0.20.205 feels a bit 
old.  Suggestions?

Jopt-simple is a low risk update.
Paranamer is a low risk update (only a handful of bugfixes).
slf4j looks safe to update (performance improvements, bug fixes, and now 
compiled against a Java 1.5 target).
Junit is safe to update.

netty -- netty 3.6.6.GA should be compatible (see 
http://netty.io/news/index.html) and has many fixes / enhancements.
  (an aside, netty now supports HTTP, so perhaps we can drop the ancient Jetty 
version we use and rely on netty for both raw and http to simplify things 
later?)

The remainder are plugin updates, which are generally safe since testing them 
is easy to cover.

I'll submit a patch with the updates shortly.


 Java: update dependencies for 1.7.5
 ---

 Key: AVRO-1334
 URL: https://issues.apache.org/jira/browse/AVRO-1334
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5


 A report for mvn versions:display-property-updates on trunk --
 {noformat}
 [INFO] The following version properties are referencing the newest available 
 version:
 [INFO]   ${jetty.version} . 6.1.26
 [INFO]   ${javacc-plugin.version}  2.6
 [INFO]   ${velocity.version} . 1.7
 [INFO]   ${exec-plugin.version}  1.2.1
 [INFO] The following version property updates are available:
 [INFO]   ${jackson.version} .. 1.8.8 - 1.9.11
 [INFO]   ${source-plugin.version} . 2.1.2 - 2.2.1
 [INFO]   ${jar-plugin.version} .. 2.3.2 - 2.4
 [INFO]   ${snappy.version} . 1.0.5 - 1.1.0-M3
 [INFO]   ${checkstyle-plugin.version}  2.8 - 2.10
 [INFO]   ${hadoop1.version} .. 0.20.205.0 - 1.1.2
 [INFO]   ${commons-compress.version}  1.4.1 - 1.5
 [INFO]   ${plugin-plugin.version} . 2.9 - 3.2
 [INFO]   ${javadoc-plugin.version}  2.8 - 2.9
 [INFO]   ${compiler-plugin.version} . 2.3.2 - 3.1
 [INFO]   ${jopt-simple.version} ... 4.1 - 4.4
 [INFO]   ${surefire-plugin.version} ... 2.12 - 2.14.1
 [INFO]   ${paranamer.version} ... 2.3 - 2.5.2
 [INFO]   ${netty.version}  3.4.0.Final - 4.0.0.Alpha8
 [INFO]   ${slf4j.version} . 1.6.4 - 1.7.5
 [INFO]   ${shade-plugin.version} .. 1.5 - 2.1
 [INFO]   ${junit.version} ... 4.10 - 4.11
 {noformat}
 Consider upgrades for these as well as the Apache parent and build plugins.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (AVRO-1334) Java: update dependencies for 1.7.5

2013-05-20 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1334:
--

Priority: Minor  (was: Major)

 Java: update dependencies for 1.7.5
 ---

 Key: AVRO-1334
 URL: https://issues.apache.org/jira/browse/AVRO-1334
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
Priority: Minor
 Fix For: 1.7.5


 A report for mvn versions:display-property-updates on trunk --
 {noformat}
 [INFO] The following version properties are referencing the newest available 
 version:
 [INFO]   ${jetty.version} . 6.1.26
 [INFO]   ${javacc-plugin.version}  2.6
 [INFO]   ${velocity.version} . 1.7
 [INFO]   ${exec-plugin.version}  1.2.1
 [INFO] The following version property updates are available:
 [INFO]   ${jackson.version} .. 1.8.8 - 1.9.11
 [INFO]   ${source-plugin.version} . 2.1.2 - 2.2.1
 [INFO]   ${jar-plugin.version} .. 2.3.2 - 2.4
 [INFO]   ${snappy.version} . 1.0.5 - 1.1.0-M3
 [INFO]   ${checkstyle-plugin.version}  2.8 - 2.10
 [INFO]   ${hadoop1.version} .. 0.20.205.0 - 1.1.2
 [INFO]   ${commons-compress.version}  1.4.1 - 1.5
 [INFO]   ${plugin-plugin.version} . 2.9 - 3.2
 [INFO]   ${javadoc-plugin.version}  2.8 - 2.9
 [INFO]   ${compiler-plugin.version} . 2.3.2 - 3.1
 [INFO]   ${jopt-simple.version} ... 4.1 - 4.4
 [INFO]   ${surefire-plugin.version} ... 2.12 - 2.14.1
 [INFO]   ${paranamer.version} ... 2.3 - 2.5.2
 [INFO]   ${netty.version}  3.4.0.Final - 4.0.0.Alpha8
 [INFO]   ${slf4j.version} . 1.6.4 - 1.7.5
 [INFO]   ${shade-plugin.version} .. 1.5 - 2.1
 [INFO]   ${junit.version} ... 4.10 - 4.11
 {noformat}
 Consider upgrades for these as well as the Apache parent and build plugins.





[jira] [Commented] (AVRO-1335) ResolvingDecoder should provide bidirectional compatibility between different version of schemas

2013-05-20 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662642#comment-13662642
 ] 

Scott Carey commented on AVRO-1335:
---

The second record must specify a default value for the added field:

{code}
{
  "name": "Version2",
  "type": "string",
  "default": ""
}
{code}

Otherwise, when data written with the first schema is read with the second 
schema, there is no value for the field Version2 -- what do you want it to do?

The Avro specification uses default values to handle the use case where a field 
is present for the reader but not the writer.
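The resolution rule described above can be sketched generically. This is an illustrative sketch only -- plain Maps stand in for Avro records, and resolve() is a hypothetical helper, not the Avro ResolvingDecoder:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: when the reader's schema declares a field the writer
// lacked, resolution fills that field from the reader's declared default.
public class DefaultFillSketch {
    // readerDefaults maps field name -> default value from the reader schema.
    static Map<String, Object> resolve(Map<String, Object> written,
                                       Map<String, Object> readerDefaults) {
        Map<String, Object> resolved = new HashMap<>(written);
        for (Map.Entry<String, Object> e : readerDefaults.entrySet()) {
            // Field absent from the written data: take the reader's default.
            resolved.putIfAbsent(e.getKey(), e.getValue());
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, Object> written = new HashMap<>();
        written.put("Version1", "v1");          // written with the first schema
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("Version2", "");           // default declared by the reader
        Map<String, Object> resolved = resolve(written, defaults);
        System.out.println(resolved.get("Version2").equals("")); // prints true
    }
}
```

Without the declared default there is nothing to put in Version2, which is why the specification requires one for reader-only fields.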


 ResolvingDecoder should provide bidirectional compatibility between different 
 version of schemas
 

 Key: AVRO-1335
 URL: https://issues.apache.org/jira/browse/AVRO-1335
 Project: Avro
  Issue Type: Improvement
  Components: c++
Affects Versions: 1.7.4
Reporter: Bin Guo

 We found that resolvingDecoder could not provide bidirectional compatibility 
 between different version of schemas.
 Especially for records, for example:
 {code:title=First schema}
 {
   "type": "record",
   "name": "TestRecord",
   "fields": [
     {
       "name": "MyData",
       "type": {
         "type": "record",
         "name": "SubData",
         "fields": [
           { "name": "Version1", "type": "string" }
         ]
       }
     },
     { "name": "OtherData", "type": "string" }
   ]
 }
 {code}
 {code:title=Second schema}
 {
   "type": "record",
   "name": "TestRecord",
   "fields": [
     {
       "name": "MyData",
       "type": {
         "type": "record",
         "name": "SubData",
         "fields": [
           { "name": "Version1", "type": "string" },
           { "name": "Version2", "type": "string" }
         ]
       }
     },
     { "name": "OtherData", "type": "string" }
   ]
 }
 {code}
 Say node A knows only the first schema and node B knows the second schema, 
 which has more fields. 
 Any data generated by node B can be resolved by the first schema, because the 
 additional field is marked as skipped.
 But data generated by node A cannot be resolved by the second schema, and 
 throws an exception: *Don't know how to handle excess fields for reader.*
 This is because data is resolved exactly according to the auto-generated 
 codec_traits, which try to read the excess field.
 The problem is that we cannot simply ignore the excess field in the record, 
 since the data after the troublesome record also needs to be resolved.
 Actually, this problem had us stuck for a very long time.



[jira] [Updated] (AVRO-1311) Java: Upgrade snappy-java dependency to 1.0.5

2013-05-20 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1311:
--

Summary: Java: Upgrade snappy-java dependency to 1.0.5  (was: Upgrade 
Snappy-Java dependency to support building on Mac + Java 7)

 Java: Upgrade snappy-java dependency to 1.0.5
 -

 Key: AVRO-1311
 URL: https://issues.apache.org/jira/browse/AVRO-1311
 Project: Avro
  Issue Type: Bug
Affects Versions: 1.7.4
Reporter: Scott Carey
Assignee: Scott Carey
 Attachments: AVRO-1311.patch


 snappy-java 1.0.4 does not work with Mac + Java 7.  1.0.5-M4 is on maven, but 
 it does not appear that there will be a final release of that.  1.1.0 is at 
 -M3 status, and is being developed now.  
 Both of these work locally for me, when the dust settles we need to pick one 
 before the next release.



[jira] [Resolved] (AVRO-1311) Java: Upgrade snappy-java dependency to 1.0.5

2013-05-20 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey resolved AVRO-1311.
---

   Resolution: Fixed
Fix Version/s: 1.7.5

committed @ r1484656

 Java: Upgrade snappy-java dependency to 1.0.5
 -

 Key: AVRO-1311
 URL: https://issues.apache.org/jira/browse/AVRO-1311
 Project: Avro
  Issue Type: Bug
Affects Versions: 1.7.4
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1311.patch


 snappy-java 1.0.4 does not work with Mac + Java 7.  1.0.5-M4 is on maven, but 
 it does not appear that there will be a final release of that.  1.1.0 is at 
 -M3 status, and is being developed now.  
 Both of these work locally for me, when the dust settles we need to pick one 
 before the next release.



[jira] [Updated] (AVRO-1334) Java: update dependencies for 1.7.5

2013-05-20 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1334:
--

Attachment: AVRO-1334.patch

This patch updates versions of plugins and many dependencies.  Of note:

Netty version 3.6.6 was causing a deadlock in unit tests every time for me in 
TestNettyServerWithCallbacks (mac, java7).  All versions of 3.4.x and 3.5.x 
including the current version hang about 15% of the time in 
TestNettyTransceiverWhenServerStops.   I upgraded to the latest in the 3.5.x 
series.

I cleaned up some version consistency in a few places, and the JUnit/Hamcrest 
relationship has changed a little.

The newer maven plugin versions triggered deprecations in the maven plugins, so 
I updated those trivially. 

 Java: update dependencies for 1.7.5
 ---

 Key: AVRO-1334
 URL: https://issues.apache.org/jira/browse/AVRO-1334
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
Priority: Minor
 Fix For: 1.7.5

 Attachments: AVRO-1334.patch


 A report for mvn versions:display-property-updates on trunk --
 {noformat}
 [INFO] The following version properties are referencing the newest available 
 version:
 [INFO]   ${jetty.version} . 6.1.26
 [INFO]   ${javacc-plugin.version}  2.6
 [INFO]   ${velocity.version} . 1.7
 [INFO]   ${exec-plugin.version}  1.2.1
 [INFO] The following version property updates are available:
 [INFO]   ${jackson.version} .. 1.8.8 - 1.9.11
 [INFO]   ${source-plugin.version} . 2.1.2 - 2.2.1
 [INFO]   ${jar-plugin.version} .. 2.3.2 - 2.4
 [INFO]   ${snappy.version} . 1.0.5 - 1.1.0-M3
 [INFO]   ${checkstyle-plugin.version}  2.8 - 2.10
 [INFO]   ${hadoop1.version} .. 0.20.205.0 - 1.1.2
 [INFO]   ${commons-compress.version}  1.4.1 - 1.5
 [INFO]   ${plugin-plugin.version} . 2.9 - 3.2
 [INFO]   ${javadoc-plugin.version}  2.8 - 2.9
 [INFO]   ${compiler-plugin.version} . 2.3.2 - 3.1
 [INFO]   ${jopt-simple.version} ... 4.1 - 4.4
 [INFO]   ${surefire-plugin.version} ... 2.12 - 2.14.1
 [INFO]   ${paranamer.version} ... 2.3 - 2.5.2
 [INFO]   ${netty.version}  3.4.0.Final - 4.0.0.Alpha8
 [INFO]   ${slf4j.version} . 1.6.4 - 1.7.5
 [INFO]   ${shade-plugin.version} .. 1.5 - 2.1
 [INFO]   ${junit.version} ... 4.10 - 4.11
 {noformat}
 Consider upgrades for these as well as the Apache parent and build plugins.



[jira] [Updated] (AVRO-1334) Java: update dependencies for 1.7.5

2013-05-20 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1334:
--

Status: Patch Available  (was: Open)

 Java: update dependencies for 1.7.5
 ---

 Key: AVRO-1334
 URL: https://issues.apache.org/jira/browse/AVRO-1334
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
Priority: Minor
 Fix For: 1.7.5

 Attachments: AVRO-1334.patch


 A report for mvn versions:display-property-updates on trunk --
 {noformat}
 [INFO] The following version properties are referencing the newest available 
 version:
 [INFO]   ${jetty.version} . 6.1.26
 [INFO]   ${javacc-plugin.version}  2.6
 [INFO]   ${velocity.version} . 1.7
 [INFO]   ${exec-plugin.version}  1.2.1
 [INFO] The following version property updates are available:
 [INFO]   ${jackson.version} .. 1.8.8 - 1.9.11
 [INFO]   ${source-plugin.version} . 2.1.2 - 2.2.1
 [INFO]   ${jar-plugin.version} .. 2.3.2 - 2.4
 [INFO]   ${snappy.version} . 1.0.5 - 1.1.0-M3
 [INFO]   ${checkstyle-plugin.version}  2.8 - 2.10
 [INFO]   ${hadoop1.version} .. 0.20.205.0 - 1.1.2
 [INFO]   ${commons-compress.version}  1.4.1 - 1.5
 [INFO]   ${plugin-plugin.version} . 2.9 - 3.2
 [INFO]   ${javadoc-plugin.version}  2.8 - 2.9
 [INFO]   ${compiler-plugin.version} . 2.3.2 - 3.1
 [INFO]   ${jopt-simple.version} ... 4.1 - 4.4
 [INFO]   ${surefire-plugin.version} ... 2.12 - 2.14.1
 [INFO]   ${paranamer.version} ... 2.3 - 2.5.2
 [INFO]   ${netty.version}  3.4.0.Final - 4.0.0.Alpha8
 [INFO]   ${slf4j.version} . 1.6.4 - 1.7.5
 [INFO]   ${shade-plugin.version} .. 1.5 - 2.1
 [INFO]   ${junit.version} ... 4.10 - 4.11
 {noformat}
 Consider upgrades for these as well as the Apache parent and build plugins.



[jira] [Commented] (AVRO-1334) Java: update dependencies for 1.7.5

2013-05-20 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662664#comment-13662664
 ] 

Scott Carey commented on AVRO-1334:
---

And lastly, the TestIDL had to be changed since the Jackson upgrade changed the 
whitespace in pretty-print slightly (an empty array is now "[]" instead of 
"[\n]"), so I made the test insensitive to whitespace.
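A whitespace-insensitive comparison of the kind described might look like this (an illustrative sketch, not the actual TestIDL code):

```java
// Sketch of a whitespace-insensitive string comparison, so pretty-print
// differences such as "[]" vs "[\n]" do not cause spurious test failures.
public class WhitespaceInsensitive {
    static boolean equalIgnoringWhitespace(String a, String b) {
        // Strip every whitespace run from both sides before comparing.
        return a.replaceAll("\\s+", "").equals(b.replaceAll("\\s+", ""));
    }

    public static void main(String[] args) {
        System.out.println(equalIgnoringWhitespace("[\n]", "[]")); // prints true
    }
}
```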

 Java: update dependencies for 1.7.5
 ---

 Key: AVRO-1334
 URL: https://issues.apache.org/jira/browse/AVRO-1334
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
Priority: Minor
 Fix For: 1.7.5

 Attachments: AVRO-1334.patch


 A report for mvn versions:display-property-updates on trunk --
 {noformat}
 [INFO] The following version properties are referencing the newest available 
 version:
 [INFO]   ${jetty.version} . 6.1.26
 [INFO]   ${javacc-plugin.version}  2.6
 [INFO]   ${velocity.version} . 1.7
 [INFO]   ${exec-plugin.version}  1.2.1
 [INFO] The following version property updates are available:
 [INFO]   ${jackson.version} .. 1.8.8 - 1.9.11
 [INFO]   ${source-plugin.version} . 2.1.2 - 2.2.1
 [INFO]   ${jar-plugin.version} .. 2.3.2 - 2.4
 [INFO]   ${snappy.version} . 1.0.5 - 1.1.0-M3
 [INFO]   ${checkstyle-plugin.version}  2.8 - 2.10
 [INFO]   ${hadoop1.version} .. 0.20.205.0 - 1.1.2
 [INFO]   ${commons-compress.version}  1.4.1 - 1.5
 [INFO]   ${plugin-plugin.version} . 2.9 - 3.2
 [INFO]   ${javadoc-plugin.version}  2.8 - 2.9
 [INFO]   ${compiler-plugin.version} . 2.3.2 - 3.1
 [INFO]   ${jopt-simple.version} ... 4.1 - 4.4
 [INFO]   ${surefire-plugin.version} ... 2.12 - 2.14.1
 [INFO]   ${paranamer.version} ... 2.3 - 2.5.2
 [INFO]   ${netty.version}  3.4.0.Final - 4.0.0.Alpha8
 [INFO]   ${slf4j.version} . 1.6.4 - 1.7.5
 [INFO]   ${shade-plugin.version} .. 1.5 - 2.1
 [INFO]   ${junit.version} ... 4.10 - 4.11
 {noformat}
 Consider upgrades for these as well as the Apache parent and build plugins.



[jira] [Commented] (AVRO-1334) Java: update dependencies for 1.7.5

2013-05-20 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662670#comment-13662670
 ] 

Scott Carey commented on AVRO-1334:
---

Dependencies I did not update, deferring to the expertise of others (a.k.a. I 
have no idea what the best thing to do is):

hadoop1
hadoop2
thrift
protobuf


 Java: update dependencies for 1.7.5
 ---

 Key: AVRO-1334
 URL: https://issues.apache.org/jira/browse/AVRO-1334
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
Priority: Minor
 Fix For: 1.7.5

 Attachments: AVRO-1334.patch


 A report for mvn versions:display-property-updates on trunk --
 {noformat}
 [INFO] The following version properties are referencing the newest available 
 version:
 [INFO]   ${jetty.version} . 6.1.26
 [INFO]   ${javacc-plugin.version}  2.6
 [INFO]   ${velocity.version} . 1.7
 [INFO]   ${exec-plugin.version}  1.2.1
 [INFO] The following version property updates are available:
 [INFO]   ${jackson.version} .. 1.8.8 - 1.9.11
 [INFO]   ${source-plugin.version} . 2.1.2 - 2.2.1
 [INFO]   ${jar-plugin.version} .. 2.3.2 - 2.4
 [INFO]   ${snappy.version} . 1.0.5 - 1.1.0-M3
 [INFO]   ${checkstyle-plugin.version}  2.8 - 2.10
 [INFO]   ${hadoop1.version} .. 0.20.205.0 - 1.1.2
 [INFO]   ${commons-compress.version}  1.4.1 - 1.5
 [INFO]   ${plugin-plugin.version} . 2.9 - 3.2
 [INFO]   ${javadoc-plugin.version}  2.8 - 2.9
 [INFO]   ${compiler-plugin.version} . 2.3.2 - 3.1
 [INFO]   ${jopt-simple.version} ... 4.1 - 4.4
 [INFO]   ${surefire-plugin.version} ... 2.12 - 2.14.1
 [INFO]   ${paranamer.version} ... 2.3 - 2.5.2
 [INFO]   ${netty.version}  3.4.0.Final - 4.0.0.Alpha8
 [INFO]   ${slf4j.version} . 1.6.4 - 1.7.5
 [INFO]   ${shade-plugin.version} .. 1.5 - 2.1
 [INFO]   ${junit.version} ... 4.10 - 4.11
 {noformat}
 Consider upgrades for these as well as the Apache parent and build plugins.



[jira] [Commented] (AVRO-1245) Add Merging Functionality to Generated Builders

2013-05-16 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13659307#comment-13659307
 ] 

Scott Carey commented on AVRO-1245:
---

Another possibility is to do it all at builder construction time -- there is 
less to do when you are guaranteed a clean-slate, and the existing record must 
be walked at least once for a deep copy anyway.
{code}
boolean replaceNullsWithDefaults = true;
boolean replaceEmptyStringsWithDefaults = true;
User.newBuilder(thirdPartyRecord, replaceNullsWithDefaults,
    replaceEmptyStringsWithDefaults);
{code}

This bears resemblance to one of our other ideas -- that there is no 'read' and 
'write', only 'from' and 'to' -- a deep copy is like serializing 'from' one 
object 'to' another (rather than to binary, etc).  Replacing values is a 
special case of schema resolution when translating data from one object to 
another.
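The construction-time merge in the snippet above could behave roughly as follows. This is a hedged illustration of the proposed semantics only (AVRO-1245 was a proposal, so nothing here is a released Avro API; plain Maps stand in for records):

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the proposed merge semantics: at builder construction,
// null (and optionally empty-string) values in the source record are
// replaced by the schema's declared default values.
public class MergeSketch {
    static Map<String, Object> merge(Map<String, Object> source,
                                     Map<String, Object> defaults,
                                     boolean replaceNulls,
                                     boolean replaceEmptyStrings) {
        Map<String, Object> out = new HashMap<>(source);
        for (Map.Entry<String, Object> d : defaults.entrySet()) {
            Object v = out.get(d.getKey());
            boolean nullHit = replaceNulls && v == null;
            boolean emptyHit = replaceEmptyStrings && "".equals(v);
            if (nullHit || emptyHit) {
                out.put(d.getKey(), d.getValue());  // fall back to the default
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> src = new HashMap<>();
        src.put("user", "alice");
        src.put("privacy", null);            // third party left it unset
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("privacy", "Private");  // schema default
        Map<String, Object> merged = merge(src, defaults, true, true);
        System.out.println(merged.get("privacy")); // prints Private
    }
}
```

Doing this in one pass at construction fits the "walk the record once for the deep copy anyway" observation above.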

 Add Merging Functionality to Generated Builders
 ---

 Key: AVRO-1245
 URL: https://issues.apache.org/jira/browse/AVRO-1245
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.7.3
 Environment: Linux Mint 32-bit, Java 7, Avro 1.7.3
Reporter: Sharmarke Aden
Priority: Minor

 Suppose I have a record with the following schema and default values: 
 {code}
 {
   "type": "record",
   "namespace": "test",
   "name": "User",
   "fields": [
     {
       "name": "user",
       "type": ["null", "string"],
       "default": null
     },
     {
       "name": "privacy",
       "type": [
         {
           "type": "enum",
           "name": "Privacy",
           "namespace": "test",
           "symbols": ["Public", "Private"]
         },
         "null"
       ],
       "default": "Private"
     }
   ]
 }
 {code}
 Now suppose I have a record supplied to me by a third party whose privacy 
 field value is null. Currently, if you call 
 Builder.newBuilder(thirdPartyRecord), it simply creates a new record with the 
 same values as the source record (privacy is null in the newly created 
 builder). 
 It's very important that the privacy value be set and so ideally I would like 
 to perform a merge to mitigate any issues with default values being absent in 
 the source record. I would like to propose that a new enhancement be added to 
 the Builder to support merging of a source record to a new record. Perhaps 
 something like this:
 {code}
 // recordWithoutDefaults record passed in.
 User.Builder builder = User.newBuilder();
 //ignore null values in the source record if the schema has a default 
 //value for the field
 boolean ignoreNull = true;
 //ignore empty string values in the source record for string field 
 //types with default field values
 boolean ignoreEmptyString = true;
 //while this is simple and useful in my use-case, perhaps there's a
 //better/refined way of supporting various merging models
 builder.merge(recordWithoutDefaults, ignoreNull, ignoreEmptyString);
 {code}



[jira] [Commented] (AVRO-1315) Java: Schema Validation utilities

2013-05-16 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13659893#comment-13659893
 ] 

Scott Carey commented on AVRO-1315:
---

Christophe:

I'll add some factory methods for creating composite validators, or similar, to 
address your use case for composing custom validations with these.

Tom:  
I'll add tests that explicitly test the positive case -- the positive cases are 
all covered now on the path to failure.
Re: serialVersionUID -- any Serializable class without one can be considered a 
bug.  Eclipse gives me a warning without it.  It's unlikely that a user will 
use Java serialization for these exceptions, but there is a chance.

 Java: Schema Validation utilities
 -

 Key: AVRO-1315
 URL: https://issues.apache.org/jira/browse/AVRO-1315
 Project: Avro
  Issue Type: New Feature
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1315.patch


 As part of AVRO-1124 we needed Schema Validation utilities.  I have separated 
 those out of that ticket as a stand-alone item.



[jira] [Commented] (AVRO-1325) Enhanced Schema Builder API

2013-05-16 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13659923#comment-13659923
 ] 

Scott Carey commented on AVRO-1325:
---

The field syntax could have shortcuts too -- since the FieldsBuilder currently 
has only two methods (name() and endRecord()), we could add a few shortcut 
methods for the most common use cases.

{quote}
Why have type() and type(Schema)?
{quote}
{code}
Schema person = new Schema.Parser().parse("Person.avsc");  // or look up from
                                                           // a schema repo

Schema job = SchemaBuilder.record("Job").fields()
  .name("title").type().stringType().noDefault()
  .name("who").type(person).noDefault()
  .endRecord();

Schema meeting = SchemaBuilder.record("Meeting").fields()
  .name("location").type().stringType().noDefault()
  .name("attendees").type().array().items(person).noDefault()
  .endRecord();
{code}


{quote}Error type is missing, but this can be easily added.{quote}  
I wondered what to do here, since Error is for Protocols only -- perhaps add it 
when we add a ProtocolBuilder, or extend SchemaBuilder to support protocols?

 Enhanced Schema Builder API
 ---

 Key: AVRO-1325
 URL: https://issues.apache.org/jira/browse/AVRO-1325
 Project: Avro
  Issue Type: Bug
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1325.patch, AVRO-1325-preliminary.patch, 
 AVRO-1325-properties.patch, AVRO-1325-v2.patch


 The schema builder from AVRO-1274 has a few key limitations.  I have proposed 
 changes to make before it is released and the public API is locked in.



[jira] [Commented] (AVRO-1325) Enhanced Schema Builder API

2013-05-16 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13659927#comment-13659927
 ] 

Scott Carey commented on AVRO-1325:
---

That brings up one more thing about this design:  I had in mind later 
supporting Protocols, and the nested type-parameterized context (Completion<R>) 
allows a ProtocolBuilder to nest a SchemaBuilder -- a TypeBuilder<Protocol> 
would share API and code. 

 Enhanced Schema Builder API
 ---

 Key: AVRO-1325
 URL: https://issues.apache.org/jira/browse/AVRO-1325
 Project: Avro
  Issue Type: Bug
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1325.patch, AVRO-1325-preliminary.patch, 
 AVRO-1325-properties.patch, AVRO-1325-v2.patch


 The schema builder from AVRO-1274 has a few key limitations.  I have proposed 
 changes to make before it is released and the public API is locked in.



[jira] [Commented] (AVRO-1315) Java: Schema Validation utilities

2013-05-16 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13659943#comment-13659943
 ] 

Scott Carey commented on AVRO-1315:
---

The issue is that the default value is different on different JVMs/compilers, 
and that we break compatibility if there are any changes, even if they don't 
have any effect -- such as adding an additional constructor.  I felt that we 
would be more likely to cause the default value to change than to make a change 
that influenced serialization on such a trivial exception type.

Low risk no matter which way we go; I'd rather focus our energies elsewhere.  
I'll remove it from the next patch.


 Java: Schema Validation utilities
 -

 Key: AVRO-1315
 URL: https://issues.apache.org/jira/browse/AVRO-1315
 Project: Avro
  Issue Type: New Feature
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1315.patch


 As part of AVRO-1124 we needed Schema Validation utilities.  I have separated 
 those out of that ticket as a stand-alone item.



[jira] [Commented] (AVRO-1325) Enhanced Schema Builder API

2013-05-16 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13659945#comment-13659945
 ] 

Scott Carey commented on AVRO-1325:
---

Another option would be to have record() and recordSimple() on this API -- the 
latter could return a record builder with simpler syntax but lacking support 
for some things.

 Enhanced Schema Builder API
 ---

 Key: AVRO-1325
 URL: https://issues.apache.org/jira/browse/AVRO-1325
 Project: Avro
  Issue Type: Bug
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1325.patch, AVRO-1325-preliminary.patch, 
 AVRO-1325-properties.patch, AVRO-1325-v2.patch


 The schema builder from AVRO-1274 has a few key limitations.  I have proposed 
 changes to make before it is released and the public API is locked in.



[jira] [Updated] (AVRO-1325) Enhanced Schema Builder API

2013-05-15 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1325:
--

Attachment: AVRO-1325.patch

Updated patch contains:

* Completed functionality.
* Cleaned up API -- 
**  intWith() now intBuilder()
**  added nullable() and optional() shortcut builders
**  reduce number of methods named 'type' for the field builder (default values 
are set afterwards)
* Very large increase in javadoc
* Unit test coverage at 99.9% instruction coverage
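As a rough, hypothetical illustration of the two shortcut builders (not Avro's own code): nullable() unions the type with null, keeping the type first, while optional() puts null first so the field can default to null:

```java
public class UnionShortcuts {
    // nullable(T): produces the union ["T", "null"] -- T comes first, so a
    // default value for the field must be a value of T.
    static String nullable(String typeJson) {
        return "[" + typeJson + ",\"null\"]";
    }

    // optional(T): produces the union ["null", "T"] with a null field default,
    // since a union's default must match its first branch.
    static String optional(String typeJson) {
        return "[\"null\"," + typeJson + "]";
    }
}
```

This only sketches the JSON shapes the shortcuts stand for; the real builders return Schema objects, not strings.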


 Enhanced Schema Builder API
 ---

 Key: AVRO-1325
 URL: https://issues.apache.org/jira/browse/AVRO-1325
 Project: Avro
  Issue Type: Bug
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1325.patch, AVRO-1325-preliminary.patch


 The schema builder from AVRO-1274 has a few key limitations.  I have proposed 
 changes to make before it is released and the public API is locked in.



[jira] [Updated] (AVRO-1325) Enhanced Schema Builder API

2013-05-15 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1325:
--

Status: Patch Available  (was: Open)

 Enhanced Schema Builder API
 ---

 Key: AVRO-1325
 URL: https://issues.apache.org/jira/browse/AVRO-1325
 Project: Avro
  Issue Type: Bug
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1325.patch, AVRO-1325-preliminary.patch


 The schema builder from AVRO-1274 has a few key limitations.  I have proposed 
 changes to make before it is released and the public API is locked in.



[jira] [Updated] (AVRO-1325) Enhanced Schema Builder API

2013-05-15 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1325:
--

Attachment: AVRO-1325-v2.patch

This patch includes an additional 180 lines of javadoc introduction to 
SchemaBuilder.

 Enhanced Schema Builder API
 ---

 Key: AVRO-1325
 URL: https://issues.apache.org/jira/browse/AVRO-1325
 Project: Avro
  Issue Type: Bug
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1325.patch, AVRO-1325-preliminary.patch, 
 AVRO-1325-properties.patch, AVRO-1325-v2.patch


 The schema builder from AVRO-1274 has a few key limitations.  I have proposed 
 changes to make before it is released and the public API is locked in.



[jira] [Commented] (AVRO-1325) Enhanced Schema Builder API

2013-05-15 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13658758#comment-13658758
 ] 

Scott Carey commented on AVRO-1325:
---

My most recent patch deals with the annoying 'everything has a property' case 
as follows:
{code}
Schema schema = SchemaBuilder.record("Rec").prop("recProp", "r").fields()
  .name("locations").prop("fieldProp", "f").map().prop("mapProp", "m").values()
    .stringBuilder().prop("valProp", "v").endString()
  .endRecord();
{code}

The example from the first comment, based on the schema in the Avro spec page:
{code}
Schema schema = 
  SchemaBuilder.record("HandshakeRequest").namespace("org.apache.avro.ipc").fields()
    .name("clientHash").type().fixed("MD5").size(16).noDefault() // namespace is inherited
    .name("clientProtocol").type().nullable().stringBuilder()  // nullable() is union of type and null
      .prop("avro.java.string", "String").endString().noDefault()
    .name("serverHash").type("MD5").noDefault()  // reference by name
    .name("meta").type().optional().map().prop("avro.java.string", 
      "String").values().bytesType() // optional is union of null and type with null default
    .endRecord();
{code}


 Enhanced Schema Builder API
 ---

 Key: AVRO-1325
 URL: https://issues.apache.org/jira/browse/AVRO-1325
 Project: Avro
  Issue Type: Bug
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1325.patch, AVRO-1325-preliminary.patch, 
 AVRO-1325-properties.patch, AVRO-1325-v2.patch


 The schema builder from AVRO-1274 has a few key limitations.  I have proposed 
 changes to make before it is released and the public API is locked in.



[jira] [Updated] (AVRO-1314) Java: Add @threadSafe annotation to maven plugins

2013-05-15 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1314:
--

  Priority: Minor  (was: Major)
Issue Type: Improvement  (was: Bug)

 Java: Add @threadSafe annotation to maven plugins
 -

 Key: AVRO-1314
 URL: https://issues.apache.org/jira/browse/AVRO-1314
 Project: Avro
  Issue Type: Improvement
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
Priority: Minor
 Fix For: 1.7.5

 Attachments: AVRO-1314.patch


 Our plugins are thread-safe, mark them as much so that warnings will not be 
 printed when running parallel maven builds.



[jira] [Updated] (AVRO-1314) Java: Add @threadSafe annotation to maven plugins

2013-05-15 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1314:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I committed this @ r1483078

 Java: Add @threadSafe annotation to maven plugins
 -

 Key: AVRO-1314
 URL: https://issues.apache.org/jira/browse/AVRO-1314
 Project: Avro
  Issue Type: Improvement
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
Priority: Minor
 Fix For: 1.7.5

 Attachments: AVRO-1314.patch


 Our plugins are thread-safe, mark them as much so that warnings will not be 
 printed when running parallel maven builds.



[jira] [Commented] (AVRO-1310) Avro Maven project can't be built from scratch

2013-05-14 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13657175#comment-13657175
 ] 

Scott Carey commented on AVRO-1310:
---

Martin: Are you using M2E or maven:eclipse?

With M2E, as long as Eclipse is not rebuilding, I can use the command line for 
everything but clean.  Both Eclipse and Maven share the same output directory for 
compiled class files, so they can step on each other's toes, but it is 
manageable.  This is the same experience I have with every Maven project, Avro 
or otherwise, with M2E.
I haven't used the Maven Eclipse plugin in years, so I don't know much about 
that.

 Avro Maven project can't be built from scratch
 --

 Key: AVRO-1310
 URL: https://issues.apache.org/jira/browse/AVRO-1310
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.7.4
 Environment: Maven on Eclipse
Reporter: Nir Zamir
Priority: Minor

 When getting the Java 'trunk' from SVN and trying to use Maven Install ('mvn 
 install') there are errors.
 Most of the errors are in tests so I tried skipping the tests but it still 
 fails.
 See more details in my post on Avro Users: 
 http://apache-avro.679487.n3.nabble.com/help-with-Avro-compilation-td4026946.html



[jira] [Commented] (AVRO-1325) Enhanced Schema Builder API

2013-05-10 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13654852#comment-13654852
 ] 

Scott Carey commented on AVRO-1325:
---

I'll need to clean up the doc a little w.r.t. complexity in a couple places.  

BaseTypeBuilder has three overloads for type(), FieldBuilder has three more due 
to defaults.
What do you think would be confusing to the casual user?  What could not be 
satisfied with javadoc?  I expect users to have IDEs set up with Maven so that 
javadoc is available at autocomplete/suggestion.  Are you concerned that will 
not be the case for most users?
We could rename the string name-reference variants on FieldBuilder to 
typeReference().  I named all of them type() because, when writing the doc, it 
was easy to say:
{code}
  /**
   * Builds a Field in the context of a {@link FieldAssembler}.
   *
   * Usage is to first configure any of the optional parameters and then to 
   * call one of the type methods to complete the field.  For example
   * <pre>
   *   .namespace("org.apache.example").orderDescending().type()
   * </pre>
   * Optional parameters for a field are namespace, doc, order, and aliases.
   */
{code}

We could change the name of the ones that select a name by reference to 
typeRef, or remove the ones that select default values and instead force the 
user to call an additional noDefaults() or withDefault() method afterwards to 
reduce the number of methods named 'type'.

For a generic type builder, used by map and array, values() and items() returns 
a builder that has three variants of type() on it, these could be rolled in to 
the map and array instead:
{code}
  map().values().intType()
  map().values().type("MD5")
  map().values().type(someSchema)
{code}
we could have:
{code}
  map().values().intType()
  map().values("MD5")
  map().values(someSchema)
{code}

I did not do this because the first form shares more code and is more 
consistent -- SchemaBuilder itself has the same API.

{quote}
What is the difference between intWith() and intType()
{quote}
I need feedback/suggestions on naming and API here.
The javadoc for intType would say "select an int type without custom 
properties; a shortcut for intWith().endInt()".  The javadoc for intWith would 
say "return an int type builder for creating an int with properties; if 
properties are not required, use the #intType() shortcut".

intWith() exists only for the case where you need to add a property to the int, 
which is uncommon, so I wanted a shortcut for the common case.  I did not want 
to have the context for optional properties to bleed into the following context 
after type selection, since that adds extra methods to the later context and 
applies to doc() and namespace() as well.  After selecting a type, the context 
either returns to an earlier scope or to a field default selection.  In the 
former case, properties are ambiguous with the outer scope (a field, array, 
map, or record context).  In the latter case, having a prop() method in scope 
is not applicable to default value selection.

I decided to have the methods available in any scope be unambiguous and 
correspond with the JSON declaration and the spec.  This reduces how many 
methods are available in all contexts significantly from the current version in 
trunk and is intended to prevent user error.
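The scoping idea described above can be sketched generically (with hypothetical classes, not Avro's actual API): each context object exposes only the methods valid in its scope, and closing a scope hands control back to the enclosing context:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A tiny scoped fluent builder: MapCtx only accepts map-level props, so a
// prop() call is never ambiguous between the field and its map type; values()
// ends the map scope and returns to the field context that opened it.
class FieldCtx {
    final Map<String, String> fieldProps = new LinkedHashMap<>();
    FieldCtx prop(String k, String v) { fieldProps.put(k, v); return this; }
    MapCtx map() { return new MapCtx(this); }
}

class MapCtx {
    private final FieldCtx parent;
    final Map<String, String> mapProps = new LinkedHashMap<>();
    MapCtx(FieldCtx parent) { this.parent = parent; }
    MapCtx prop(String k, String v) { mapProps.put(k, v); return this; }
    FieldCtx values() { return parent; } // end map scope, back to the field
}
```

Chaining `new FieldCtx().prop("fieldProp", "f").map().prop("mapProp", "m").values()` records each property in the scope where it was set and ends back at the field context.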

Alternatively we could name intWith() to intBuilder(), or intWithProps(), 
or change intType() to be the builder, and have intSimple() for the shortcut, 
but I wanted the common case to be the most obvious.  I think the naming and 
documentation here could certainly be improved.   Making it clear to a user 
which intXYZ (and the similar methods for all other primitive types) is 
critical.  The vast majority of the time users will not need to set properties 
on primitive types.


{quote}
I think this is easy to add to the existing SchemaBuilder by adding an 
addProp(String key, String value) method to the builders (RecordBuilder, 
FieldBuilder, ArrayBuilder). For FieldBuilder we could also have a 
addPropToType(String key, String value) method to distinguish between 
properties that are added to the underlying type, not the field itself.
{quote}
It is complicated and confusing to scope properties correctly.  The context for 
setting a property is easily ambiguous in all cases where it is not contained 
in its own scope.   For example:
{code}
{"type":"record", "name":"Rec", "recProp":"r", "fields": [
  {"name":"locations", "fieldProp":"f", "type": {"type":"map", "mapProp":"m", 
    "values":{"type":"string", "valProp":"v"}}}
]}
{code}
What would this look like with the API in trunk now if extended to support 
properties?  I had trouble reasoning about making it obvious to the user when 
the field's type contexts bled into the record context.  If you add the ability 
to chain the builder to build a nested type in the map, it gets even more 
messy.  The scoping and nesting allows for chaining, and thus propagation of 
default 

[jira] [Commented] (AVRO-1325) Enhanced Schema Builder API

2013-05-10 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13654909#comment-13654909
 ] 

Scott Carey commented on AVRO-1325:
---

{quote}
What is the difference between intWith() and intType()
{quote}

Dealing with properties or not could be as follows for primitive types.
JSON:
{code}
{"type":"map", "values":{"type":"string", "avro.java.string":"String"}}
{"type":"map", "values":{"type":"string"}} // same as {"type":"map", 
"values":"string"}
{code}
trunk:
{code}
  Schema strWithProps = Schema.create(Schema.Type.STRING); 
  strWithProps.addProp("avro.java.string", "String"); 
  SchemaBuilder.mapType(strWithProps);
  SchemaBuilder.mapType(SchemaBuilder.STRING);
{code}

Current proposal:
{code}
  SchemaBuilder.map().values().stringWith().prop("avro.java.string", 
"String").endString();
  SchemaBuilder.map().values().stringType();
{code}

Alternative:
{code}
  SchemaBuilder.map().values().stringType().prop("avro.java.string", 
"String").endString();
  SchemaBuilder.map().values().stringType().endString();
{code}

The benefit of the alternative is fewer methods on the type builder -- only one 
for each primitive type.  The drawback is the common case --  no props -- 
requires another method call to close the property setting context.  

Another alternative is to significantly increase the number of methods 
available on the map for setting values to shorten the common case:
{code}
  SchemaBuilder.map().values().stringType().prop("avro.java.string", 
"String").endString();
  SchemaBuilder.map().valuesString();  // looks a lot like {"type":"map", 
"values":"string"}
{code}
'valuesString()' is a shortcut for 'values().stringType().endString()', and it 
also looks a lot like the JSON.
I did not like this variation because it adds a lot of methods to the array and 
map cases and makes their APIs differ more.  It is also less consistent with 
unions, and I wanted unions, maps, arrays, and fields to have a similar API 
look/feel where possible.  Fewer public methods are also easier to document. :)

 Enhanced Schema Builder API
 ---

 Key: AVRO-1325
 URL: https://issues.apache.org/jira/browse/AVRO-1325
 Project: Avro
  Issue Type: Bug
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1325-preliminary.patch


 The schema builder from AVRO-1274 has a few key limitations.  I have proposed 
 changes to make before it is released and the public API is locked in.



[jira] [Commented] (AVRO-1316) IDL code-generation generates too-long literals for very large schemas

2013-05-09 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13653062#comment-13653062
 ] 

Scott Carey commented on AVRO-1316:
---

We could make it so that 1.7.4 code can read classes generated with 1.7.5.

If the method that takes the split strings and merges them into one with the 
string buffer before parsing is inside the generated class rather than 
Schema.Parser, the change would be two-way compatible.  This is not quite as 
elegant however, and I think the requirement to run code generated by 1.7.5 
with 1.7.5 is reasonable.
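For illustration only (a sketch, not the actual AVRO-1316 patch): the split itself just chunks the schema JSON below the class-file limit on string constants, so the generator can emit the pieces joined by {{+}} or pass them to a merging method:

```java
import java.util.ArrayList;
import java.util.List;

public class LiteralSplitter {
    // Class files cap a string constant at 65535 UTF-8 bytes; stay below that
    // with margin (chars == bytes only for ASCII, so the margin matters).
    static final int MAX_CHUNK = 60000;

    // Split schema JSON into pieces small enough to emit as separate string
    // literals; concatenating the pieces restores the original JSON.
    static List<String> split(String schemaJson) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < schemaJson.length(); i += MAX_CHUNK) {
            parts.add(schemaJson.substring(i,
                Math.min(schemaJson.length(), i + MAX_CHUNK)));
        }
        return parts;
    }
}
```

Whether the re-joining happens inside the generated class or inside Schema.Parser is exactly the compatibility question discussed above.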

 IDL code-generation generates too-long literals for very large schemas
 --

 Key: AVRO-1316
 URL: https://issues.apache.org/jira/browse/AVRO-1316
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
  Labels: patch
 Fix For: 1.7.5

 Attachments: AVRO-1316.patch, AVRO-1316.patch, AVRO-1316.patch, 
 AVRO-1316.patch, AVRO-1316.patch, AVRO-1316.patch, AVRO-1316.patch


 When I work from a very large IDL schema, the Java code generated includes a 
 schema JSON literal that exceeds the length of the maximum allowed literal 
 string ([65535 
 characters|http://stackoverflow.com/questions/8323082/size-of-initialisation-string-in-java]).
   
 This creates weird Maven errors like: {{[ERROR] ...FooProtocol.java:[13,89] 
 constant string too long}}.
 It might seem a little crazy, but a 64-kilobyte JSON protocol isn't 
 outrageous at all for some of the more involved data structures, especially 
 if we're including documentation strings etc.
 I believe the fix should be a bit more sensitivity to the length of the JSON 
 literal (and a willingness to split it into more than one literal, joined by 
 {{+}}), but I haven't figured out where that change needs to go. Has anyone 
 else encountered this problem?



[jira] [Commented] (AVRO-1316) IDL code-generation generates too-long literals for very large schemas

2013-05-07 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13651254#comment-13651254
 ] 

Scott Carey commented on AVRO-1316:
---

Looks good.  +1


 IDL code-generation generates too-long literals for very large schemas
 --

 Key: AVRO-1316
 URL: https://issues.apache.org/jira/browse/AVRO-1316
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Jeremy Kahn
Assignee: Jeremy Kahn
Priority: Minor
  Labels: patch
 Fix For: 1.7.5

 Attachments: AVRO-1316.patch, AVRO-1316.patch, AVRO-1316.patch, 
 AVRO-1316.patch, AVRO-1316.patch, AVRO-1316.patch, AVRO-1316.patch


 When I work from a very large IDL schema, the Java code generated includes a 
 schema JSON literal that exceeds the length of the maximum allowed literal 
 string ([65535 
 characters|http://stackoverflow.com/questions/8323082/size-of-initialisation-string-in-java]).
   
 This creates weird Maven errors like: {{[ERROR] ...FooProtocol.java:[13,89] 
 constant string too long}}.
 It might seem a little crazy, but a 64-kilobyte JSON protocol isn't 
 outrageous at all for some of the more involved data structures, especially 
 if we're including documentation strings etc.
 I believe the fix should be a bit more sensitivity to the length of the JSON 
 literal (and a willingness to split it into more than one literal, joined by 
 {{+}}), but I haven't figured out where that change needs to go. Has anyone 
 else encountered this problem?



[jira] [Created] (AVRO-1325) Enhanced Schema Builder API

2013-05-07 Thread Scott Carey (JIRA)
Scott Carey created AVRO-1325:
-

 Summary: Enhanced Schema Builder API
 Key: AVRO-1325
 URL: https://issues.apache.org/jira/browse/AVRO-1325
 Project: Avro
  Issue Type: Bug
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5


The schema builder from AVRO-1274 has a few key limitations.  I have proposed 
changes to make before it is released and the public API is locked in.



[jira] [Commented] (AVRO-1325) Enhanced Schema Builder API

2013-05-07 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13651288#comment-13651288
 ] 

Scott Carey commented on AVRO-1325:
---

Below are the limitations that concern me from AVRO-1274, in approximate 
priority of my concern.

# Arbitrary properties are not supported, for example {"type":"string", 
"avro.java.string":"String"} can not be built.
# SchemaBuilder.INT and other constants are public.  Unfortunately, these are 
mutable, and anyone could call addProp() on these, affecting others.
# Scopes are confusing, it is not always obvious when a 
# Does not chain to nested types.  Although there is limited chaining for 
record fields, nested calls to the builder are required which prevents 
supporting namespace nesting or other passing of context from outer to inner 
scopes.


I have a prototype patch that builds on the work in AVRO-1274.  The major 
changes are to how scopes are handled for fields and unions, since adding 
property support is not trivial on top of AVRO-1274 because there is much 
ambiguity in what a call to add a property would apply to (the field, or the 
type of the field?)

The following schema:
{code:json}
  
{"type":"record","name":"HandshakeRequest","namespace":"org.apache.avro.ipc","fields":[
    {"name":"clientHash","type":{"type":"fixed","name":"MD5","size":16}},
    {"name":"clientProtocol","type":[
      "null",
      {"type":"string","avro.java.string":"String"}]},
    {"name":"serverHash","type":"MD5"},
    {"name":"meta","type":[
      "null",
      {"type":"map","values":"bytes","avro.java.string":"String"}]}
  ]}
{code}
looks like this in the builder:
{code}
  Schema result = SchemaBuilder
    .recordType("HandshakeRequest").namespace("org.apache.avro.ipc").fields()
      .name("clientHash").type().fixed("MD5").size(16).noDefault()
      .name("clientProtocol").type().unionOf()
        .nullType().and()
        .stringWith().prop("avro.java.string", 
"String").endString().endUnion().noDefault()
      .name("serverHash").type("MD5")
      .name("meta").type().unionOf()
        .nullType().and()
        .map().prop("avro.java.string", 
"String").values().bytesType().endUnion().withDefault(null)
      .record();
{code}

It supports the same feature set that JSON schemas do:
  * nesting of namespaces (MD5 above automatically picks up the 
org.apache.avro.ipc namespace)
  * reference of named types by name -- .type("MD5") above for serverHash
And enforces other rules:
  * union defaults are required to be the same as the first type in the union
  * properties, doc(), namespace, and aliases work only in the contexts that 
they are supported. 

Supported features are scoped with many internal nested types.  For example, 
the field assembler returned by the record builder's fields() method has only 
two methods -- name(String) and record() -- and name(String) returns a type 
builder for a field, which has prop(String, String) for the field and the 
available types, such as map().  A call to map() returns a map builder, which 
has prop(String, String) again, but for the map; values() ends the use of the 
map builder, changing scope to the nested type and returning to the field 
assembler when that is complete. 


h4. Remaining Work
* Not all primitive types are supported yet (trivial)
* Shortcut methods need to be added for common use cases such as an optional 
field.
* Naming of some things needs review -- it would be easier if enum, int, long, 
default, etc. were not reserved Java keywords :)
* Javadoc is nearly absent.
* There is some room for pushing more common work into parent types.
* Tests
* Attempt to replace the Schema.Parser logic with it, at minimum to test for 
areas of improvement or missing features.
* No protocol support yet (e.g. error, protocol, request, response).  It 
probably makes sense to extend this to cover all Avro things, including fields 
and protocols.

I want to checkpoint the work so far and gather feedback.

 Enhanced Schema Builder API
 ---

 Key: AVRO-1325
 URL: https://issues.apache.org/jira/browse/AVRO-1325
 Project: Avro
  Issue Type: Bug
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5


 The schema builder from AVRO-1274 has a few key limitations.  I have proposed 
 changes to make before it is released and the public API is locked in.



[jira] [Comment Edited] (AVRO-1325) Enhanced Schema Builder API

2013-05-07 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13651288#comment-13651288
 ] 

Scott Carey edited comment on AVRO-1325 at 5/7/13 9:10 PM:
---

Below are the limitations that concern me from AVRO-1274, in approximate 
priority of my concern.

# Arbitrary properties are not supported, for example {"type":"string", 
"avro.java.string":"String"} can not be built.
# SchemaBuilder.INT and other constants are public.  Unfortunately, these are 
mutable, and anyone could call addProp() on these, affecting others.
# Scopes are confusing, it is not always obvious when a 
# Does not chain to nested types.  Although there is limited chaining for 
record fields, nested calls to the builder are required which prevents 
supporting namespace nesting or other passing of context from outer to inner 
scopes.


I have a prototype patch that builds on the work in AVRO-1274.  The major 
changes are to how scopes are handled for fields and unions, since adding 
property support is not trivial on top of AVRO-1274 because there is much 
ambiguity in what a call to add a property would apply to (the field, or the 
type of the field?)

The following schema:
{code}
  
{"type":"record","name":"HandshakeRequest","namespace":"org.apache.avro.ipc","fields":[
    {"name":"clientHash","type":{"type":"fixed","name":"MD5","size":16}},
    {"name":"clientProtocol","type":[
      "null",
      {"type":"string","avro.java.string":"String"}]},
    {"name":"serverHash","type":"MD5"},
    {"name":"meta","type":[
      "null",
      {"type":"map","values":"bytes","avro.java.string":"String"}]}
  ]}
{code}
looks like this in the builder:
{code}
  Schema result = SchemaBuilder
    .recordType("HandshakeRequest").namespace("org.apache.avro.ipc").fields()
      .name("clientHash").type().fixed("MD5").size(16).noDefault()
      .name("clientProtocol").type().unionOf()
        .nullType().and()
        .stringWith().prop("avro.java.string", 
"String").endString().endUnion().noDefault()
      .name("serverHash").type("MD5")
      .name("meta").type().unionOf()
        .nullType().and()
        .map().prop("avro.java.string", 
"String").values().bytesType().endUnion().withDefault(null)
      .record();
{code}

It supports the same feature set that JSON schemas do:
  * nesting of namespaces (MD5 above automatically picks up the 
org.apache.avro.ipc namespace)
  * reference of named types by name -- .type("MD5") above for serverHash
And enforces other rules:
  * union defaults are required to be the same as the first type in the union
  * properties, doc(), namespace, and aliases work only in the contexts that 
they are supported. 

Supported features are scoped with many internal nested types.  For example, 
the field assembler returned by the record builder's fields() method has only 
two methods -- name(String) and record() -- and name(String) returns a type 
builder for a field, which has prop(String, String) for the field and the 
available types, such as map().  A call to map() returns a map builder, which 
has prop(String, String) again, but for the map; values() ends the use of the 
map builder, changing scope to the nested type and returning to the field 
assembler when that is complete. 


h4. Remaining Work
* Not all primitive types are supported yet (trivial)
* Shortcut methods need to be added for common use cases such as an optional 
field.
* Naming of some things needs review -- it would be easier if enum, int, long, 
default, etc. were not reserved Java keywords :)
* Javadoc is nearly absent.
* There is some room for pushing more common work into parent types.
* Tests
* Attempt to replace the Schema.Parser logic with it, at minimum to test for 
areas of improvement or missing features.
* No protocol support yet (e.g. error, protocol, request, response).  It 
probably makes sense to extend this to cover all Avro things, including fields 
and protocols.

I want to checkpoint the work so far and gather feedback.

  was (Author: scott_carey):
Below are the limitations that concern me from AVRO-1274, in approximate 
priority of my concern.

# Arbitrary properties are not supported; for example, {"type":"string", 
"avro.java.string":"String"} cannot be built.
# SchemaBuilder.INT and other constants are public.  Unfortunately, these are 
mutable, and anyone could call addProp() on these, affecting others.
# Scopes are confusing, it is not always obvious when a 
# Does not chain to nested types.  Although there is limited chaining for 
record fields, nested calls to the builder are required which prevents 
supporting namespace nesting or other passing of context from outer to inner 
scopes.


I have a prototype patch that builds on the work in AVRO-1274.  The major 
changes are to how scopes are handled for fields and unions, since adding 
property support is not trivial on top of AVRO-1274 because there is much 
ambiguity in what a call to add a property would apply to.

[jira] [Updated] (AVRO-1325) Enhanced Schema Builder API

2013-05-07 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1325:
--

Attachment: AVRO-1325-preliminary.patch

Preliminary work in progress -- mostly complete but requires more doc, tests, 
and feedback.

 Enhanced Schema Builder API
 ---

 Key: AVRO-1325
 URL: https://issues.apache.org/jira/browse/AVRO-1325
 Project: Avro
  Issue Type: Bug
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1325-preliminary.patch


 The schema builder from AVRO-1274 has a few key limitations.  I have proposed 
 changes to make before it is released and the public API is locked in.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (AVRO-1274) Add a schema builder API

2013-05-02 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647360#comment-13647360
 ] 

Scott Carey commented on AVRO-1274:
---

I am working on a modification to the builder that would make its use look like 
a json schema.

{code}
 public static final org.apache.avro.Schema SCHEMA$ = new 
org.apache.avro.Schema.Parser().parse(
  
"{\"type\":\"record\",\"name\":\"HandshakeRequest\",\"namespace\":\"org.apache.avro.ipc\",\"fields\":[
{\"name\":\"clientHash\",\"type\":{\"type\":\"fixed\",\"name\":\"MD5\",\"size\":16}},
{\"name\":\"clientProtocol\",\"type\":[\"null\",{\"type\":\"string\",\"avro.java.string\":\"String\"}]},
{\"name\":\"serverHash\",\"type\":\"MD5\"},
{\"name\":\"meta\",\"type\":[\"null\",{\"type\":\"map\",\"values\":\"bytes\",\"avro.java.string\":\"String\"}]}
  ]}");
{code}

becomes similar to:

{code}
  public static final org.apache.avro.Schema SCHEMA$ = SchemaBuilder
    .typeRecord("HandshakeRequest").namespaceInherited("org.apache.avro.ipc").fields() // optional namespace inheritance
      .typeFixed("clientHash", MD5.SCHEMA$).field()   // or typeFixed("clientHash", "MD5", 16)
      .typeUnion("clientProtocol").ofNull().andString().withProp("avro.java.string", "String").field()
      .typeFixed("serverHash", "MD5").field() // uses reference to already defined MD5
      .typeUnion("meta").ofNull().andMap().withProp("avro.java.string", "String").valuesBytes().field()
    .record();
{code}

we can also have shortcuts as before, for example
optionalInt("x", -1) as a shortcut for typeUnion("x").ofInt(-1).andNull()

nullableInt("maybe") as a shortcut for typeUnion("maybe").ofNull().andInt()

requiredInt("yes") may not be necessary; its shortcut would be 
typeInt("yes").field();

It should be straightforward to implement the whole Schema.Parser with the 
above (and simplify the parser), which makes it easy to test very thoroughly; 
there is an intentional 1:1 mapping between the parser, spec, and the builder.

 Add a schema builder API
 

 Key: AVRO-1274
 URL: https://issues.apache.org/jira/browse/AVRO-1274
 Project: Avro
  Issue Type: New Feature
  Components: java
Reporter: Tom White
Assignee: Tom White
 Fix For: 1.7.5

 Attachments: AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, 
 AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, 
 TestDefaults.patch


 It would be nice to have a fluent API that made it easier to construct record 
 schemas.



[jira] [Commented] (AVRO-1316) IDL code-generation generates too-long literals for very large schemas

2013-05-02 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13648047#comment-13648047
 ] 

Scott Carey commented on AVRO-1316:
---

The limit in the (Sun) java compiler is 64KB in encoded UTF-8 bytes, not 64K 
characters.  If the literal contains multibyte UTF-8 characters, splitting at a 
64K-character boundary will fail.  We probably want to break at a smaller 
boundary than 2^16.  How about 2^14 (16KB)?
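A quick stdlib-only illustration of the byte-vs-character distinction (my own sketch; the `utf8Length` helper is invented): a string of 2-byte characters reaches the 65535-byte class-file limit at roughly half the character count an ASCII string would.

```java
import java.nio.charset.StandardCharsets;

// Demonstrates that the javac constant limit (64KB of encoded UTF-8)
// is reached well before 65535 characters when characters are multibyte.
public class Utf8Limit {
    // Returns the UTF-8 encoded size of s in bytes.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "a".repeat(16384);      // 16K ASCII chars
        String multi = "\u00e9".repeat(16384); // 16K 'é' chars, 2 bytes each in UTF-8
        System.out.println(utf8Length(ascii)); // 16384 bytes
        System.out.println(utf8Length(multi)); // 32768 bytes
    }
}
```

This is why any chunking scheme has to measure encoded bytes, not characters, before deciding where to split.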



 IDL code-generation generates too-long literals for very large schemas
 --

 Key: AVRO-1316
 URL: https://issues.apache.org/jira/browse/AVRO-1316
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.7.5
Reporter: Jeremy Kahn
Priority: Minor
  Labels: patch
 Attachments: AVRO-1316.patch


 When I work from a very large IDL schema, the Java code generated includes a 
 schema JSON literal that exceeds the length of the maximum allowed literal 
 string ([65535 
 characters|http://stackoverflow.com/questions/8323082/size-of-initialisation-string-in-java]).
   
 This creates weird Maven errors like: {{[ERROR] ...FooProtocol.java:[13,89] 
 constant string too long}}.
 It might seem a little crazy, but a 64-kilobyte JSON protocol isn't 
 outrageous at all for some of the more involved data structures, especially 
 if we're including documentation strings etc.
 I believe the fix should be a bit more sensitivity to the length of the JSON 
 literal (and a willingness to split it into more than one literal, joined by 
 {{+}}), but I haven't figured out where that change needs to go. Has anyone 
 else encountered this problem?
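As a sketch of the proposed fix (the helper below is hypothetical, not the attached patch): split the literal into bounded chunks that the generated code joins with {{+}}. A real implementation would also need to respect the UTF-8 byte limit discussed elsewhere in this thread, not just character counts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split a long schema literal into pieces small enough
// for javac's constant limit; the code generator would emit
// "piece1" + "piece2" + ... instead of one huge literal.
public class LiteralSplitter {
    static List<String> split(String s, int maxLen) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < s.length(); i += maxLen) {
            parts.add(s.substring(i, Math.min(s.length(), i + maxLen)));
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> parts = split("x".repeat(10000), 4096);
        System.out.println(parts.size());                    // 3 chunks: 4096 + 4096 + 1808
        System.out.println(String.join("", parts).length()); // 10000: joining round-trips
    }
}
```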



[jira] [Commented] (AVRO-1274) Add a schema builder API

2013-05-02 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13648048#comment-13648048
 ] 

Scott Carey commented on AVRO-1274:
---

I am planning on constraining the lexical scope via many cascaded builders / 
assemblers so that the list to auto-complete at any time is small.

I'll make a new JIRA for my proposed changes.

 Add a schema builder API
 

 Key: AVRO-1274
 URL: https://issues.apache.org/jira/browse/AVRO-1274
 Project: Avro
  Issue Type: New Feature
  Components: java
Reporter: Tom White
Assignee: Tom White
 Fix For: 1.7.5

 Attachments: AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, 
 AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, 
 TestDefaults.patch


 It would be nice to have a fluent API that made it easier to construct record 
 schemas.



[jira] [Commented] (AVRO-1311) Upgrade Snappy-Java dependency to support building on Mac + Java 7

2013-05-01 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646760#comment-13646760
 ] 

Scott Carey commented on AVRO-1311:
---

1.0.5 may be released this week (based on -M4 see: 
https://github.com/xerial/snappy-java/issues/6)  but we may want to test it 
more first.

1.0.5-M4 works for me (OSX 10.7.5, Java 7).

Can some others test changing the snappy.version in lang/java/pom.xml to 
1.0.5-M4?
{code:xml}
  <snappy.version>1.0.5-M4</snappy.version>
{code}


 Upgrade Snappy-Java dependency to support building on Mac + Java 7
 --

 Key: AVRO-1311
 URL: https://issues.apache.org/jira/browse/AVRO-1311
 Project: Avro
  Issue Type: Bug
Affects Versions: 1.7.4
Reporter: Scott Carey
Assignee: Scott Carey

 snappy-java 1.0.4 does not work with Mac + Java 7.  1.0.5-M4 is on maven, but 
 it does not appear that there will be a final release of that.  1.1.0 is at 
 -M3 status, and is being developed now.  
 Both of these work locally for me, when the dust settles we need to pick one 
 before the next release.



[jira] [Commented] (AVRO-607) SpecificData.getSchema not thread-safe

2013-05-01 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647037#comment-13647037
 ] 

Scott Carey commented on AVRO-607:
--

If I recall, that turns out to be very hard due to how the equals contract 
works with weak references.  There is already a Java WeakHashMap, so making one 
with identity semantics wasn't too hard.
We may need thousands of lines of code and might have to implement our own 
concurrent map implementation.  I think I'd rather spend my efforts figuring 
out how to extract Google's implementation into another namespace in the build 
with shade, jarjar, or similar.

 SpecificData.getSchema not thread-safe
 --

 Key: AVRO-607
 URL: https://issues.apache.org/jira/browse/AVRO-607
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.3.3
Reporter: Stephen Tu
Priority: Minor
 Attachments: AVRO-607.patch


 SpecificData.getSchema uses a WeakHashMap to cache schemas, but WeakHashMap 
 is not thread-safe, and the method itself is not synchronized. Seems like 
 this could lead to the data structure getting corrupted. 
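A minimal sketch (my own, not the attached patch) of one way to make such a cache safe: wrap the WeakHashMap in a synchronized view so every access holds a common lock, while keys remain weakly referenced. The class and the fake schema values are illustrative only.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Illustration: WeakHashMap itself is not thread-safe, but a synchronized
// wrapper makes each map operation atomic under one lock.
public class SchemaCacheSketch {
    private final Map<Class<?>, String> cache =
        Collections.synchronizedMap(new WeakHashMap<>());

    // computeIfAbsent on a synchronizedMap runs under the wrapper's lock.
    String getSchema(Class<?> c) {
        return cache.computeIfAbsent(c, cls -> "schema-for-" + cls.getSimpleName());
    }

    public static void main(String[] args) {
        SchemaCacheSketch s = new SchemaCacheSketch();
        System.out.println(s.getSchema(String.class)); // schema-for-String
    }
}
```

The comments above note the harder alternative: an identity-semantics *concurrent* weak map, which the JDK does not provide out of the box.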



[jira] [Commented] (AVRO-1310) Avro Maven project can't be built from scratch

2013-05-01 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647146#comment-13647146
 ] 

Scott Carey commented on AVRO-1310:
---

Thanks for the update!

Yes, the maven archetypes at the end have some extra dependencies.  Lets not 
close this quite yet, but I'll lower the priority -- I won't worry about it for 
the next release.

There may be something worth fixing in the archetype part of the build.  If 
not, then we can close this.

 Avro Maven project can't be built from scratch
 --

 Key: AVRO-1310
 URL: https://issues.apache.org/jira/browse/AVRO-1310
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.7.4
 Environment: Maven on Eclipse
Reporter: Nir Zamir

 When getting the Java 'trunk' from SVN and trying to use Maven Install ('mvn 
 install') there are errors.
 Most of the errors are in tests so I tried skipping the tests but it still 
 fails.
 See more details in my post on Avro Users: 
 http://apache-avro.679487.n3.nabble.com/help-with-Avro-compilation-td4026946.html



[jira] [Updated] (AVRO-1310) Avro Maven project can't be built from scratch

2013-05-01 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1310:
--

Priority: Minor  (was: Major)

 Avro Maven project can't be built from scratch
 --

 Key: AVRO-1310
 URL: https://issues.apache.org/jira/browse/AVRO-1310
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.7.4
 Environment: Maven on Eclipse
Reporter: Nir Zamir
Priority: Minor

 When getting the Java 'trunk' from SVN and trying to use Maven Install ('mvn 
 install') there are errors.
 Most of the errors are in tests so I tried skipping the tests but it still 
 fails.
 See more details in my post on Avro Users: 
 http://apache-avro.679487.n3.nabble.com/help-with-Avro-compilation-td4026946.html



[jira] [Updated] (AVRO-1313) Java: Add system property for disabling sun.misc.Unsafe

2013-05-01 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1313:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed in revision 1478244.

 Java: Add system property for disabling sun.misc.Unsafe
 ---

 Key: AVRO-1313
 URL: https://issues.apache.org/jira/browse/AVRO-1313
 Project: Avro
  Issue Type: Improvement
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1313.patch, AVRO-1313-v2.patch


 We should be able to disable use of sun.misc.Unsafe.
 I propose that if the system property avro.disable.unsafe is non-null, we 
 use reflection rather than Unsafe.
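The proposed check is essentially a one-liner; here is a hypothetical sketch (the helper name is invented, the property name comes from the ticket):

```java
// Sketch of the proposed toggle: any non-null value of the system property
// "avro.disable.unsafe" disables the sun.misc.Unsafe fast path.
public class UnsafeToggle {
    static boolean unsafeDisabled() {
        return System.getProperty("avro.disable.unsafe") != null;
    }

    public static void main(String[] args) {
        System.setProperty("avro.disable.unsafe", "true");
        System.out.println(unsafeDisabled()); // true
    }
}
```

Run with -Davro.disable.unsafe on the command line to fall back to reflection-based field access.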



[jira] [Commented] (AVRO-1274) Add a schema builder API

2013-05-01 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647196#comment-13647196
 ] 

Scott Carey commented on AVRO-1274:
---

We may have more work to do here. 

How would you use the builder to do the equivalent of:

{code}
  public static final org.apache.avro.Schema SCHEMA$ = new 
org.apache.avro.Schema.Parser().parse(
  
"{\"type\":\"record\",\"name\":\"HandshakeRequest\",\"namespace\":\"org.apache.avro.ipc\",\"fields\":[
{\"name\":\"clientHash\",\"type\":{\"type\":\"fixed\",\"name\":\"MD5\",\"size\":16}},
{\"name\":\"clientProtocol\",\"type\":[\"null\",{\"type\":\"string\",\"avro.java.string\":\"String\"}]},
{\"name\":\"serverHash\",\"type\":\"MD5\"},
{\"name\":\"meta\",\"type\":[\"null\",{\"type\":\"map\",\"values\":\"bytes\",\"avro.java.string\":\"String\"}]}
  ]}");
{code}

?

I am trying to suggest that we replace literal strings with the builder in 
AVRO-1316, but I cannot seem to replicate the above with the builder.

The "clientProtocol" and "meta" fields are the problem.  It does not seem 
possible to create a union of null and 'more' without a default.

Additionally, unionType is confusing.  Is this how it would be done?  If so, 
I do not see how to add types to the union if I start with:

{code}
unionType("clientProtocol", SchemaBuilder.NULL)
{code}
Then how do I add extra types?  Or is the type passed in expected to _be_ a 
union?  If so, the parameter should be named unionSchema and the javadoc needs 
to be clear.

This builder API makes it hard to create union fields without defaults.  
Perhaps it is simply a documentation issue and the doc for unionType() needs an 
example.  

Should we open a new ticket for these concerns or re-open this one?  I suspect 
it is largely documentation but am not sure.

 Add a schema builder API
 

 Key: AVRO-1274
 URL: https://issues.apache.org/jira/browse/AVRO-1274
 Project: Avro
  Issue Type: New Feature
  Components: java
Reporter: Tom White
Assignee: Tom White
 Fix For: 1.7.5

 Attachments: AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, 
 AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, 
 TestDefaults.patch


 It would be nice to have a fluent API that made it easier to construct record 
 schemas.



[jira] [Commented] (AVRO-1316) IDL code-generation generates too-long literals for very large schemas

2013-05-01 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647205#comment-13647205
 ] 

Scott Carey commented on AVRO-1316:
---

I have not, but my schemas are only ~12K.

I assume the problem is in the creation of the SCHEMA$ static field?

We could break the string up into 4k chunks.

However, it will be more efficient, and the resulting class file significantly 
smaller, if we use the Schema API programmatically.

This isn't too hard.

We go from the below (edited from one line to many for readability):
{code}
  public static final org.apache.avro.Schema SCHEMA$ = new 
org.apache.avro.Schema.Parser().parse(
  
"{\"type\":\"record\",\"name\":\"HandshakeRequest\",\"namespace\":\"org.apache.avro.ipc\",\"fields\":[
{\"name\":\"clientHash\",\"type\":{\"type\":\"fixed\",\"name\":\"MD5\",\"size\":16}},
{\"name\":\"clientProtocol\",\"type\":[\"null\",{\"type\":\"string\",\"avro.java.string\":\"String\"}]},
{\"name\":\"serverHash\",\"type\":\"MD5\"},
{\"name\":\"meta\",\"type\":[\"null\",{\"type\":\"map\",\"values\":\"bytes\",\"avro.java.string\":\"String\"}]}
  ]}");
{code}

to use the new SchemaBuilder:
{code}
  public static final org.apache.avro.Schema SCHEMA$;
  static {
    SCHEMA$ = SchemaBuilder
      .recordType("HandshakeRequest")
      .namespace("org.apache.avro.ipc")
      .requiredFixed("clientHash", MD5.SCHEMA$)
      .unionType("clientProtocol", SchemaBuilder.unionType(
          SchemaBuilder.NULL,
          SchemaBuilder.STRING)
        .build())
      .addProp("avro.java.string", "String")
      .requiredFixed("serverHash", MD5.SCHEMA$)
      .unionType("meta", SchemaBuilder.unionType(
          SchemaBuilder.NULL,
          SchemaBuilder.mapType(SchemaBuilder.BYTES)
            .addProp("avro.java.string", "String")
            .build())
        .build())
      .build();
  }
{code}


 IDL code-generation generates too-long literals for very large schemas
 --

 Key: AVRO-1316
 URL: https://issues.apache.org/jira/browse/AVRO-1316
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Jeremy Kahn
Priority: Minor

 When I work from a very large IDL schema, the Java code generated includes a 
 schema JSON literal that exceeds the length of the maximum allowed literal 
 string ([65535 
 characters|http://stackoverflow.com/questions/8323082/size-of-initialisation-string-in-java]).
   
 This creates weird Maven errors like: {{[ERROR] ...FooProtocol.java:[13,89] 
 constant string too long}}.
 It might seem a little crazy, but a 64-kilobyte JSON protocol isn't 
 outrageous at all for some of the more involved data structures, especially 
 if we're including documentation strings etc.
 I believe the fix should be a bit more sensitivity to the length of the JSON 
 literal (and a willingness to split it into more than one literal, joined by 
 {{+}}), but I haven't figured out where that change needs to go. Has anyone 
 else encountered this problem?



[jira] [Commented] (AVRO-1274) Add a schema builder API

2013-05-01 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647221#comment-13647221
 ] 

Scott Carey commented on AVRO-1274:
---

I think the answer to my question would be:

{code}
  public static final org.apache.avro.Schema SCHEMA$;
  static {
    SCHEMA$ = SchemaBuilder
      .recordType("HandshakeRequest")
      .namespace("org.apache.avro.ipc")
      .requiredFixed("clientHash", MD5.SCHEMA$)
      .unionType("clientProtocol", SchemaBuilder.unionType(
          SchemaBuilder.NULL,
          SchemaBuilder.STRING)
        .build())
      .addFieldProp("avro.java.string", "String")
      .requiredFixed("serverHash", MD5.SCHEMA$)
      .unionType("meta", SchemaBuilder.unionType(
          SchemaBuilder.NULL,
          SchemaBuilder.mapType(SchemaBuilder.BYTES)
            .addFieldProp("avro.java.string", "String")
            .build())
        .build())
      .build();
  }
{code}

but I am not sure.  Also addFieldProp() does not exist.

What is odd is that there are two unionType() methods, one takes varargs and 
the other does not.  I suspect that the intention was for both to use varargs 
so that the nested union building is not required by the user.

It would be much simpler if unions without defaults had a shortcut:

{code}
  public static final org.apache.avro.Schema SCHEMA$;
  static {
    SCHEMA$ = SchemaBuilder
      .recordType("HandshakeRequest")
      .namespace("org.apache.avro.ipc")
      .requiredFixed("clientHash", MD5.SCHEMA$)
      .nullableString("clientProtocol")
        .addFieldProp("avro.java.string", "String")
      .requiredFixed("serverHash", MD5.SCHEMA$)
      .nullableMap(SchemaBuilder.BYTES)
        .addFieldProp("avro.java.string", "String")
      .build();
  }
{code}

Building unions in general feels clunky as well since you have to break 
chaining and use SchemaBuilder again.  Instead of taking a varargs list of 
schemas in the union, the type returned could be a UnionBuilder.  So instead of:
{code}
  public static final org.apache.avro.Schema SCHEMA$;
  static {
    SCHEMA$ = SchemaBuilder
      .recordType("Test")
      .namespace("org.apache.avro")
      .unionString("stringField", "defaultVal",
         SchemaBuilder.INT,
         SchemaBuilder.arrayType(SchemaBuilder.INT).build(),
         SchemaBuilder.mapType(SchemaBuilder.unionType(
           SchemaBuilder.INT, SchemaBuilder.LONG))
      )
      .build();
  }
{code}

we could write something more like:
{code}
  public static final org.apache.avro.Schema SCHEMA$;
  static {
SCHEMA$ = SchemaBuilder
  .recordType(Test)
  .namespace(org.apache.avro)
  .unionString(stringFieldName, defaultVal)
 .andInt()
 .andArrayOf().int()
 .andMapOf().unionInt().andLong()
  .build()
  }
{code}

 Add a schema builder API
 

 Key: AVRO-1274
 URL: https://issues.apache.org/jira/browse/AVRO-1274
 Project: Avro
  Issue Type: New Feature
  Components: java
Reporter: Tom White
Assignee: Tom White
 Fix For: 1.7.5

 Attachments: AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, 
 AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, 
 TestDefaults.patch


 It would be nice to have a fluent API that made it easier to construct record 
 schemas.



[jira] [Updated] (AVRO-1282) Make use of the sun.misc.Unsafe class during serialization if a JDK supports it

2013-04-30 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1282:
--

   Resolution: Fixed
Fix Version/s: 1.8.0
   1.7.5
   Status: Resolved  (was: Patch Available)

Committed @r1477712

 Make use of the sun.misc.Unsafe class during serialization if a JDK supports 
 it
 ---

 Key: AVRO-1282
 URL: https://issues.apache.org/jira/browse/AVRO-1282
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.7.4
Reporter: Leo Romanoff
Priority: Minor
 Fix For: 1.7.5, 1.8.0

 Attachments: AVRO-1282-s1.patch, AVRO-1282-s2.patch, 
 AVRO-1282-s3.patch, AVRO-1282-s5.patch, AVRO-1282-s6.patch, 
 AVRO-1282-s7.patch, avro-1282-v1.patch, avro-1282-v2.patch, 
 avro-1282-v3.patch, avro-1282-v4.patch, avro-1282-v5.patch, 
 avro-1282-v6.patch, avro-1282-v7.patch, avro-1282-v8.patch, 
 AVRO-1282-v9.patch, TestUnsafeUtil.java


 Unsafe can be used to significantly speed up the serialization process, if a JDK 
 implementation supports java.misc.Unsafe properly. Most JDKs running on PCs 
 support it. Some platforms like Android lack a proper support for Unsafe yet.
 There are two possibilities to use Unsafe for serialization:
 1) Very quick access to the fields of objects. It is way faster than with the 
 reflection-based approach using Field.get/set
 2) Input and Output streams can be using Unsafe to perform very quick 
 input/output.
  
 3) Moreover, Unsafe makes it possible to serialize to/deserialize from 
 off-heap memory directly and very quickly, without any intermediate buffers 
 allocated on heap. There is virtually no overhead compared to the usual byte 
 arrays.



[jira] [Updated] (AVRO-1313) Java: Add system property for disabling sun.misc.Unsafe

2013-04-30 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1313:
--

Fix Version/s: 1.7.5

 Java: Add system property for disabling sun.misc.Unsafe
 ---

 Key: AVRO-1313
 URL: https://issues.apache.org/jira/browse/AVRO-1313
 Project: Avro
  Issue Type: Bug
Reporter: Scott Carey
 Fix For: 1.7.5


 We should be able to disable use of sun.misc.Unsafe.
 I propose that if the system property avro.disable.unsafe is non-null, we 
 use reflection rather than Unsafe.



[jira] [Updated] (AVRO-1313) Java: Add system property for disabling sun.misc.Unsafe

2013-04-30 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1313:
--

Issue Type: Improvement  (was: Bug)

 Java: Add system property for disabling sun.misc.Unsafe
 ---

 Key: AVRO-1313
 URL: https://issues.apache.org/jira/browse/AVRO-1313
 Project: Avro
  Issue Type: Improvement
Reporter: Scott Carey
 Fix For: 1.7.5


 We should be able to disable use of sun.misc.Unsafe.
 I propose that if the system property avro.disable.unsafe is non-null, we 
 use reflection rather than Unsafe.



[jira] [Updated] (AVRO-1313) Java: Add system property for disabling sun.misc.Unsafe

2013-04-30 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1313:
--

Attachment: AVRO-1313.patch

This patch adds the check for avro.disable.unsafe.

When -Davro.disable.unsafe is added to the command line, performance drops as 
expected for field access, but array performance is still fast:

Unsafe On:
{noformat}
                                   test name     time    M entries/sec   M bytes/sec   bytes/cycle
                          ReflectRecordRead:   7405 ms           2.251         87.343        808498
                         ReflectRecordWrite:   4786 ms           3.482        135.121        808498
                       ReflectBigRecordRead:   7478 ms           1.337         82.089        767380
                      ReflectBigRecordWrite:   4984 ms           2.006        123.153        767380
                           ReflectFloatRead:   6927 ms           0.000        115.486           104
                          ReflectFloatWrite:   1087 ms           0.001        735.371           104
                          ReflectDoubleRead:   8678 ms           0.000        184.369           204
                         ReflectDoubleWrite:   2398 ms           0.000        666.980           204
                        ReflectIntArrayRead:  11756 ms           1.418         58.503        859709
                       ReflectIntArrayWrite:   3798 ms           4.388        181.070        859709
                       ReflectLongArrayRead:   6542 ms           1.274         98.481        805344
                      ReflectLongArrayWrite:   2189 ms           3.806        294.278        805344
                     ReflectDoubleArrayRead:   6316 ms           1.583        103.625        818144
                    ReflectDoubleArrayWrite:   1589 ms           6.292        411.827        818144
                      ReflectFloatArrayRead:  13986 ms           1.430         48.400        846172
                     ReflectFloatArrayWrite:   2953 ms           6.771        229.186        846172
                ReflectNestedFloatArrayRead:  16618 ms           1.203         40.733        846172
               ReflectNestedFloatArrayWrite:   4841 ms           4.131        139.820        846172
               ReflectNestedObjectArrayRead:  12905 ms           0.310         39.989        645104
              ReflectNestedObjectArrayWrite:   6868 ms           0.582         75.139        645104
           ReflectNestedLargeFloatArrayRead:  10141 ms           0.329         85.781       1087381
          ReflectNestedLargeFloatArrayWrite:   2049 ms           1.626        424.432       1087381
    ReflectNestedLargeFloatArrayBlockedRead:  10501 ms           0.317         83.899       1101357
   ReflectNestedLargeFloatArrayBlockedWrite:   5554 ms           0.600        158.634       1101357
{noformat}

Unsafe Off:
{noformat}
                                   test name     time    M entries/sec   M bytes/sec   bytes/cycle
                          ReflectRecordRead:  13282 ms           1.255         48.694        808498
                         ReflectRecordWrite:   8981 ms           1.856         72.011        808498
                       ReflectBigRecordRead:  17118 ms           0.584         35.863        767380
                      ReflectBigRecordWrite:  13178 ms           0.759         46.584        767380
                           ReflectFloatRead:   6713 ms           0.000        119.160           104
                          ReflectFloatWrite:   2444 ms           0.000        327.229           104
                          ReflectDoubleRead:   8094 ms           0.000        197.677           204
                         ReflectDoubleWrite:   2133 ms           0.000        749.844           204
                        ReflectIntArrayRead:  12127 ms           1.374         56.712        859709
                       ReflectIntArrayWrite:   3832 ms           4.349        179.463        859709
                       ReflectLongArrayRead:   6312 ms           1.320        102.059        805344
                      ReflectLongArrayWrite:   2548 ms           3.269        252.785        805344
                     ReflectDoubleArrayRead:   7460 ms           1.340         87.726        818144
                    ReflectDoubleArrayWrite:   2048 ms           4.882        319.526        818144
                      ReflectFloatArrayRead:  11761 ms           1.700         57.554        846172
                     ReflectFloatArrayWrite:   3370 ms           5.935        200.871        846172
                ReflectNestedFloatArrayRead:  15946 ms           1.254         42.450        846172
               ReflectNestedFloatArrayWrite:   6429 ms           3.111        105.291        846172
               ReflectNestedObjectArrayRead:  17478 ms           0.229         29.527        645104
              ReflectNestedObjectArrayWrite:  12148 ms           0.329         42.480        645104
           ReflectNestedLargeFloatArrayRead:   9012 ms           0.370         96.524       1087381
 

[jira] [Commented] (AVRO-1218) Avro 1.7.3 fails to build

2013-04-30 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645753#comment-13645753
 ] 

Scott Carey commented on AVRO-1218:
---

Is this still an issue?  I can build on OSX if I update snappy-java  (see 
AVRO-1311).

 Avro 1.7.3 fails to build 
 --

 Key: AVRO-1218
 URL: https://issues.apache.org/jira/browse/AVRO-1218
 Project: Avro
  Issue Type: Bug
  Components: build
Affects Versions: 1.7.3
 Environment: OS X 10.8.2
Reporter: Russell Jurney
Priority: Blocker
  Labels: avro, build, for, pig, piggybank, wont
 Attachments: build.log


 I am trying to build Avro 1.7.3 from source as a workaround for issues in 
 PIG-3015. It does not build :(
 Errors attached.



[jira] [Commented] (AVRO-1261) Honor schema defaults with the Constructor in addition to the builders.

2013-04-30 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645789#comment-13645789
 ] 

Scott Carey commented on AVRO-1261:
---

+1

 Honor schema defaults with the Constructor in addition to the builders.
 ---

 Key: AVRO-1261
 URL: https://issues.apache.org/jira/browse/AVRO-1261
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.7.4
Reporter: Christopher Conner
Assignee: Doug Cutting
Priority: Minor
 Fix For: 1.7.5

 Attachments: AVRO-1261.patch


 As I understand it, currently if you want to utilize defaults in a schema, ie:
 { 
 "namespace": "com.chris.test", 
 "type": "record", 
 "name": "CHRISTEST", 
 "doc": "Chris Test", 
 "fields": [ 
 {"name": "firstname", "type": "string", "default": "Chris"}, 
 {"name": "lastname", "type": "string", "default": "Conner"}, 
 {"name": "username", "type": "string", "default": "cconner"}
 ] 
 }
 Then I have to use the builders to create my objects.  IE:
 public class ChrisAvroTest {
 public static void main(String[] args) throws Exception {
 CHRISTEST person = CHRISTEST.newBuilder() 
 .build(); 
 System.out.println("person:" + person);
 } 
 }
 Is my understanding correct?  Is it possible to make it so the default 
 constructor honors the defaults as well?



[jira] [Commented] (AVRO-1245) Add Merging Functionality to Generated Builders

2013-04-30 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645792#comment-13645792
 ] 

Scott Carey commented on AVRO-1245:
---

What about a more fluent API?

{code}
User.newBuilder(thirdPartyRecord).replaceNullsWithDefaults().replaceEmptyStringsWithDefaults();
{code}



 Add Merging Functionality to Generated Builders
 ---

 Key: AVRO-1245
 URL: https://issues.apache.org/jira/browse/AVRO-1245
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.7.3
 Environment: Linux Mint 32-bit, Java 7, Avro 1.7.3
Reporter: Sharmarke Aden
Priority: Minor

 Suppose I have a record with the following schema and default values: 
 {code}
 {
 "type": "record",
 "namespace": "test",
 "name": "User",
 "fields": [
 {
 "name": "user",
 "type": ["null", "string"],
 "default": null
 },
 {
 "name": "privacy",
 "type": [
 {
 "type": "enum",
 "name": "Privacy",
 "namespace": "test",
 "symbols": ["Public", "Private"]
 },
 "null"
 ],
 "default": "Private"
 }
 ]
 }
 {code}
 Now suppose I have a record supplied to me by a third party whose privacy 
 field value is null. Currently if you call 
 Builder.newBuilder(thirdPartyRecord) it simply creates a new record with 
 same values as the source record (privacy is null in the newly created 
 builder). 
 It's very important that the privacy value be set and so ideally I would like 
 to perform a merge to mitigate any issues with default values being absent in 
 the source record. I would like to propose that a new enhancement be added to 
 the Builder to support merging of a source record to a new record. Perhaps 
 something like this:
 {code}
 // recordWithoutDefaults record passed in.
 User.Builder builder = User.newBuilder();
 //ignore null values in the source record if the schema has a default 
 //value for the field
 boolean ignoreNull = true;
 //ignore empty string values in the source record for string field 
 //types with default field values
 boolean ignoreEmptyString = true;
 //while this is simple and useful in my use-case perhaps there's a
 //better/refined way of supporting veracious merging models
 builder.merge(recordWithoutDefaults, ignoreNull, ignoreEmptyString);
 {code}



[jira] [Commented] (AVRO-1245) Add Merging Functionality to Generated Builders

2013-04-30 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645795#comment-13645795
 ] 

Scott Carey commented on AVRO-1245:
---

I suppose the merge idea would have better performance.

 Add Merging Functionality to Generated Builders
 ---

 Key: AVRO-1245
 URL: https://issues.apache.org/jira/browse/AVRO-1245
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.7.3
 Environment: Linux Mint 32-bit, Java 7, Avro 1.7.3
Reporter: Sharmarke Aden
Priority: Minor

 Suppose I have a record with the following schema and default values: 
 {code}
 {
 "type": "record",
 "namespace": "test",
 "name": "User",
 "fields": [
 {
 "name": "user",
 "type": ["null", "string"],
 "default": null
 },
 {
 "name": "privacy",
 "type": [
 {
 "type": "enum",
 "name": "Privacy",
 "namespace": "test",
 "symbols": ["Public", "Private"]
 },
 "null"
 ],
 "default": "Private"
 }
 ]
 }
 {code}
 Now suppose I have a record supplied to me by a third party whose privacy 
 field value is null. Currently if you call 
 Builder.newBuilder(thirdPartyRecord) it simply creates a new record with 
 same values as the source record (privacy is null in the newly created 
 builder). 
 It's very important that the privacy value be set and so ideally I would like 
 to perform a merge to mitigate any issues with default values being absent in 
 the source record. I would like to propose that a new enhancement be added to 
 the Builder to support merging of a source record to a new record. Perhaps 
 something like this:
 {code}
 // recordWithoutDefaults record passed in.
 User.Builder builder = User.newBuilder();
 //ignore null values in the source record if the schema has a default 
 //value for the field
 boolean ignoreNull = true;
 //ignore empty string values in the source record for string field 
 //types with default field values
 boolean ignoreEmptyString = true;
 //while this is simple and useful in my use-case perhaps there's a
 //better/refined way of supporting veracious merging models
 builder.merge(recordWithoutDefaults, ignoreNull, ignoreEmptyString);
 {code}



[jira] [Commented] (AVRO-607) SpecificData.getSchema not thread-safe

2013-04-30 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645811#comment-13645811
 ] 

Scott Carey commented on AVRO-607:
--

I quite like Guava.  Having a concurrent weak hash map is great, the 
Immutable collections are very useful, and several other collection types are 
massive time savers (Multiset, Multimap, and BiMap).

However, items get deprecated and disappear in 2 years in Guava, so we would 
have to avoid the newest APIs and quickly move off of deprecated ones to 
prevent users who also use it from coming into conflict.  It is manageable, but 
it is a dependency that is very likely to be used by our users, and if we are 
on version 11 while a user is on 13, we could be in a position where neither 
version works for both of us simultaneously.  I also worry about our place as a 
library far down the stack for some users.  

We could complicate our build to shade in only the classes we use under a 
different namespace to avoid such problems (this may be useful for other 
dependencies as well). 



 SpecificData.getSchema not thread-safe
 --

 Key: AVRO-607
 URL: https://issues.apache.org/jira/browse/AVRO-607
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.3.3
Reporter: Stephen Tu
Priority: Minor
 Attachments: AVRO-607.patch


 SpecificData.getSchema uses a WeakHashMap to cache schemas, but WeakHashMap 
 is not thread-safe, and the method itself is not synchronized. Seems like 
 this could lead to the data structure getting corrupted. 



[jira] [Commented] (AVRO-607) SpecificData.getSchema not thread-safe

2013-04-30 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645819#comment-13645819
 ] 

Scott Carey commented on AVRO-607:
--

Alternative to this patch, we could synchronize the method, or we can use a 
ThreadLocal<WeakHashMap<Type, Schema>>.

A cache with a Type or Class key (that is not weak) that becomes static can 
lead to classloader leaks. 

In Avro code, a weak concurrent hash map is in high demand.
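The trade-off can be sketched in a few lines (illustrative only, not Avro's actual code; the class and method names here are hypothetical): a synchronized view over `WeakHashMap` keeps the weak `Class` keys that prevent classloader leaks, while `ConcurrentHashMap` scales better but pins its keys with strong references.

{code}
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch, not Avro's implementation.  Two thread-safe caches:
// the synchronized WeakHashMap keeps weak Class keys (no classloader leak),
// while ConcurrentHashMap is lock-free on reads but pins its keys.
public class SchemaCacheSketch {
  // Weak keys, coarse-grained lock on every access.
  static final Map<Class<?>, String> weakCache =
      Collections.synchronizedMap(new WeakHashMap<Class<?>, String>());

  // Lock-free reads, but strong references to the Class keys.
  static final Map<Class<?>, String> strongCache = new ConcurrentHashMap<>();

  static String schemaFor(Class<?> c) {
    // computeIfAbsent is atomic for both map implementations.
    return weakCache.computeIfAbsent(c, k -> "schema-for:" + k.getName());
  }

  public static void main(String[] args) {
    System.out.println(schemaFor(String.class));
  }
}
{code}

Neither option gives both properties at once, which is why a purpose-built weak concurrent map (as in Guava) is attractive.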

 SpecificData.getSchema not thread-safe
 --

 Key: AVRO-607
 URL: https://issues.apache.org/jira/browse/AVRO-607
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.3.3
Reporter: Stephen Tu
Priority: Minor
 Attachments: AVRO-607.patch


 SpecificData.getSchema uses a WeakHashMap to cache schemas, but WeakHashMap 
 is not thread-safe, and the method itself is not synchronized. Seems like 
 this could lead to the data structure getting corrupted. 



[jira] [Commented] (AVRO-1044) avro-maven-plugin requires dependency resolution which breaks multi-module projects

2013-04-30 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645871#comment-13645871
 ] 

Scott Carey commented on AVRO-1044:
---

Is the problem you are having with the 'idl' mojo?  It is the only mojo that 
requires dependency resolution, as it declares  '@requiresDependencyResolution 
runtime' for some reason.

That would explain why you have issues and I do not.  I am not sure why this 
mojo requires all runtime dependencies to be in scope.

 avro-maven-plugin requires dependency resolution which breaks multi-module 
 projects
 ---

 Key: AVRO-1044
 URL: https://issues.apache.org/jira/browse/AVRO-1044
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.6.2
Reporter: Arvind Prabhakar
Priority: Critical

 Use of avro-maven-plugin breaks multimodule projects since it forces the 
 dependency resolution of all of the dependencies, some of which may be from 
 within the reactor and not yet installed in the local cache.



[jira] [Commented] (AVRO-1044) avro-maven-plugin requires dependency resolution which breaks multi-module projects

2013-04-30 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645873#comment-13645873
 ] 

Scott Carey commented on AVRO-1044:
---

This is due to AVRO-971.

This feature should be optional since it breaks builds.

 avro-maven-plugin requires dependency resolution which breaks multi-module 
 projects
 ---

 Key: AVRO-1044
 URL: https://issues.apache.org/jira/browse/AVRO-1044
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.6.2
Reporter: Arvind Prabhakar
Priority: Critical

 Use of avro-maven-plugin breaks multimodule projects since it forces the 
 dependency resolution of all of the dependencies, some of which may be from 
 within the reactor and not yet installed in the local cache.



[jira] [Created] (AVRO-1314) Java: Add @threadSafe annotation to maven plugins

2013-04-30 Thread Scott Carey (JIRA)
Scott Carey created AVRO-1314:
-

 Summary: Java: Add @threadSafe annotation to maven plugins
 Key: AVRO-1314
 URL: https://issues.apache.org/jira/browse/AVRO-1314
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey


Our plugins are thread-safe; mark them as such so that warnings will not be 
printed when running parallel Maven builds.



[jira] [Updated] (AVRO-1314) Java: Add @threadSafe annotation to maven plugins

2013-04-30 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1314:
--

Attachment: AVRO-1313.patch

Trivial patch that sets the Mojos to be tagged @threadSafe.

 Java: Add @threadSafe annotation to maven plugins
 -

 Key: AVRO-1314
 URL: https://issues.apache.org/jira/browse/AVRO-1314
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
 Attachments: AVRO-1313.patch


 Our plugins are thread-safe; mark them as such so that warnings will not be 
 printed when running parallel Maven builds.



[jira] [Updated] (AVRO-1314) Java: Add @threadSafe annotation to maven plugins

2013-04-30 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1314:
--

Fix Version/s: 1.7.5
   Status: Patch Available  (was: Open)

I'll commit this soon unless there are objections.

 Java: Add @threadSafe annotation to maven plugins
 -

 Key: AVRO-1314
 URL: https://issues.apache.org/jira/browse/AVRO-1314
 Project: Avro
  Issue Type: Bug
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1313.patch


 Our plugins are thread-safe; mark them as such so that warnings will not be 
 printed when running parallel Maven builds.



[jira] [Updated] (AVRO-1313) Java: Add system property for disabling sun.misc.Unsafe

2013-04-30 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1313:
--

Status: Patch Available  (was: Open)

Ready for review.

 Java: Add system property for disabling sun.misc.Unsafe
 ---

 Key: AVRO-1313
 URL: https://issues.apache.org/jira/browse/AVRO-1313
 Project: Avro
  Issue Type: Improvement
Reporter: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1313.patch


 We should be able to disable use of sun.misc.Unsafe.
 I propose that if the system property avro.disable.unsafe is non-null, we 
 use reflection rather than Unsafe.



[jira] [Updated] (AVRO-1313) Java: Add system property for disabling sun.misc.Unsafe

2013-04-30 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1313:
--

Assignee: Scott Carey

 Java: Add system property for disabling sun.misc.Unsafe
 ---

 Key: AVRO-1313
 URL: https://issues.apache.org/jira/browse/AVRO-1313
 Project: Avro
  Issue Type: Improvement
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1313.patch


 We should be able to disable use of sun.misc.Unsafe.
 I propose that if the system property avro.disable.unsafe is non-null, we 
 use reflection rather than Unsafe.



[jira] [Created] (AVRO-1315) Java: Schema Validation utilities

2013-04-30 Thread Scott Carey (JIRA)
Scott Carey created AVRO-1315:
-

 Summary: Java: Schema Validation utilities
 Key: AVRO-1315
 URL: https://issues.apache.org/jira/browse/AVRO-1315
 Project: Avro
  Issue Type: New Feature
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5


As part of AVRO-1124 we needed Schema Validation utilities.  I have separated 
those out of that ticket as a stand-alone item.




[jira] [Commented] (AVRO-1315) Java: Schema Validation utilities

2013-04-30 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645908#comment-13645908
 ] 

Scott Carey commented on AVRO-1315:
---

This incorporates the following:

* An additional public method on Symbol.java to detect whether a Symbol tree 
has an error in it.  In the case of a Symbol returned by schema resolution, 
this indicates that the schemas are not compatible.
* An interface o.a.a.SchemaValidator that checks a schema against an Iterable 
of other schemas.  The notion of compatibility is left to the implementation.  
Schemas in the Iterable are returned from most recent to oldest, if 
chronological order is applicable.
* An interface o.a.a.SchemaValidationStrategy that validates one schema against 
another.  The notion of compatibility is left to the implementation.
* A concrete SchemaValidator -- ValidateAll.  This takes a 
SchemaValidationStrategy as a constructor parameter and when its validate() 
method is called, uses the strategy for each item in the Iterable, in order.
* A concrete SchemaValidator -- ValidateLatest.  This takes a 
SchemaValidationStrategy as a constructor parameter and when its validate() 
method is called, uses the strategy for only the first item in the Iterable.
* A SchemaValidationBuilder for constructing SchemaValidators, with private 
implementations of SchemaValidationStrategy that can be configured: 
** Validate that the schema can read all others.
** Validate that all others can read the schema.
** Validate that the schema and all others are mutually compatible (can read 
each other).
** Validate that the schema can read the latest.
** Validate that the latest can read the schema.
** Validate that the latest and the schema are mutually compatible.


A few questions for discussion:

I am tempted to hide the concrete implementations and not expose them publicly.  In 
the forthcoming patch, all are hidden so that only the interfaces and builder 
are public and need to be supported public APIs.  Alternatively we can expose 
some of the SchemaValidator and SchemaValidationStrategy implementations as 
public.  I am tempted to start private and see how this implementation works 
out.
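The shape of the design above can be sketched roughly as follows.  This is a hedged illustration, not the attached patch: the signatures are assumptions, and String stands in for org.apache.avro.Schema so the sketch is self-contained.

{code}
import java.util.Iterator;

// Illustrative sketch of the AVRO-1315 design; signatures are assumptions,
// and String stands in for org.apache.avro.Schema.
interface SchemaValidationStrategy {
  // Pair-wise check: is toValidate compatible with one existing schema?
  boolean validate(String toValidate, String existing);
}

interface SchemaValidator {
  // Schemas are iterated most recent first, oldest last.
  boolean validate(String toValidate, Iterable<String> existing);
}

// "ValidateLatest": apply the strategy only to the first (latest) schema.
class ValidateLatest implements SchemaValidator {
  private final SchemaValidationStrategy strategy;
  ValidateLatest(SchemaValidationStrategy s) { this.strategy = s; }
  public boolean validate(String toValidate, Iterable<String> existing) {
    Iterator<String> it = existing.iterator();
    return !it.hasNext() || strategy.validate(toValidate, it.next());
  }
}

// "ValidateAll": apply the strategy to every schema, in order.
class ValidateAll implements SchemaValidator {
  private final SchemaValidationStrategy strategy;
  ValidateAll(SchemaValidationStrategy s) { this.strategy = s; }
  public boolean validate(String toValidate, Iterable<String> existing) {
    for (String e : existing) {
      if (!strategy.validate(toValidate, e)) return false;
    }
    return true;
  }
}

public class ValidatorSketch {
  public static void main(String[] args) {
    SchemaValidationStrategy equal = (a, b) -> a.equals(b);
    SchemaValidator latest = new ValidateLatest(equal);
    System.out.println(latest.validate("s1", java.util.Arrays.asList("s1", "s0")));
  }
}
{code}

The builder would then pick a validator and a strategy without exposing either concrete class.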



 Java: Schema Validation utilities
 -

 Key: AVRO-1315
 URL: https://issues.apache.org/jira/browse/AVRO-1315
 Project: Avro
  Issue Type: New Feature
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5


 As part of AVRO-1124 we needed Schema Validation utilities.  I have separated 
 those out of that ticket as a stand-alone item.



[jira] [Commented] (AVRO-1315) Java: Schema Validation utilities

2013-04-30 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646108#comment-13646108
 ] 

Scott Carey commented on AVRO-1315:
---

Another option is to only have SchemaValidationStrategy -- the pair-wise 
validation -- in org.apache.avro.  The Validation over a list of 'previous' 
schemas can stay in the schema-repo being developed in AVRO-1124.  This would 
reduce the amount of code in the core avro package, and the pair-wise piece 
might be more universally usable.


 Java: Schema Validation utilities
 -

 Key: AVRO-1315
 URL: https://issues.apache.org/jira/browse/AVRO-1315
 Project: Avro
  Issue Type: New Feature
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5


 As part of AVRO-1124 we needed Schema Validation utilities.  I have separated 
 those out of that ticket as a stand-alone item.



[jira] [Updated] (AVRO-1315) Java: Schema Validation utilities

2013-04-30 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1315:
--

Attachment: AVRO-1315.patch

Patch implementing the design discussed above.

 Java: Schema Validation utilities
 -

 Key: AVRO-1315
 URL: https://issues.apache.org/jira/browse/AVRO-1315
 Project: Avro
  Issue Type: New Feature
  Components: java
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1315.patch


 As part of AVRO-1124 we needed Schema Validation utilities.  I have separated 
 those out of that ticket as a stand-alone item.



[jira] [Updated] (AVRO-1313) Java: Add system property for disabling sun.misc.Unsafe

2013-04-30 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1313:
--

Attachment: AVRO-1313-v2.patch

Even better, the runtime check performed at load time should verify that the implementation will work.

This changes the runtime test to have fields of all types and validates that it 
works with all of them before loading an implementation.

Both Unsafe and Reflect cases of these code paths are covered in the Unit tests.
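The load-time selection described here can be sketched roughly like this (class and method names are hypothetical, not the actual Avro code): consult the avro.disable.unsafe property first, then probe for sun.misc.Unsafe and run a self-test before committing to it.

{code}
// Hypothetical sketch of the load-time choice described in this patch; the
// real Avro code differs.  Reflection is the fallback whenever Unsafe is
// disabled, missing, or fails its self-test.
public class FieldAccessChooser {
  static boolean useUnsafe() {
    if (System.getProperty("avro.disable.unsafe") != null) {
      return false;                      // explicitly disabled by the user
    }
    try {
      Class.forName("sun.misc.Unsafe");  // is Unsafe present on this JVM?
      return selfTestPasses();           // validate fields of all types work
    } catch (ClassNotFoundException | RuntimeException e) {
      return false;                      // fall back to plain reflection
    }
  }

  // Stand-in for the "fields of all types" validation the patch adds.
  private static boolean selfTestPasses() {
    return true;
  }

  public static void main(String[] args) {
    System.out.println("unsafe enabled: " + useUnsafe());
  }
}
{code}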

 Java: Add system property for disabling sun.misc.Unsafe
 ---

 Key: AVRO-1313
 URL: https://issues.apache.org/jira/browse/AVRO-1313
 Project: Avro
  Issue Type: Improvement
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1313.patch, AVRO-1313-v2.patch


 We should be able to disable use of sun.misc.Unsafe.
 I propose that if the system property avro.disable.unsafe is non-null, we 
 use reflection rather than Unsafe.



[jira] [Comment Edited] (AVRO-1313) Java: Add system property for disabling sun.misc.Unsafe

2013-04-30 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646145#comment-13646145
 ] 

Scott Carey edited comment on AVRO-1313 at 4/30/13 11:21 PM:
-

Even better, the runtime check performed at load time should verify that the implementation will work.

This patch (-v2) changes the runtime test to have fields of all types and 
validates that it works with all of them before loading an implementation.

Both Unsafe and Reflect cases of these code paths are covered in the Unit tests.

  was (Author: scott_carey):
Even better, the check at runtime when loading should check that it will 
work.

This changes the runtime test to have fields of all types and validates that it 
works with all of them before loading an implementation.

Both Unsafe and Reflect cases of these code paths are covered in the Unit tests.
  
 Java: Add system property for disabling sun.misc.Unsafe
 ---

 Key: AVRO-1313
 URL: https://issues.apache.org/jira/browse/AVRO-1313
 Project: Avro
  Issue Type: Improvement
Reporter: Scott Carey
Assignee: Scott Carey
 Fix For: 1.7.5

 Attachments: AVRO-1313.patch, AVRO-1313-v2.patch


 We should be able to disable use of sun.misc.Unsafe.
 I propose that if the system property avro.disable.unsafe is non-null, we 
 use reflection rather than Unsafe.



[jira] [Commented] (AVRO-1124) RESTful service for holding schemas

2013-04-30 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646203#comment-13646203
 ] 

Scott Carey commented on AVRO-1124:
---

* schema metadata:  There is one race condition to consider -- currently 
subject.register(foo) is idempotent and also never fails unless there is a 
schema validation failure. Two users simultaneously registering the same schema 
end up with the same schema/id pair -- both fail or both succeed and get the 
same result.  If we tag metadata along with it, then two concurrent 
registrations with the same schema but different metadata might occur.  The 
actions are still idempotent and the two users get the same result, but only 
one will have the metadata expected set.  I will still have register() never 
fail outside of validation, but the schema metadata is not guaranteed to be 
what the user requested when there is a race condition -- the same thing 
happens with subject creation now.  If metadata is immutable, it can be cached 
and part of the SchemaEntry.  If it is not, it will need to be uncached or have 
a TTL, the latter I would like to avoid due to complexity. 

* In a subject, schema/id pairs are only added.   The caching layer is free to 
assume that once an id/schema relation exists, it will forever, there is no 
propagation of updates.  This is the sane thing to do -- once a datum has been 
written with an id, the schema tied to that key should be kept forever.  If a 
schema could be removed, we would need to check the repository for every record 
or have a TTL in the cache.   It would be easier to support 'deactivating' a 
schema/id pair so that it is not returned when scanning all the active schemas 
in a subject, or with validation, but can still be found by looking it up.   
Can you describe the use case for deleting a schema?  Under what conditions 
would you want to do so?

* I have opened https://issues.apache.org/jira/browse/AVRO-1315 to cover the 
avro schema validation components that live outside of the repo projects.  
Please provide feedback, Thanks!
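The append-only, idempotent contract discussed above is what makes aggressive client-side caching safe.  A toy in-memory version (purely illustrative; the real repository in AVRO-1124 is a REST service) looks like:

{code}
import java.util.HashMap;
import java.util.Map;

// Toy illustration of the contract discussed above, not the real repository:
// register() is idempotent and never removes an id/schema pair, so clients
// may cache lookups forever with no TTL or invalidation.
public class ToySchemaRepo {
  private final Map<String, Integer> idBySchema = new HashMap<>();
  private final Map<Integer, String> schemaById = new HashMap<>();
  private int nextId = 0;

  // Idempotent: registering the same schema twice returns the same id.
  public synchronized int register(String schema) {
    Integer id = idBySchema.get(schema);
    if (id == null) {
      id = nextId++;
      idBySchema.put(schema, id);
      schemaById.put(id, schema);
    }
    return id;
  }

  // Pairs are append-only, so this result is cacheable indefinitely.
  public synchronized String lookup(int id) {
    return schemaById.get(id);
  }

  public static void main(String[] args) {
    ToySchemaRepo repo = new ToySchemaRepo();
    int a = repo.register("{\"type\":\"string\"}");
    int b = repo.register("{\"type\":\"string\"}");
    System.out.println(a == b);  // same schema, same id
  }
}
{code}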

 RESTful service for holding schemas
 ---

 Key: AVRO-1124
 URL: https://issues.apache.org/jira/browse/AVRO-1124
 Project: Avro
  Issue Type: New Feature
Reporter: Jay Kreps
Assignee: Jay Kreps
 Attachments: AVRO-1124-can-read-with.patch, AVRO-1124-draft.patch, 
 AVRO-1124.patch, AVRO-1124.patch, AVRO-1124-validators-preliminary.patch


 Motivation: It is nice to be able to pass around data in serialized form but 
 still know the exact schema that was used to serialize it. The overhead of 
 storing the schema with each record is too high unless the individual records 
 are very large. There are workarounds for some common cases: in the case of 
 files a schema can be stored once with a file of many records amortizing the 
 per-record cost, and in the case of RPC the schema can be negotiated ahead of 
 time and used for many requests. For other uses, though it is nice to be able 
 to pass a reference to a given schema using a small id and allow this to be 
 looked up. Since only a small number of schemas are likely to be active for a 
 given data source, these can easily be cached, so the number of remote 
 lookups is very small (one per active schema version).
 Basically this would consist of two things:
 1. A simple REST service that stores and retrieves schemas
 2. Some helper java code for fetching and caching schemas for people using 
 the registry
 We have used something like this at LinkedIn for a few years now, and it 
 would be nice to standardize this facility to be able to build up common 
 tooling around it. This proposal will be based on what we have, but we can 
 change it as ideas come up.
 The facilities this provides are super simple, basically you can register a 
 schema which gives back a unique id for it or you can query for a schema. 
 There is almost no code, and nothing very complex. The contract is that 
 before emitting/storing a record you must first publish its schema to the 
 registry or know that it has already been published (by checking your cache 
 of published schemas). When reading you check your cache and if you don't 
 find the id/schema pair there you query the registry to look it up. I will 
 explain some of the nuances in more detail below. 
 An added benefit of such a repository is that it makes a few other things 
 possible:
 1. A graphical browser of the various data types that are currently used and 
 all their previous forms.
 2. Automatic enforcement of compatibility rules. Data is always compatible in 
 the sense that the reader will always deserialize it (since they are using 
 the same schema as the writer) but this does not mean it is compatible with 
 the expectations of the reader. For example if an int field is 

[jira] [Commented] (AVRO-1310) Avro Maven project can't be built from scratch

2013-04-29 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644651#comment-13644651
 ] 

Scott Carey commented on AVRO-1310:
---

I purged my repo and tried 'mvn install' from trunk, and that worked fine.

I had one modification for Mac in the lang/java pom.xml: updating snappy-java 
to version 1.0.5-M4, due to snappy-java issues on Mac 
(https://github.com/ptaoussanis/carmine/issues/5).

What three tests fail in step 1?  Test failures in avro compiler will prevent 
the remainder from building.

What happens if you skip tests:

'mvn clean install -DskipTests' from trunk?

 Avro Maven project can't be built from scratch
 --

 Key: AVRO-1310
 URL: https://issues.apache.org/jira/browse/AVRO-1310
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.7.4
 Environment: Maven on Eclipse
Reporter: Nir Zamir

 When getting the Java 'trunk' from SVN and trying to use Maven Install ('mvn 
 install') there are errors.
 Most of the errors are in tests so I tried skipping the tests but it still 
 fails.
 See more details in my post on Avro Users: 
 http://apache-avro.679487.n3.nabble.com/help-with-Avro-compilation-td4026946.html



[jira] [Commented] (AVRO-1310) Avro Maven project can't be built from scratch

2013-04-29 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644661#comment-13644661
 ] 

Scott Carey commented on AVRO-1310:
---

The end result you are seeing:
Could not find artifact org.apache.avro:avro-ipc:jar:tests:1.7.5-SNAPSHOT

Is due to either:
 * not having the avro-ipc test jar deployed in your local repo after being
built from an install that includes the test phase of avro-ipc.
 * not having it available in the maven reactor as part of the build.

What is the output of 'mvn --version' ?  Mine is:

{noformat}
Apache Maven 3.0.3 (r1075438; 2011-02-28 09:31:09-0800)
Maven home: /usr/share/maven
Java version: 1.7.0_13, vendor: Oracle Corporation
Java home: /Library/Java/JavaVirtualMachines/jdk1.7.0_13.jdk/Contents/Home/jre
Default locale: en_US, platform encoding: UTF-8
OS name: mac os x, version: 10.7.5, arch: x86_64, family: mac
{noformat}





[jira] [Updated] (AVRO-1282) Make use of the sun.misc.Unsafe class during serialization if a JDK supports it

2013-04-29 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1282:
--

Attachment: AVRO-1282-s6.patch

This patch (AVRO-1282-s6.patch) includes the following:

* The Reflect API now uses sun.misc.Unsafe to do Field reflection, for 
significantly improved performance. (Leo)
* The Reflect API avoids boxing for reading and writing primitive arrays.  
(Leo, Scott)
* Resolution of whether it is safe to use Unsafe is done statically; only one
implementation is loaded.  The Unsafe implementation is tested at load time to
ensure that all features function properly (for example, to handle Android or
other JVMs with partial Unsafe support).  A unit test is added that uses a
classloader that fails to load Unsafe to verify this. (Leo, Scott)
* EncoderFactory is fixed to properly configure the BlockingBinaryEncoder. 
(Scott)
* Reflection now supports reading the Blocked encoding into native arrays. 
(Scott)
* A dozen new tests added to Perf.java to cover the 
ReflectDatum{Reader,Writer}, including with the blocked binary encoding. (Leo, 
Scott)
* Additional unit tests in TestReflect to cover encoding and decoding primitive 
arrays, including with blocked binary encoding. (Scott)
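The static resolution described in the list above — probe the fast path once at load time, exercise its features rather than merely checking for their presence, and fall back permanently on any failure — can be sketched language-neutrally. All names below are illustrative stand-ins, not Avro's actual classes:

```python
def _unsafe_available() -> bool:
    """Probe the optional fast path once; any failure disables it for good."""
    try:
        # Stand-in for loading sun.misc.Unsafe and exercising its features;
        # a runtime with only partial support (e.g. Android) would fail here
        # rather than crash later mid-serialization.
        import array
        buf = array.array("b", [0])
        buf[0] = 42
        return buf[0] == 42
    except Exception:
        return False

USE_UNSAFE = _unsafe_available()  # resolved statically, once, at load time

def read_field(obj, name):
    """After load time only one of the two implementations is ever active."""
    if USE_UNSAFE:
        return getattr(obj, name)  # placeholder for the Unsafe-based accessor
    return obj.__dict__[name]      # reflective fallback
```

Deciding once at load time avoids a per-call capability branch and matches the patch's goal of loading only one implementation.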

Performance on the Reflect Perf.java tests improves by approximately 2.5x to
29x, usually between 3x and 5x.

Before:
{noformat}
                                  test name      time  M entries/sec  M bytes/sec  bytes/cycle
                          ReflectRecordRead:  15067 ms          1.106       42.927       808498
                         ReflectRecordWrite:  10903 ms          1.529       59.319       808498
                       ReflectBigRecordRead:  19285 ms          0.519       31.832       767380
                      ReflectBigRecordWrite:  13958 ms          0.716       43.979       767380
                           ReflectFloatRead:  29594 ms          0.000       27.032          104
                          ReflectFloatWrite:  33327 ms          0.000       24.004          104
                          ReflectDoubleRead:  27442 ms          0.000       58.303          204
                         ReflectDoubleWrite:  31529 ms          0.000       50.746          204
                        ReflectIntArrayRead:  38088 ms          0.438       18.057       859709
                       ReflectIntArrayWrite:  23342 ms          0.714       29.464       859709
                       ReflectLongArrayRead:  18476 ms          0.451       34.869       805344
                      ReflectLongArrayWrite:  12715 ms          0.655       50.667       805344
                     ReflectDoubleArrayRead:  19411 ms          0.515       33.718       818144
                    ReflectDoubleArrayWrite:  13825 ms          0.723       47.340       818144
                      ReflectFloatArrayRead:  39502 ms          0.506       17.137       846172
                     ReflectFloatArrayWrite:  27492 ms          0.727       24.623       846172
                ReflectNestedFloatArrayRead:  41225 ms          0.485       16.420       846172
               ReflectNestedFloatArrayWrite:  30229 ms          0.662       22.393       846172
               ReflectNestedObjectArrayRead:  31679 ms          0.126       16.291       645104
              ReflectNestedObjectArrayWrite:  17206 ms          0.232       29.994       645104
           ReflectNestedLargeFloatArrayRead:  33099 ms          0.101       26.282      1087381
          ReflectNestedLargeFloatArrayWrite:  35159 ms          0.095       24.742      1087381
    ReflectNestedLargeFloatArrayBlockedRead:  33326 ms          0.100       26.302      1095674
   ReflectNestedLargeFloatArrayBlockedWrite:  36921 ms          0.090       23.741      1095674
{noformat}

After:
{noformat}
                                  test name      time  M entries/sec  M bytes/sec  bytes/cycle
                          ReflectRecordRead:   6058 ms          2.751      106.754       808498
                         ReflectRecordWrite:   3750 ms          4.444      172.470       808498
                       ReflectBigRecordRead:   6767 ms          1.478       90.709       767380
                      ReflectBigRecordWrite:   4433 ms          2.255      138.466       767380
                           ReflectFloatRead:   6155 ms          0.000      129.970          104
                          ReflectFloatWrite:   1083 ms          0.001      738.434          104
                          ReflectDoubleRead:   6610 ms          0.000      242.028          204
                         ReflectDoubleWrite:   1968 ms          0.000      812.864          204
                        ReflectIntArrayRead:   9462 ms          1.761       72.683       859709
                       ReflectIntArrayWrite:   2468 ms          6.751      278.584       859709
                       ReflectLongArrayRead:   5556 ms          1.500      115.941       805344

[jira] [Created] (AVRO-1311) Upgrade Snappy-Java dependency to support building on Mac + Java 7

2013-04-29 Thread Scott Carey (JIRA)
Scott Carey created AVRO-1311:
-

 Summary: Upgrade Snappy-Java dependency to support building on Mac 
+ Java 7
 Key: AVRO-1311
 URL: https://issues.apache.org/jira/browse/AVRO-1311
 Project: Avro
  Issue Type: Bug
Affects Versions: 1.7.4
Reporter: Scott Carey
Assignee: Scott Carey


snappy-java 1.0.4 does not work with Mac + Java 7.  1.0.5-M4 is on Maven, but
it does not appear that there will be a final release of that.  1.1.0 is at -M3
status and is being developed now.

Both of these work locally for me; when the dust settles we need to pick one
before the next release.



[jira] [Commented] (AVRO-1282) Make use of the sun.misc.Unsafe class during serialization if a JDK supports it

2013-04-29 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645054#comment-13645054
 ] 

Scott Carey commented on AVRO-1282:
---

I think I simply missed that.  I changed it at some point in the process and 
did not revert.

I'll change it back to use INSTANCE with get().

There are a lot of static caches here, and the ones that hold Class objects
without weak references are prone to triggering classloader leaks.  We can
fix that elsewhere.
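The weak-reference fix alluded to above can be sketched as follows; Python's `weakref.WeakKeyDictionary` plays the role Java's `WeakHashMap` would, and all names are hypothetical rather than Avro's internals:

```python
import weakref

# A static accessor cache keyed on classes.  Holding each class through a
# weak reference lets the class (and, on the JVM, its classloader) be
# garbage-collected once nothing else references it, instead of being
# pinned forever by the cache.
ACCESSOR_CACHE = weakref.WeakKeyDictionary()

def accessors_for(cls):
    """Compute-once lookup of per-class accessor data (here: field names)."""
    if cls not in ACCESSOR_CACHE:
        ACCESSOR_CACHE[cls] = sorted(vars(cls).get("__annotations__", {}))
    return ACCESSOR_CACHE[cls]

class Point:
    x: int
    y: int

print(accessors_for(Point))  # computed and cached on first use
```

A strongly-keyed `dict` here would keep every cached class alive for the lifetime of the cache, which is exactly the leak pattern described above.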

 Make use of the sun.misc.Unsafe class during serialization if a JDK supports 
 it
 ---

 Key: AVRO-1282
 URL: https://issues.apache.org/jira/browse/AVRO-1282
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.7.4
Reporter: Leo Romanoff
Priority: Minor
 Attachments: AVRO-1282-s1.patch, AVRO-1282-s2.patch, 
 AVRO-1282-s3.patch, AVRO-1282-s5.patch, AVRO-1282-s6.patch, 
 avro-1282-v1.patch, avro-1282-v2.patch, avro-1282-v3.patch, 
 avro-1282-v4.patch, avro-1282-v5.patch, avro-1282-v6.patch, 
 avro-1282-v7.patch, avro-1282-v8.patch, AVRO-1282-v9.patch, 
 TestUnsafeUtil.java


 Unsafe can be used to significantly speed up serialization process, if a JDK 
 implementation supports java.misc.Unsafe properly. Most JDKs running on PCs 
 support it. Some platforms like Android lack a proper support for Unsafe yet.
 There are two possibilities to use Unsafe for serialization:
 1) Very quick access to the fields of objects. It is way faster than with the 
 reflection-based approach using Field.get/set
 2) Input and Output streams can be using Unsafe to perform very quick 
 input/output.
  
 3) More over, Unsafe makes it possible to serialize to/deserialize from 
 off-heap memory directly and very quickly, without any intermediate buffers 
 allocated on heap. There is virtually no overhead compared to the usual byte 
 arrays.



[jira] [Updated] (AVRO-1282) Make use of the sun.misc.Unsafe class during serialization if a JDK supports it

2013-04-29 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated AVRO-1282:
--

Attachment: AVRO-1282-s7.patch

Minor difference in the -s7 patch: returns INSTANCE to ReflectData, as
AVRO-1283 is the ticket for such changes.

 Make use of the sun.misc.Unsafe class during serialization if a JDK supports 
 it
 ---

 Key: AVRO-1282
 URL: https://issues.apache.org/jira/browse/AVRO-1282
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.7.4
Reporter: Leo Romanoff
Priority: Minor
 Attachments: AVRO-1282-s1.patch, AVRO-1282-s2.patch, 
 AVRO-1282-s3.patch, AVRO-1282-s5.patch, AVRO-1282-s6.patch, 
 AVRO-1282-s7.patch, avro-1282-v1.patch, avro-1282-v2.patch, 
 avro-1282-v3.patch, avro-1282-v4.patch, avro-1282-v5.patch, 
 avro-1282-v6.patch, avro-1282-v7.patch, avro-1282-v8.patch, 
 AVRO-1282-v9.patch, TestUnsafeUtil.java





[jira] [Commented] (AVRO-1282) Make use of the sun.misc.Unsafe class during serialization if a JDK supports it

2013-04-28 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643923#comment-13643923
 ] 

Scott Carey commented on AVRO-1282:
---

Leo:  Using a variation of your GenericRecordAccessor was the plan, but after a 
little more work I'm not sure it will go much faster.

Regarding caching by schema -- it caches first by class, then for each class it
has both a lookup of Accessors by field name, and one by schema.  The lookup by
schema returns a FieldAccessor[]; the one by name returns the FieldAccessor for
that specific named field.
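That two-level caching scheme — first by class, then per class both by field name and by schema — can be sketched as below. All names are hypothetical stand-ins for Avro's internals, and a schema is modeled as a tuple of field names:

```python
class FieldAccessor:
    """Stands in for the Unsafe-backed per-field accessor."""
    def __init__(self, name: str):
        self.name = name
    def get(self, obj):
        return getattr(obj, self.name)

class ClassAccessorData:
    """Per-class cache: accessors by field name, plus one array per schema."""
    def __init__(self, field_names):
        self.by_name = {n: FieldAccessor(n) for n in field_names}
        self.by_schema = {}  # schema -> FieldAccessor[] in schema field order

    def accessors_for(self, schema):
        # Build the schema-ordered array once; later readers reuse it.
        if schema not in self.by_schema:
            self.by_schema[schema] = [self.by_name[n] for n in schema]
        return self.by_schema[schema]

CLASS_CACHE = {}  # first-level lookup: class -> ClassAccessorData

def accessor_data(cls):
    if cls not in CLASS_CACHE:
        CLASS_CACHE[cls] = ClassAccessorData(vars(cls).get("__annotations__", ()))
    return CLASS_CACHE[cls]

class User:
    name: str
    age: int

data = accessor_data(User)
row = data.accessors_for(("name", "age"))  # FieldAccessor[] for this schema
```

The array returned by the schema lookup lets the serializer walk a record's fields in schema order without any per-field name lookups on the hot path.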

All of the main remaining performance issues are now similar to Generic and 
Specific code:

We are traversing and inspecting objects and other data structures on the fly
far too often -- e.g. Schema, Parser, and various instanceof checks -- and most
of it could be precomputed if the code were structured radically differently.
That will have to wait for another time.  I've got some old prototypes around
elsewhere for Generic/Specific, and after digging this deep into Reflect for
the first time I'm convinced they all share the same fundamental performance
barriers now.  Of course there is room for some tweaks here and there, but for
major wins we need to make bigger changes.

I am finalizing a patch that gets performance back up to the level you had it
at, or better -- a little faster in most cases and a little slower in others.
I've also rearranged and streamlined Perf.java, incorporating your more recent
versions and refactoring to share more code and make it simpler.

There is still one fundamental flaw:  reading blocked encoding is not supported 
(it would trigger an array bounds check exception).  I have isolated the code 
that loops and writes arrays, so we can add that more easily from here.

 Make use of the sun.misc.Unsafe class during serialization if a JDK supports 
 it
 ---

 Key: AVRO-1282
 URL: https://issues.apache.org/jira/browse/AVRO-1282
 Project: Avro
  Issue Type: Improvement
  Components: java
Affects Versions: 1.7.4
Reporter: Leo Romanoff
Priority: Minor
 Attachments: AVRO-1282-s1.patch, AVRO-1282-s2.patch, 
 AVRO-1282-s3.patch, avro-1282-v1.patch, avro-1282-v2.patch, 
 avro-1282-v3.patch, avro-1282-v4.patch, avro-1282-v5.patch, 
 avro-1282-v6.patch, avro-1282-v7.patch, avro-1282-v8.patch, 
 TestUnsafeUtil.java




