[jira] [Created] (AVRO-2355) Add compressionLevel to ZStandard compression
Scott Carey created AVRO-2355:
---------------------------------

             Summary: Add compressionLevel to ZStandard compression
                 Key: AVRO-2355
                 URL: https://issues.apache.org/jira/browse/AVRO-2355
             Project: Apache Avro
          Issue Type: New Feature
          Components: java
            Reporter: Scott Carey
             Fix For: 1.9.0

ZStandard compression should not be released without support for selecting the compression level. Its biggest advantage is the wide range of compression levels it offers, all while keeping decompression throughput very high.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
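A rough illustration of why a selectable level matters, using java.util.zip.Deflater as a stand-in (an assumption for demonstration only: the JDK has no zstd binding, and the real codec would come from a library such as zstd-jni, where levels span roughly 1-22). Higher levels trade compression CPU for smaller output, which is exactly the knob the ticket asks to expose:

```java
import java.util.Arrays;
import java.util.zip.Deflater;

// Not Avro code: a stdlib analogy for compression-level selection.
public class LevelDemo {
    /** Compress input at the given level and return the compressed size. */
    public static int compressedSize(byte[] input, int level) {
        Deflater d = new Deflater(level);
        d.setInput(input);
        d.finish();
        byte[] buf = new byte[input.length * 2 + 64];
        int n = 0;
        while (!d.finished()) {
            n += d.deflate(buf, n, buf.length - n);
        }
        d.end();
        return n;
    }

    public static void main(String[] args) {
        byte[] data = new byte[100_000];
        Arrays.fill(data, (byte) 'a');  // highly compressible payload
        int fast = compressedSize(data, Deflater.BEST_SPEED);        // level 1
        int best = compressedSize(data, Deflater.BEST_COMPRESSION);  // level 9
        System.out.println("level 1: " + fast + " bytes, level 9: " + best + " bytes");
    }
}
```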
[jira] [Commented] (AVRO-2273) Release 1.8.3
[ https://issues.apache.org/jira/browse/AVRO-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797606#comment-16797606 ]

Scott Carey commented on AVRO-2273:
-----------------------------------

It's impossible to use Flink (and Kafka Connect, in some cases) with 1.8.2 and a SpecificRecord that has an Enum in it. I'm running a custom version as a result (with a few other things cherry-picked from the master branch and built for Java 8, so it can't be an official release).

> Release 1.8.3
> -------------
>
>                 Key: AVRO-2273
>                 URL: https://issues.apache.org/jira/browse/AVRO-2273
>             Project: Apache Avro
>          Issue Type: Task
>          Components: release
>            Reporter: Thiruvalluvan M. G.
>            Priority: Major
>             Fix For: 1.8.3
>
> This ticket is for releasing Avro 1.8.3 and discussing any topics related to it.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AVRO-2162) Add Zstandard compression to avro file format
[ https://issues.apache.org/jira/browse/AVRO-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated AVRO-2162:
------------------------------
    Description: 
I'd like to add Zstandard compression for Avro. At compression level 1 it is almost as fast as Snappy at compression, with compression ratios more like gzip. At higher levels of compression, it is more compact than gzip -9 with much lower CPU when compressing and roughly 3x faster decompression. Adding it to Java is fairly easy. We'll need to say something about it in the spec however, as an 'optional' codec.

  was:
I'd like to add Zstandard compression for Avro. It is almost as fast as Snappy at compression, with compression ratios more like gzip. At higher levels of compression, it is more compact than gzip -9 with much lower CPU when compressing and roughly 3x faster decompression. Adding it to Java is fairly easy. We'll need to say something about it in the spec however, as an 'optional' codec.

> Add Zstandard compression to avro file format
> ---------------------------------------------
>
>                 Key: AVRO-2162
>                 URL: https://issues.apache.org/jira/browse/AVRO-2162
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Scott Carey
>            Priority: Major
>
> I'd like to add Zstandard compression for Avro.
> At compression level 1 it is almost as fast as Snappy at compression, with compression ratios more like gzip. At higher levels of compression, it is more compact than gzip -9 with much lower CPU when compressing and roughly 3x faster decompression.
> Adding it to Java is fairly easy. We'll need to say something about it in the spec however, as an 'optional' codec.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AVRO-2162) Add Zstandard compression to avro file format
Scott Carey created AVRO-2162:
---------------------------------

             Summary: Add Zstandard compression to avro file format
                 Key: AVRO-2162
                 URL: https://issues.apache.org/jira/browse/AVRO-2162
             Project: Avro
          Issue Type: Improvement
          Components: java
            Reporter: Scott Carey

I'd like to add Zstandard compression for Avro. It is almost as fast as Snappy at compression, with compression ratios more like gzip. At higher levels of compression, it is more compact than gzip -9 with much lower CPU when compressing and roughly 3x faster decompression. Adding it to Java is fairly easy. We'll need to say something about it in the spec however, as an 'optional' codec.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AVRO-1124) RESTful service for holding schemas
[ https://issues.apache.org/jira/browse/AVRO-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13827415#comment-13827415 ]

Scott Carey commented on AVRO-1124:
-----------------------------------

All: I apologize for the long delay. What we have used in production for about a year is very close to what has been in this ticket the whole time. I have never considered it complete, for a few reasons. I have been close to done with this for some time now but swamped by other responsibilities, and what is currently in use has been good enough for now, but it won't be for long.

The latest changes, however, would significantly impact some of the API with respect to how the schema repo manages validation and compatibility. This would be significantly more flexible for interfacing with other systems. It boils down to the following observation: it appears that all notions of schema compatibility share a common form. The previously discussed forwards-compatible or N+1 compatibility schemes are all flavors of the same set of constraints.

In any set of schemas you wish to consider for compatibility (a Subject here), at any given time you have a subset of these schemas that you wish to be able to read with, and a subset you must be able to read from. You may have some that you neither wish to read from nor write to, but whose id mapping you must keep. The way to represent this is to have a read state and a write state per schema in the subject.

The read state has two possible values (naming help needed): reader, not_readable.
The write state has three possible values (naming help needed): writer, written, not_writable.

The constraint of the system is that all reader schemas can read all writer and written schemas, per subject. A schema can transition either state, one at a time, leading to pair-wise testing of "schema X can read Y":

* A schema transition from not_readable to reader succeeds only if it can read all schemas that are currently writer or written.
* A schema transition from reader to not_readable requires no pairwise schema validation, but some other pluggable validation may apply.
* A schema transition from not_writable to writer or written requires pairwise validation that the schema can be read by all current reader schemas.
* All other write-state transitions do not require pair-wise schema validation, but other pluggable validation may apply.

The write state has three possibilities because it is important to differentiate the case where you allow new records of this type to be written (writer) from one where you wish no new records to be written, but the data store still has values with the schema present (written).

Every compatibility scheme can fit in the above: single reader with multiple writers; single writer with multiple readers; N+1/N-1 compatibility; full cross-compatibility. The above is significantly more flexible than the early proposals on this topic, but will require changes to the REST interface. Loading data from the old into the new will be fairly simple, however: some curl commands and bash scripts will do it.

RESTful service for holding schemas
-----------------------------------

                Key: AVRO-1124
                URL: https://issues.apache.org/jira/browse/AVRO-1124
            Project: Avro
         Issue Type: New Feature
           Reporter: Jay Kreps
           Assignee: Jay Kreps
        Attachments: AVRO-1124-can-read-with.patch, AVRO-1124-draft.patch, AVRO-1124-validators-preliminary.patch, AVRO-1124.patch, AVRO-1124.patch

Motivation: It is nice to be able to pass around data in serialized form but still know the exact schema that was used to serialize it. The overhead of storing the schema with each record is too high unless the individual records are very large.
There are workarounds for some common cases: in the case of files a schema can be stored once with a file of many records, amortizing the per-record cost, and in the case of RPC the schema can be negotiated ahead of time and used for many requests. For other uses, though, it is nice to be able to pass a reference to a given schema using a small id and allow this to be looked up. Since only a small number of schemas are likely to be active for a given data source, these can easily be cached, so the number of remote lookups is very small (one per active schema version). Basically this would consist of two things:
1. A simple REST service that stores and retrieves schemas
2. Some helper java code for fetching and caching schemas for people using the registry
We have used something like this at LinkedIn for a few years now, and it would be nice to standardize this
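The read/write-state model described in the comment above can be sketched as a small validation helper. This is an illustrative sketch under stated assumptions, not the AVRO-1124 API: schemas are plain strings here, and the `canRead` predicate stands in for real Avro schema-resolution checking.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Sketch of per-subject read/write states and the pair-wise transition checks.
public class SubjectStates {
    enum ReadState { READER, NOT_READABLE }
    enum WriteState { WRITER, WRITTEN, NOT_WRITABLE }

    record Entry(String schema, ReadState read, WriteState write) {}

    /** not_readable -> reader: candidate must read every writer/written schema. */
    static boolean mayBecomeReader(String candidate, List<Entry> subject,
                                   BiPredicate<String, String> canRead) {
        return subject.stream()
                .filter(e -> e.write() != WriteState.NOT_WRITABLE)
                .allMatch(e -> canRead.test(candidate, e.schema()));
    }

    /** not_writable -> writer/written: every current reader must read the candidate. */
    static boolean mayBecomeWriter(String candidate, List<Entry> subject,
                                   BiPredicate<String, String> canRead) {
        return subject.stream()
                .filter(e -> e.read() == ReadState.READER)
                .allMatch(e -> canRead.test(e.schema(), candidate));
    }
}
```

With a real resolver plugged in as `canRead`, every compatibility scheme in the comment (single reader, single writer, N+1/N-1, full cross-compatibility) reduces to these two checks.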
[jira] [Commented] (AVRO-1126) Upgrade to Jackson 2+
[ https://issues.apache.org/jira/browse/AVRO-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789298#comment-13789298 ]

Scott Carey commented on AVRO-1126:
-----------------------------------

I would like to consider that in our API we don't use any JSON default values at all, but instead our own types for values. Defaults are currently a performance issue in the decoder because we read the JSON value for every record. The SchemaBuilder API does not expose the use of Jackson types in its API.

Upgrade to Jackson 2+
---------------------

                Key: AVRO-1126
                URL: https://issues.apache.org/jira/browse/AVRO-1126
            Project: Avro
         Issue Type: Task
         Components: java
           Reporter: James Tyrrell
           Priority: Critical
            Fix For: 1.8.0

Quite annoyingly with Jackson 2+ the base package name has changed from org.codehaus.jackson to com.fasterxml.jackson, so in addition to changing the dependencies from:

{code:xml}
<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-core-asl</artifactId>
  <version>${jackson.version}</version>
</dependency>
<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-mapper-asl</artifactId>
  <version>${jackson.version}</version>
</dependency>
{code}

to:

{code:xml}
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-core</artifactId>
  <version>${jackson.version}</version>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>${jackson.version}</version>
</dependency>
{code}

the base package in the code needs to be updated. More info can be found [here|http://wiki.fasterxml.com/JacksonUpgradeFrom19To20]. I am happy to do the work, just let me know what is preferable, i.e. should I just attach a patch to this issue?

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (AVRO-1126) Upgrade to Jackson 2+
[ https://issues.apache.org/jira/browse/AVRO-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789865#comment-13789865 ]

Scott Carey commented on AVRO-1126:
-----------------------------------

I propose that we have a canonical representation of schema default values that has no external surface area from a third-party library, as part of the Schema API. Users should not be required to explicitly link to Jackson to use our API, so that we can change implementation details such as which JSON library we use without breaking the API. We could choose the Generic representations for this, or make new ones. It is all easy until you get to arrays, maps, and records. Specific, reflect, generic, or future representations can all be different. A new and improved schema resolution system would convert the default value to the target type once, at schema resolution time, instead of on every record read. This sort of re-use would require an immutable data representation, or copying.

Upgrade to Jackson 2+
---------------------

                Key: AVRO-1126
                URL: https://issues.apache.org/jira/browse/AVRO-1126
            Project: Avro
         Issue Type: Task
         Components: java
           Reporter: James Tyrrell
           Priority: Critical
            Fix For: 1.8.0

Quite annoyingly with Jackson 2+ the base package name has changed from org.codehaus.jackson to com.fasterxml.jackson, so in addition to changing the dependencies from:

{code:xml}
<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-core-asl</artifactId>
  <version>${jackson.version}</version>
</dependency>
<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-mapper-asl</artifactId>
  <version>${jackson.version}</version>
</dependency>
{code}

to:

{code:xml}
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-core</artifactId>
  <version>${jackson.version}</version>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>${jackson.version}</version>
</dependency>
{code}

the base package in the code needs to be updated. More info can be found [here|http://wiki.fasterxml.com/JacksonUpgradeFrom19To20]. I am happy to do the work, just let me know what is preferable, i.e. should I just attach a patch to this issue?

-- This message was sent by Atlassian JIRA (v6.1#6144)
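As a sketch of what "no external surface area" could look like, here is a hypothetical Avro-owned default-value representation (every name below is invented for illustration, not the actual Avro API) that the public Schema API could expose instead of Jackson's JsonNode:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a library-neutral default-value type for the Schema API.
// A resolver would convert one of these to the reader's representation
// (generic, specific, reflect) once, at schema-resolution time, not per record.
public interface DefaultValue {
    record NullDefault() implements DefaultValue {}
    record IntDefault(int value) implements DefaultValue {}
    record StringDefault(String value) implements DefaultValue {}
    record ArrayDefault(List<DefaultValue> items) implements DefaultValue {}
    record RecordDefault(Map<String, DefaultValue> fields) implements DefaultValue {}
}
```

Records are shallowly immutable, which supports the re-use-without-copying point made in the comment.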
[jira] [Commented] (AVRO-739) Add Date/Time data types
[ https://issues.apache.org/jira/browse/AVRO-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760779#comment-13760779 ]

Scott Carey commented on AVRO-739:
----------------------------------

{quote}
These seem like two different external representations of the same thing. A time plus a timezone can be losslessly converted to a UTC time. You do lose the original timezone, but dates and times are usually displayed in the timezone of the displayer, not where the time was originally noted.
{quote}

I completely agree for use cases where the time is being displayed to a user, but there are use cases where the loss of the original time zone is not acceptable. One could log another field with the timezone identifier for these. The use case for a UTC timestamp is more broadly applicable. I do not think we need to implement the one that also persists the timezone now, but I do think we need to make sure that if we did implement such a thing in the future, the names for these two things would be consistent. If we name this Datetime we are implying it has a relation to dates, which implies a relationship to timezones.

With respect to the SQL variants, I see only two that represent a single point in time. Three are either dates or times, not the combination (e.g. January 7, 2100, representing a time with granularity of one day, or 5:01, a time of day, respectively). The two SQL equivalents are TIMESTAMP and TIMESTAMP WITH TIMEZONE. This proposal covers TIMESTAMP, roughly. I am suggesting we reserve space for a future TIMESTAMP WITH TIMEZONE. We could adopt the names for consistency: timestamp and timestamptz.

There is also the question of serialization in JSON form. A long in binary form makes sense, but in JSON, an ISO8601 string might be more useful.
Add Date/Time data types Key: AVRO-739 URL: https://issues.apache.org/jira/browse/AVRO-739 Project: Avro Issue Type: New Feature Components: spec Reporter: Jeff Hammerbacher Attachments: AVRO-739.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
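On the JSON-serialization point raised in the comment above, the long-vs-ISO8601 duality is lossless in both directions; a minimal sketch with java.time (JDK only, not Avro code):

```java
import java.time.Instant;

// Binary form stores a long (epoch millis, per the proposal);
// JSON form could use the equivalent ISO-8601 string.
public class TimestampJson {
    static String toJson(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    static long fromJson(String iso8601) {
        return Instant.parse(iso8601).toEpochMilli();
    }
}
```

Note this round-trips the instant but not an original timezone, which is exactly the distinction between timestamp and a future timestamptz.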
[jira] [Commented] (AVRO-1124) RESTful service for holding schemas
[ https://issues.apache.org/jira/browse/AVRO-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742596#comment-13742596 ] Scott Carey commented on AVRO-1124: --- Yes. I have quite a bit of work outstanding on this to finish and submit for review. But I'll be on vacation for 2 weeks. Re: Incremental ids: The ids don't have to be incremental, that is an option that is up to the repository implementation. They can also be an arbitrary string. Your repositories probably will not map directly to environments, but to the data. If you are sharing data across environments, you will share repositories (or clone them) with the data. RESTful service for holding schemas --- Key: AVRO-1124 URL: https://issues.apache.org/jira/browse/AVRO-1124 Project: Avro Issue Type: New Feature Reporter: Jay Kreps Assignee: Jay Kreps Attachments: AVRO-1124-can-read-with.patch, AVRO-1124-draft.patch, AVRO-1124.patch, AVRO-1124.patch, AVRO-1124-validators-preliminary.patch Motivation: It is nice to be able to pass around data in serialized form but still know the exact schema that was used to serialize it. The overhead of storing the schema with each record is too high unless the individual records are very large. There are workarounds for some common cases: in the case of files a schema can be stored once with a file of many records amortizing the per-record cost, and in the case of RPC the schema can be negotiated ahead of time and used for many requests. For other uses, though it is nice to be able to pass a reference to a given schema using a small id and allow this to be looked up. Since only a small number of schemas are likely to be active for a given data source, these can easily be cached, so the number of remote lookups is very small (one per active schema version). Basically this would consist of two things: 1. A simple REST service that stores and retrieves schemas 2. 
Some helper java code for fetching and caching schemas for people using the registry We have used something like this at LinkedIn for a few years now, and it would be nice to standardize this facility to be able to build up common tooling around it. This proposal will be based on what we have, but we can change it as ideas come up. The facilities this provides are super simple, basically you can register a schema which gives back a unique id for it or you can query for a schema. There is almost no code, and nothing very complex. The contract is that before emitting/storing a record you must first publish its schema to the registry or know that it has already been published (by checking your cache of published schemas). When reading you check your cache and if you don't find the id/schema pair there you query the registry to look it up. I will explain some of the nuances in more detail below. An added benefit of such a repository is that it makes a few other things possible: 1. A graphical browser of the various data types that are currently used and all their previous forms. 2. Automatic enforcement of compatibility rules. Data is always compatible in the sense that the reader will always deserialize it (since they are using the same schema as the writer) but this does not mean it is compatible with the expectations of the reader. For example if an int field is changed to a string that will almost certainly break anyone relying on that field. This definition of compatibility can differ for different use cases and should likely be pluggable. Here is a description of one of our uses of this facility at LinkedIn. We use this to retain a schema with log data end-to-end from the producing app to various real-time consumers as well as a set of resulting AvroFile in Hadoop. This schema metadata can then be used to auto-create hive tables (or add new fields to existing tables), or inferring pig fields, all without manual intervention. 
One important definition of compatibility that is nice to enforce is compatibility with historical data for a given table. Log data is usually loaded in an append-only manner, so if someone changes an int field in a particular data set to be a string, tools like pig or hive that expect static columns will be unusable. Even using plain-vanilla map/reduce, processing data where columns and types change willy-nilly is painful. However, the person emitting this kind of data may not know all the details of compatible schema evolution. We use the schema repository to validate that any change made to a schema doesn't violate the compatibility model, and reject the update if it does. We do this check both at run time, and also as part of the ant task that generates specific record code (as an early warning). Some
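The register/lookup contract described in this thread can be sketched in a few lines. This is an in-memory stand-in for illustration, not the real AVRO-1124 REST service; schemas are strings here, and ids are sequential ints even though the discussion notes they may be arbitrary strings:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of the register/lookup contract.
public class SchemaRegistrySketch {
    private final Map<String, Integer> idBySchema = new ConcurrentHashMap<>();
    private final Map<Integer, String> schemaById = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    /** Register a schema, returning its id; re-registering returns the same id. */
    public int register(String schema) {
        return idBySchema.computeIfAbsent(schema, s -> {
            int id = nextId.incrementAndGet();
            schemaById.put(id, s);
            return id;
        });
    }

    /** Look up a schema by id; readers cache results so remote lookups stay rare. */
    public String lookup(int id) {
        return schemaById.get(id);
    }
}
```

The contract from the proposal maps directly onto these two calls: publish (or confirm via cache) before emitting a record, and on read, consult the cache first and fall back to lookup.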
[jira] [Commented] (AVRO-1124) RESTful service for holding schemas
[ https://issues.apache.org/jira/browse/AVRO-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742628#comment-13742628 ]

Scott Carey commented on AVRO-1124:
-----------------------------------

It's pluggable, so there are many options. We also have staging/prod/qa/dev environments. There is a repo for each, but when qa gets its data snapshot from prod, we also clone the repo. For dev/staging, we have a kafka mirror that is exactly the production data. Both of these environments access the prod repo read-only. In fact, even in production, most subjects are read-only to all applications. Operations has to add new schemas for a release. This is akin to operations executing SQL scripts to do DDL prior to a code push. Having applications 'automagically' update sql schemas or push avro schemas can lead to accidents, unless the security model is implemented properly.

RESTful service for holding schemas
-----------------------------------

                Key: AVRO-1124
                URL: https://issues.apache.org/jira/browse/AVRO-1124
            Project: Avro
         Issue Type: New Feature
           Reporter: Jay Kreps
           Assignee: Jay Kreps
        Attachments: AVRO-1124-can-read-with.patch, AVRO-1124-draft.patch, AVRO-1124.patch, AVRO-1124.patch, AVRO-1124-validators-preliminary.patch

Motivation: It is nice to be able to pass around data in serialized form but still know the exact schema that was used to serialize it. The overhead of storing the schema with each record is too high unless the individual records are very large. There are workarounds for some common cases: in the case of files a schema can be stored once with a file of many records, amortizing the per-record cost, and in the case of RPC the schema can be negotiated ahead of time and used for many requests. For other uses, though, it is nice to be able to pass a reference to a given schema using a small id and allow this to be looked up.
Since only a small number of schemas are likely to be active for a given data source, these can easily be cached, so the number of remote lookups is very small (one per active schema version). Basically this would consist of two things: 1. A simple REST service that stores and retrieves schemas 2. Some helper java code for fetching and caching schemas for people using the registry We have used something like this at LinkedIn for a few years now, and it would be nice to standardize this facility to be able to build up common tooling around it. This proposal will be based on what we have, but we can change it as ideas come up. The facilities this provides are super simple, basically you can register a schema which gives back a unique id for it or you can query for a schema. There is almost no code, and nothing very complex. The contract is that before emitting/storing a record you must first publish its schema to the registry or know that it has already been published (by checking your cache of published schemas). When reading you check your cache and if you don't find the id/schema pair there you query the registry to look it up. I will explain some of the nuances in more detail below. An added benefit of such a repository is that it makes a few other things possible: 1. A graphical browser of the various data types that are currently used and all their previous forms. 2. Automatic enforcement of compatibility rules. Data is always compatible in the sense that the reader will always deserialize it (since they are using the same schema as the writer) but this does not mean it is compatible with the expectations of the reader. For example if an int field is changed to a string that will almost certainly break anyone relying on that field. This definition of compatibility can differ for different use cases and should likely be pluggable. Here is a description of one of our uses of this facility at LinkedIn. 
We use this to retain a schema with log data end-to-end from the producing app to various real-time consumers as well as a set of resulting AvroFile in Hadoop. This schema metadata can then be used to auto-create hive tables (or add new fields to existing tables), or inferring pig fields, all without manual intervention. One important definition of compatibility that is nice to enforce is compatibility with historical data for a given table. Log data is usually loaded in an append-only manner, so if someone changes an int field in a particular data set to be a string, tools like pig or hive that expect static columns will be unusable. Even using plain-vanilla map/reduce processing data where columns and types change willy nilly is painful. However the person emitting this kind of data may not know all the details of compatible schema evolution. We use the schema repository to validate that any change made to
[jira] [Comment Edited] (AVRO-1124) RESTful service for holding schemas
[ https://issues.apache.org/jira/browse/AVRO-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742628#comment-13742628 ]

Scott Carey edited comment on AVRO-1124 at 8/16/13 9:43 PM:
------------------------------------------------------------

Schema id generation is pluggable, so there are many options. The _only_ requirement is that within a subject, ids are unique and correspond to unique schemas. We also have staging/prod/qa/dev environments. There is a repo for each, but when qa gets its data snapshot from prod, we also clone the repo. For dev/staging, we have a kafka mirror that is exactly the production data. Both of these environments access the prod repo read-only. In fact, even in production, most subjects are read-only to all applications. Operations has to add new schemas for a release. This is akin to operations executing SQL scripts to do DDL prior to a code push. Having applications 'automagically' update sql schemas or push avro schemas can lead to accidents, unless the security model is implemented properly.

was (Author: scott_carey):
It's pluggable, so there are many options. We also have staging/prod/qa/dev environments. There is a repo for each, but when qa gets its data snapshot from prod, we also clone the repo. For dev/staging, we have a kafka mirror that is exactly the production data. Both of these environments access the prod repo read-only. In fact, even in production, most subjects are read-only to all applications. Operations has to add new schemas for a release. This is akin to operations executing SQL scripts to do DDL prior to a code push. Having applications 'automagically' update sql schemas or push avro schemas can lead to accidents, unless the security model is implemented properly.
RESTful service for holding schemas --- Key: AVRO-1124 URL: https://issues.apache.org/jira/browse/AVRO-1124 Project: Avro Issue Type: New Feature Reporter: Jay Kreps Assignee: Jay Kreps Attachments: AVRO-1124-can-read-with.patch, AVRO-1124-draft.patch, AVRO-1124.patch, AVRO-1124.patch, AVRO-1124-validators-preliminary.patch Motivation: It is nice to be able to pass around data in serialized form but still know the exact schema that was used to serialize it. The overhead of storing the schema with each record is too high unless the individual records are very large. There are workarounds for some common cases: in the case of files a schema can be stored once with a file of many records amortizing the per-record cost, and in the case of RPC the schema can be negotiated ahead of time and used for many requests. For other uses, though it is nice to be able to pass a reference to a given schema using a small id and allow this to be looked up. Since only a small number of schemas are likely to be active for a given data source, these can easily be cached, so the number of remote lookups is very small (one per active schema version). Basically this would consist of two things: 1. A simple REST service that stores and retrieves schemas 2. Some helper java code for fetching and caching schemas for people using the registry We have used something like this at LinkedIn for a few years now, and it would be nice to standardize this facility to be able to build up common tooling around it. This proposal will be based on what we have, but we can change it as ideas come up. The facilities this provides are super simple, basically you can register a schema which gives back a unique id for it or you can query for a schema. There is almost no code, and nothing very complex. The contract is that before emitting/storing a record you must first publish its schema to the registry or know that it has already been published (by checking your cache of published schemas). 
When reading you check your cache and if you don't find the id/schema pair there you query the registry to look it up. I will explain some of the nuances in more detail below. An added benefit of such a repository is that it makes a few other things possible: 1. A graphical browser of the various data types that are currently used and all their previous forms. 2. Automatic enforcement of compatibility rules. Data is always compatible in the sense that the reader will always deserialize it (since they are using the same schema as the writer) but this does not mean it is compatible with the expectations of the reader. For example if an int field is changed to a string that will almost certainly break anyone relying on that field. This definition of compatibility can differ for different use cases and should likely be pluggable. Here is a description of one of our uses of this facility at LinkedIn. We use this to retain a
[jira] [Commented] (AVRO-1126) Upgrade to Jackson 2+
[ https://issues.apache.org/jira/browse/AVRO-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13737925#comment-13737925 ]

Scott Carey commented on AVRO-1126:
-----------------------------------

We should definitely clean up the exposure of Jackson in our API. I propose deprecating the use of it in 1.8.x in favor of a replacement, and removing it in 1.9.x. Upgrading to 2.x from 1.x is a different issue. I fail to see how upgrading to 2.2 is urgent at all. What new features do you propose that Avro needs to use internally? If Avro no longer exposes Jackson in its API, it is a purely internal matter to Avro and does not affect users who might want to use Jackson 2.x themselves, since Jackson 1.x and 2.x live in non-conflicting namespaces, both in maven and in java package names.

Upgrade to Jackson 2+
---------------------

                Key: AVRO-1126
                URL: https://issues.apache.org/jira/browse/AVRO-1126
            Project: Avro
         Issue Type: Task
         Components: java
           Reporter: James Tyrrell
           Priority: Critical
            Fix For: 1.8.0

Quite annoyingly with Jackson 2+ the base package name has changed from org.codehaus.jackson to com.fasterxml.jackson, so in addition to changing the dependencies from:

{code:xml}
<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-core-asl</artifactId>
  <version>${jackson.version}</version>
</dependency>
<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-mapper-asl</artifactId>
  <version>${jackson.version}</version>
</dependency>
{code}

to:

{code:xml}
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-core</artifactId>
  <version>${jackson.version}</version>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>${jackson.version}</version>
</dependency>
{code}

the base package in the code needs to be updated. More info can be found [here|http://wiki.fasterxml.com/JacksonUpgradeFrom19To20]. I am happy to do the work, just let me know what is preferable, i.e. should I just attach a patch to this issue?
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1348) Improve Utf8 to String conversion
[ https://issues.apache.org/jira/browse/AVRO-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13739076#comment-13739076 ] Scott Carey commented on AVRO-1348: --- About a year ago I experimented with all sorts of UTF8 to string optimizations, using state machines and other techniques in addition to those similar to this patch and only ever got minor (5%) improvements. It was hard to beat 'new String(bytes, 0, length, UTF8)' safely. A fully custom state machine utf8 decoder was almost 10% faster. Improve Utf8 to String conversion - Key: AVRO-1348 URL: https://issues.apache.org/jira/browse/AVRO-1348 Project: Avro Issue Type: Bug Reporter: Mark Wagner Assignee: Mohammad Kamrul Islam Attachments: AVRO1348v1.patch AVRO-1241 found that the existing method of creating Strings from Utf8 byte arrays could be made faster. The same method is being used in the Utf8.toString(), and could likely be sped up by doing the same thing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
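For reference, the baseline the comment above says was hard to beat is simply decoding the filled prefix of the Utf8 backing array in place, with no intermediate copy (plain JDK code, not the Avro Utf8 class itself):

```java
import java.nio.charset.StandardCharsets;

// The 'new String(bytes, 0, length, UTF8)' baseline from the comment:
// decode only the first `length` bytes of a possibly over-sized buffer.
public class Utf8ToString {
    static String decode(byte[] bytes, int length) {
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }
}
```

Passing an explicit offset/length avoids an Arrays.copyOf of the backing array, which is where naive conversions lose time.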
[jira] [Updated] (AVRO-1144) Deadlock with FSInput and Hadoop NativeS3FileSystem.
[ https://issues.apache.org/jira/browse/AVRO-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1144: -- Resolution: Fixed Status: Resolved (was: Patch Available) Committed in revision 1508713. Deadlock with FSInput and Hadoop NativeS3FileSystem. Key: AVRO-1144 URL: https://issues.apache.org/jira/browse/AVRO-1144 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.7.0 Environment: Hadoop 1.0.3 Reporter: Shawn Smith Assignee: Scott Carey Fix For: 1.7.5 Attachments: AVRO-1144.patch Deadlock can occur when using org.apache.avro.mapred.FsInput to read files from S3 using the Hadoop NativeS3FileSystem and multiple threads. There are a lot of components involved, but the basic cause is pretty simple: Apache Commons HttpClient can deadlock waiting for a free HTTP connection when the number of threads downloading from S3 is greater than or equal to the maximum allowed HTTP connections per host. I've filed this bug against Avro because the bug is easiest to fix in Avro. Swap the order of the FileSystem.open() and FileSystem.getFileStatus() calls in the FSInput constructor: {noformat} /** Construct given a path and a configuration. */ public FsInput(Path path, Configuration conf) throws IOException { this.stream = path.getFileSystem(conf).open(path); this.len = path.getFileSystem(conf).getFileStatus(path).getLen(); } {noformat} to {noformat} /** Construct given a path and a configuration. */ public FsInput(Path path, Configuration conf) throws IOException { this.len = path.getFileSystem(conf).getFileStatus(path).getLen(); this.stream = path.getFileSystem(conf).open(path); } {noformat} Here's what triggers the deadlock: * FSInput calls FileSystem.open() which calls Jets3t to connect to S3 and open an HTTP connection for downloading content. This acquires an HTTP connection but does not release it. 
* FSInput calls FileSystem.getFileStatus() which calls Jets3t to connect to S3 and perform a HEAD request to get object metadata. This attempts to acquire a second HTTP connection. * Jets3t uses Apache Commons HttpClient, which limits the number of simultaneous HTTP connections to a given host. Let's say this maximum is 4 (the default)... If 4 threads all call the FSInput constructor concurrently, the 4 FileSystem.open() calls can acquire all 4 available connections and the FileSystem.getFileStatus() calls block forever waiting for a thread to release an HTTP connection back to the connection pool. A simple way to reproduce this problem is to create jets3t.properties in your classpath with httpclient.max-connections=1. Then try to open a file using FSInput and the Native S3 file system (new Path("s3n://bucket/path")). It will hang indefinitely inside the FSInput constructor. Swapping the order of the open() and getFileStatus() calls ensures that a given thread using FSInput has at most one outstanding connection to S3 at a time. As a result, one thread should always be able to make progress, avoiding deadlock. 
Here's a sample stack trace of a deadlocked thread: {noformat} pool-10-thread-3 prio=5 tid=11026f800 nid=0x116a04000 in Object.wait() [116a02000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 785892cc0 (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518) - locked 785892cc0 (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:357) at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestHead(RestS3Service.java:652) at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1556) at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectDetailsImpl(RestS3Service.java:1492) at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1793) at org.jets3t.service.S3Service.getObjectDetails(S3Service.java:1225) at
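The ordering argument above can be sketched with a plain java.util.concurrent.Semaphore standing in for the HttpClient connection pool, with one permit to mirror the httpclient.max-connections=1 reproduction. PoolModel is a hypothetical illustration, not the actual Jets3t or HttpClient code:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolModel {
    // One permit = one HTTP connection to the S3 host.
    static final Semaphore POOL = new Semaphore(1);

    // Buggy order: open() holds a connection while getFileStatus() asks for a second.
    static boolean buggyOrder() throws InterruptedException {
        POOL.acquire();                      // open(): connection held for streaming
        try {
            // getFileStatus(): needs a second connection while the first is held.
            boolean got = POOL.tryAcquire(100, TimeUnit.MILLISECONDS);
            if (got) POOL.release();
            return got;                      // false here; real code blocks forever
        } finally {
            POOL.release();
        }
    }

    // Fixed order: getFileStatus() finishes and releases before open() acquires.
    static boolean fixedOrder() throws InterruptedException {
        POOL.acquire();                      // getFileStatus(): HEAD request
        POOL.release();                      // metadata fetched, connection returned
        POOL.acquire();                      // open(): connection held for streaming
        POOL.release();
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("buggy order gets 2nd connection: " + buggyOrder());  // false
        System.out.println("fixed order succeeds: " + fixedOrder());             // true
    }
}
```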
[jira] [Commented] (AVRO-1144) Deadlock with FSInput and Hadoop NativeS3FileSystem.
[ https://issues.apache.org/jira/browse/AVRO-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722677#comment-13722677 ] Scott Carey commented on AVRO-1144: --- Looks reasonable to me. With the change, all tests pass. I will commit this tomorrow if there are no objections, and provide the trivial patch now. Deadlock with FSInput and Hadoop NativeS3FileSystem. Key: AVRO-1144 URL: https://issues.apache.org/jira/browse/AVRO-1144 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.7.0 Environment: Hadoop 1.0.3 Reporter: Shawn Smith Attachments: AVRO-1144.patch
[jira] [Assigned] (AVRO-1144) Deadlock with FSInput and Hadoop NativeS3FileSystem.
[ https://issues.apache.org/jira/browse/AVRO-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey reassigned AVRO-1144: - Assignee: Scott Carey Deadlock with FSInput and Hadoop NativeS3FileSystem. Key: AVRO-1144 URL: https://issues.apache.org/jira/browse/AVRO-1144 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.7.0 Environment: Hadoop 1.0.3 Reporter: Shawn Smith Assignee: Scott Carey Fix For: 1.7.5 Attachments: AVRO-1144.patch
[jira] [Updated] (AVRO-1325) Enhanced Schema Builder API
[ https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1325: -- Resolution: Fixed Status: Resolved (was: Patch Available) This additionally required upgrading Jackson to 1.9.13 and some pom.xml changes to make builds work on Mac. Enhanced Schema Builder API --- Key: AVRO-1325 URL: https://issues.apache.org/jira/browse/AVRO-1325 Project: Avro Issue Type: Improvement Reporter: Scott Carey Assignee: Scott Carey Fix For: 1.7.5 Attachments: AVRO-1325.patch, AVRO-1325-preliminary.patch, AVRO-1325-properties.patch, AVRO-1325-v2.patch, AVRO-1325-v3.patch, AVRO-1325-v4.patch The schema builder from AVRO-1274 has a few key limitations. I have proposed changes to make before it is released and the public API is locked in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1325) Enhanced Schema Builder API
[ https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722062#comment-13722062 ] Scott Carey commented on AVRO-1325: --- committed in revision 1507862. Enhanced Schema Builder API --- Key: AVRO-1325 URL: https://issues.apache.org/jira/browse/AVRO-1325 Project: Avro Issue Type: Improvement Reporter: Scott Carey Assignee: Scott Carey Fix For: 1.7.5 Attachments: AVRO-1325.patch, AVRO-1325-preliminary.patch, AVRO-1325-properties.patch, AVRO-1325-v2.patch, AVRO-1325-v3.patch, AVRO-1325-v4.patch The schema builder from AVRO-1274 has a few key limitations. I have proposed changes to make before it is released and the public API is locked in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1325) Enhanced Schema Builder API
[ https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13710349#comment-13710349 ] Scott Carey commented on AVRO-1325: --- I will commit this if there are no objections by this time tomorrow. Enhanced Schema Builder API --- Key: AVRO-1325 URL: https://issues.apache.org/jira/browse/AVRO-1325 Project: Avro Issue Type: Bug Reporter: Scott Carey Assignee: Scott Carey Fix For: 1.7.5 Attachments: AVRO-1325.patch, AVRO-1325-preliminary.patch, AVRO-1325-properties.patch, AVRO-1325-v2.patch, AVRO-1325-v3.patch, AVRO-1325-v4.patch The schema builder from AVRO-1274 has a few key limitations. I have proposed changes to make before it is released and the public API is locked in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (AVRO-1349) Update site Javadoc to remove vulnerability
Scott Carey created AVRO-1349: - Summary: Update site Javadoc to remove vulnerability Key: AVRO-1349 URL: https://issues.apache.org/jira/browse/AVRO-1349 Project: Avro Issue Type: Bug Reporter: Scott Carey Priority: Critical see http://www.kb.cert.org/vuls/id/225657 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (AVRO-1349) Update site Javadoc to remove vulnerability
[ https://issues.apache.org/jira/browse/AVRO-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey resolved AVRO-1349. --- Resolution: Fixed committed in revision 1495093. Every instance of site/publish/docs/AVRO_VERSION/api/java/index.html was modified. Update site Javadoc to remove vulnerability --- Key: AVRO-1349 URL: https://issues.apache.org/jira/browse/AVRO-1349 Project: Avro Issue Type: Bug Reporter: Scott Carey Priority: Critical see http://www.kb.cert.org/vuls/id/225657 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1213) Dependency on Jetty Servlet API in IPC
[ https://issues.apache.org/jira/browse/AVRO-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13678919#comment-13678919 ] Scott Carey commented on AVRO-1213: --- Netty also now has HTTP support, so we may be able to consolidate significantly and use it for both. Dependency on Jetty Servlet API in IPC -- Key: AVRO-1213 URL: https://issues.apache.org/jira/browse/AVRO-1213 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.7.2 Reporter: Sharmarke Aden Priority: Minor The compile scoped dependency on jetty servlet-api in the IPC pom file can be problematic if using Avro in a webapp environment. Would it be possible to make this dependency either optional or provided? Or maybe Avro modularize into sub-modules in such a way that desired features can be assembled piecemeal? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1335) ResolvingDecoder should provide bidirectional compatibility between different version of schemas
[ https://issues.apache.org/jira/browse/AVRO-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13663119#comment-13663119 ] Scott Carey commented on AVRO-1335: --- Thanks for the clarification. Is it safe to summarize the issue as "C++ should support field default values"? Or are there other things besides default values that are also preventing bidirectional schema evolution use cases? I cannot provide a time-frame for this; the volunteers who build and maintain the C++ Avro code may have more information. ResolvingDecoder should provide bidirectional compatibility between different versions of schemas Key: AVRO-1335 URL: https://issues.apache.org/jira/browse/AVRO-1335 Project: Avro Issue Type: Improvement Components: c++ Affects Versions: 1.7.4 Reporter: Bin Guo We found that ResolvingDecoder could not provide bidirectional compatibility between different versions of schemas, especially for records. For example:
{code:title=First schema}
{
  "type": "record",
  "name": "TestRecord",
  "fields": [
    {
      "name": "MyData",
      "type": {
        "type": "record",
        "name": "SubData",
        "fields": [
          { "name": "Version1", "type": "string" }
        ]
      }
    },
    { "name": "OtherData", "type": "string" }
  ]
}
{code}
{code:title=Second schema}
{
  "type": "record",
  "name": "TestRecord",
  "fields": [
    {
      "name": "MyData",
      "type": {
        "type": "record",
        "name": "SubData",
        "fields": [
          { "name": "Version1", "type": "string" },
          { "name": "Version2", "type": "string" }
        ]
      }
    },
    { "name": "OtherData", "type": "string" }
  ]
}
{code}
Say node A knows only the first schema and node B knows the second schema, which has more fields. Any data generated by node B can be resolved by the first schema because the additional field is marked as skipped. But data generated by node A cannot be resolved by the second schema and throws an exception: *Don't know how to handle excess fields for reader.* This is because data is resolved exactly according to the auto-generated codec_traits, which try to read the excess field. 
The problem is that we cannot simply ignore the excess field in the record, since the data after the troublesome record also needs to be resolved. This problem has had us stuck for a very long time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
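A toy model of the role defaults play here (illustrative logic only, not the Avro or codec_traits API): when the writer's record lacks a field the reader expects, the reader's declared default fills it in; with no default, resolution must fail, which is exactly the old-writer/new-reader direction that breaks above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DefaultResolution {
    // Sentinel meaning "the reader schema declares no default for this field".
    static final Object NO_DEFAULT = new Object();

    // Resolve a decoded writer record against the reader's fields.
    // `readerDefaults` maps each reader field name to its default value,
    // or to NO_DEFAULT when none is declared.
    static Map<String, Object> resolve(Map<String, Object> writerRecord,
                                       Map<String, Object> readerDefaults) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerDefaults.entrySet()) {
            if (writerRecord.containsKey(field.getKey())) {
                out.put(field.getKey(), writerRecord.get(field.getKey()));
            } else if (field.getValue() != NO_DEFAULT) {
                // Old writer, new reader: the default fills the missing field.
                out.put(field.getKey(), field.getValue());
            } else {
                throw new IllegalStateException("no default for " + field.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Node A wrote SubData with only Version1 (first schema).
        Map<String, Object> writer = new LinkedHashMap<>();
        writer.put("Version1", "v1-data");
        // Node B reads with the second schema; Version2 carries a default.
        Map<String, Object> readerDefaults = new LinkedHashMap<>();
        readerDefaults.put("Version1", NO_DEFAULT);
        readerDefaults.put("Version2", "unknown");
        System.out.println(resolve(writer, readerDefaults)); // {Version1=v1-data, Version2=unknown}
    }
}
```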
[jira] [Updated] (AVRO-1311) Upgrade Snappy-Java dependency to support building on Mac + Java 7
[ https://issues.apache.org/jira/browse/AVRO-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1311: -- Attachment: AVRO-1311.patch snappy-java 1.0.5 is out, with fixes to Java 7 on mac. http://repo1.maven.org/maven2/org/xerial/snappy/snappy-java/1.0.5/ The attached patch ups the version from 1.0.4 to 1.0.5. I will commit this soon if there are no objections. Upgrade Snappy-Java dependency to support building on Mac + Java 7 -- Key: AVRO-1311 URL: https://issues.apache.org/jira/browse/AVRO-1311 Project: Avro Issue Type: Bug Affects Versions: 1.7.4 Reporter: Scott Carey Assignee: Scott Carey Attachments: AVRO-1311.patch snappy-java 1.0.4 does not work with Mac + Java 7. 1.0.5-M4 is on maven, but it does not appear that there will be a final release of that. 1.1.0 is at -M3 status, and is being developed now. Both of these work locally for me, when the dust settles we need to pick one before the next release. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (AVRO-1334) Java: update dependencies for 1.7.5
Scott Carey created AVRO-1334: - Summary: Java: update dependencies for 1.7.5 Key: AVRO-1334 URL: https://issues.apache.org/jira/browse/AVRO-1334 Project: Avro Issue Type: Bug Components: java Reporter: Scott Carey Assignee: Scott Carey Fix For: 1.7.5 A report for mvn versions:display-property-updates on trunk -- [INFO] The following version properties are referencing the newest available version: [INFO] ${jetty.version} . 6.1.26 [INFO] ${javacc-plugin.version} 2.6 [INFO] ${velocity.version} . 1.7 [INFO] ${exec-plugin.version} 1.2.1 [INFO] The following version property updates are available: [INFO] ${jackson.version} .. 1.8.8 - 1.9.11 [INFO] ${source-plugin.version} . 2.1.2 - 2.2.1 [INFO] ${jar-plugin.version} .. 2.3.2 - 2.4 [INFO] ${snappy.version} . 1.0.5 - 1.1.0-M3 [INFO] ${checkstyle-plugin.version} 2.8 - 2.10 [INFO] ${hadoop1.version} .. 0.20.205.0 - 1.1.2 [INFO] ${commons-compress.version} 1.4.1 - 1.5 [INFO] ${plugin-plugin.version} . 2.9 - 3.2 [INFO] ${javadoc-plugin.version} 2.8 - 2.9 [INFO] ${compiler-plugin.version} . 2.3.2 - 3.1 [INFO] ${jopt-simple.version} ... 4.1 - 4.4 [INFO] ${surefire-plugin.version} ... 2.12 - 2.14.1 [INFO] ${paranamer.version} ... 2.3 - 2.5.2 [INFO] ${netty.version} 3.4.0.Final - 4.0.0.Alpha8 [INFO] ${slf4j.version} . 1.6.4 - 1.7.5 [INFO] ${shade-plugin.version} .. 1.5 - 2.1 [INFO] ${junit.version} ... 4.10 - 4.11 Consider upgrades for these as well as the Apache parent and build plugins. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (AVRO-1334) Java: update dependencies for 1.7.5
[ https://issues.apache.org/jira/browse/AVRO-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1334: -- Description: A report for mvn versions:display-property-updates on trunk -- {noformat} [INFO] The following version properties are referencing the newest available version: [INFO] ${jetty.version} . 6.1.26 [INFO] ${javacc-plugin.version} 2.6 [INFO] ${velocity.version} . 1.7 [INFO] ${exec-plugin.version} 1.2.1 [INFO] The following version property updates are available: [INFO] ${jackson.version} .. 1.8.8 - 1.9.11 [INFO] ${source-plugin.version} . 2.1.2 - 2.2.1 [INFO] ${jar-plugin.version} .. 2.3.2 - 2.4 [INFO] ${snappy.version} . 1.0.5 - 1.1.0-M3 [INFO] ${checkstyle-plugin.version} 2.8 - 2.10 [INFO] ${hadoop1.version} .. 0.20.205.0 - 1.1.2 [INFO] ${commons-compress.version} 1.4.1 - 1.5 [INFO] ${plugin-plugin.version} . 2.9 - 3.2 [INFO] ${javadoc-plugin.version} 2.8 - 2.9 [INFO] ${compiler-plugin.version} . 2.3.2 - 3.1 [INFO] ${jopt-simple.version} ... 4.1 - 4.4 [INFO] ${surefire-plugin.version} ... 2.12 - 2.14.1 [INFO] ${paranamer.version} ... 2.3 - 2.5.2 [INFO] ${netty.version} 3.4.0.Final - 4.0.0.Alpha8 [INFO] ${slf4j.version} . 1.6.4 - 1.7.5 [INFO] ${shade-plugin.version} .. 1.5 - 2.1 [INFO] ${junit.version} ... 4.10 - 4.11 {noformat Consider upgrades for these as well as the Apache parent and build plugins. was: A report for mvn versions:display-property-updates on trunk -- [INFO] The following version properties are referencing the newest available version: [INFO] ${jetty.version} . 6.1.26 [INFO] ${javacc-plugin.version} 2.6 [INFO] ${velocity.version} . 1.7 [INFO] ${exec-plugin.version} 1.2.1 [INFO] The following version property updates are available: [INFO] ${jackson.version} .. 1.8.8 - 1.9.11 [INFO] ${source-plugin.version} . 2.1.2 - 2.2.1 [INFO] ${jar-plugin.version} .. 2.3.2 - 2.4 [INFO] ${snappy.version} . 1.0.5 - 1.1.0-M3 [INFO] ${checkstyle-plugin.version} 2.8 - 2.10 [INFO] ${hadoop1.version} .. 
0.20.205.0 - 1.1.2 [INFO] ${commons-compress.version} 1.4.1 - 1.5 [INFO] ${plugin-plugin.version} . 2.9 - 3.2 [INFO] ${javadoc-plugin.version} 2.8 - 2.9 [INFO] ${compiler-plugin.version} . 2.3.2 - 3.1 [INFO] ${jopt-simple.version} ... 4.1 - 4.4 [INFO] ${surefire-plugin.version} ... 2.12 - 2.14.1 [INFO] ${paranamer.version} ... 2.3 - 2.5.2 [INFO] ${netty.version} 3.4.0.Final - 4.0.0.Alpha8 [INFO] ${slf4j.version} . 1.6.4 - 1.7.5 [INFO] ${shade-plugin.version} .. 1.5 - 2.1 [INFO] ${junit.version} ... 4.10 - 4.11 Consider upgrades for these as well as the Apache parent and build plugins. Java: update dependencies for 1.7.5 --- Key: AVRO-1334 URL: https://issues.apache.org/jira/browse/AVRO-1334 Project: Avro Issue Type: Bug Components: java Reporter: Scott Carey Assignee: Scott Carey Fix For: 1.7.5 A report for mvn versions:display-property-updates on trunk -- {noformat} [INFO] The following version properties are referencing the newest available version: [INFO] ${jetty.version} . 6.1.26 [INFO] ${javacc-plugin.version} 2.6 [INFO] ${velocity.version} . 1.7 [INFO] ${exec-plugin.version} 1.2.1 [INFO] The following version property updates are
[jira] [Updated] (AVRO-1334) Java: update dependencies for 1.7.5
[ https://issues.apache.org/jira/browse/AVRO-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1334: -- Description: A report for mvn versions:display-property-updates on trunk -- {noformat} [INFO] The following version properties are referencing the newest available version: [INFO] ${jetty.version} . 6.1.26 [INFO] ${javacc-plugin.version} 2.6 [INFO] ${velocity.version} . 1.7 [INFO] ${exec-plugin.version} 1.2.1 [INFO] The following version property updates are available: [INFO] ${jackson.version} .. 1.8.8 - 1.9.11 [INFO] ${source-plugin.version} . 2.1.2 - 2.2.1 [INFO] ${jar-plugin.version} .. 2.3.2 - 2.4 [INFO] ${snappy.version} . 1.0.5 - 1.1.0-M3 [INFO] ${checkstyle-plugin.version} 2.8 - 2.10 [INFO] ${hadoop1.version} .. 0.20.205.0 - 1.1.2 [INFO] ${commons-compress.version} 1.4.1 - 1.5 [INFO] ${plugin-plugin.version} . 2.9 - 3.2 [INFO] ${javadoc-plugin.version} 2.8 - 2.9 [INFO] ${compiler-plugin.version} . 2.3.2 - 3.1 [INFO] ${jopt-simple.version} ... 4.1 - 4.4 [INFO] ${surefire-plugin.version} ... 2.12 - 2.14.1 [INFO] ${paranamer.version} ... 2.3 - 2.5.2 [INFO] ${netty.version} 3.4.0.Final - 4.0.0.Alpha8 [INFO] ${slf4j.version} . 1.6.4 - 1.7.5 [INFO] ${shade-plugin.version} .. 1.5 - 2.1 [INFO] ${junit.version} ... 4.10 - 4.11 {noformat} Consider upgrades for these as well as the Apache parent and build plugins. was: A report for mvn versions:display-property-updates on trunk -- {noformat} [INFO] The following version properties are referencing the newest available version: [INFO] ${jetty.version} . 6.1.26 [INFO] ${javacc-plugin.version} 2.6 [INFO] ${velocity.version} . 1.7 [INFO] ${exec-plugin.version} 1.2.1 [INFO] The following version property updates are available: [INFO] ${jackson.version} .. 1.8.8 - 1.9.11 [INFO] ${source-plugin.version} . 2.1.2 - 2.2.1 [INFO] ${jar-plugin.version} .. 2.3.2 - 2.4 [INFO] ${snappy.version} . 
1.0.5 - 1.1.0-M3 [INFO] ${checkstyle-plugin.version} 2.8 - 2.10 [INFO] ${hadoop1.version} .. 0.20.205.0 - 1.1.2 [INFO] ${commons-compress.version} 1.4.1 - 1.5 [INFO] ${plugin-plugin.version} . 2.9 - 3.2 [INFO] ${javadoc-plugin.version} 2.8 - 2.9 [INFO] ${compiler-plugin.version} . 2.3.2 - 3.1 [INFO] ${jopt-simple.version} ... 4.1 - 4.4 [INFO] ${surefire-plugin.version} ... 2.12 - 2.14.1 [INFO] ${paranamer.version} ... 2.3 - 2.5.2 [INFO] ${netty.version} 3.4.0.Final - 4.0.0.Alpha8 [INFO] ${slf4j.version} . 1.6.4 - 1.7.5 [INFO] ${shade-plugin.version} .. 1.5 - 2.1 [INFO] ${junit.version} ... 4.10 - 4.11 {noformat Consider upgrades for these as well as the Apache parent and build plugins. Java: update dependencies for 1.7.5 --- Key: AVRO-1334 URL: https://issues.apache.org/jira/browse/AVRO-1334 Project: Avro Issue Type: Bug Components: java Reporter: Scott Carey Assignee: Scott Carey Fix For: 1.7.5 A report for mvn versions:display-property-updates on trunk -- {noformat} [INFO] The following version properties are referencing the newest available version: [INFO] ${jetty.version} . 6.1.26 [INFO] ${javacc-plugin.version} 2.6 [INFO] ${velocity.version} . 1.7 [INFO] ${exec-plugin.version} 1.2.1 [INFO] The following version
[jira] [Commented] (AVRO-1334) Java: update dependencies for 1.7.5
[ https://issues.apache.org/jira/browse/AVRO-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13662552#comment-13662552 ] Scott Carey commented on AVRO-1334: --- Jackson requires an upgrade because there is currently a bug in Avro due to it (I ran into it in AVRO-1325; unit tests there fail without Jackson 1.9.12). Hadoop1: I am not sure what the best version here is -- 0.20.205 feels a bit old. Suggestions? Jopt-simple is a low-risk update. Paranamer is a low-risk update (only a handful of bugfixes). slf4j looks safe to update (performance improvements, bug fixes, and now compiled against a Java 1.5 target). JUnit is safe to update. Netty: 3.6.6.GA should be compatible (see http://netty.io/news/index.html) and has many fixes and enhancements. (An aside: netty now supports HTTP, so perhaps we can drop the ancient Jetty version we use and rely on netty for both raw and HTTP transport to simplify things later?) The remainder are plugin updates, which are generally safe since testing them is easy to cover. I'll submit a patch with the updates shortly. Java: update dependencies for 1.7.5 --- Key: AVRO-1334 URL: https://issues.apache.org/jira/browse/AVRO-1334 Project: Avro Issue Type: Bug Components: java Reporter: Scott Carey Assignee: Scott Carey Fix For: 1.7.5 A report for mvn versions:display-property-updates on trunk -- {noformat} [INFO] The following version properties are referencing the newest available version: [INFO] ${jetty.version} . 6.1.26 [INFO] ${javacc-plugin.version} 2.6 [INFO] ${velocity.version} . 1.7 [INFO] ${exec-plugin.version} 1.2.1 [INFO] The following version property updates are available: [INFO] ${jackson.version} .. 1.8.8 - 1.9.11 [INFO] ${source-plugin.version} . 2.1.2 - 2.2.1 [INFO] ${jar-plugin.version} .. 2.3.2 - 2.4 [INFO] ${snappy.version} . 1.0.5 - 1.1.0-M3 [INFO] ${checkstyle-plugin.version} 2.8 - 2.10 [INFO] ${hadoop1.version} .. 
0.20.205.0 - 1.1.2 [INFO] ${commons-compress.version} 1.4.1 - 1.5 [INFO] ${plugin-plugin.version} . 2.9 - 3.2 [INFO] ${javadoc-plugin.version} 2.8 - 2.9 [INFO] ${compiler-plugin.version} . 2.3.2 - 3.1 [INFO] ${jopt-simple.version} ... 4.1 - 4.4 [INFO] ${surefire-plugin.version} ... 2.12 - 2.14.1 [INFO] ${paranamer.version} ... 2.3 - 2.5.2 [INFO] ${netty.version} 3.4.0.Final - 4.0.0.Alpha8 [INFO] ${slf4j.version} . 1.6.4 - 1.7.5 [INFO] ${shade-plugin.version} .. 1.5 - 2.1 [INFO] ${junit.version} ... 4.10 - 4.11 {noformat} Consider upgrades for these as well as the Apache parent and build plugins. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (AVRO-1334) Java: update dependencies for 1.7.5
[ https://issues.apache.org/jira/browse/AVRO-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1334:
---
Priority: Minor (was: Major)

> Java: update dependencies for 1.7.5
> -----------------------------------
>
> Key: AVRO-1334
> URL: https://issues.apache.org/jira/browse/AVRO-1334
> Project: Avro
> Issue Type: Bug
> Components: java
> Reporter: Scott Carey
> Assignee: Scott Carey
> Priority: Minor
> Fix For: 1.7.5
[jira] [Commented] (AVRO-1335) ResolvingDecoder should provide bidirectional compatibility between different version of schemas
[ https://issues.apache.org/jira/browse/AVRO-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662642#comment-13662642 ] Scott Carey commented on AVRO-1335:
---
The second record must specify a default value for the added field:

{code}
{ "name": "Version2", "type": "string", "default": "" }
{code}

Otherwise, when reading with the second schema and there is no data for field Version2 in the data written with the first schema, what do you want it to do? The Avro specification uses default values to handle the use case where a field is present for the reader but not the writer.

> ResolvingDecoder should provide bidirectional compatibility between different versions of schemas
> -------------------------------------------------------------------------------------------------
>
> Key: AVRO-1335
> URL: https://issues.apache.org/jira/browse/AVRO-1335
> Project: Avro
> Issue Type: Improvement
> Components: c++
> Affects Versions: 1.7.4
> Reporter: Bin Guo
>
> We found that ResolvingDecoder could not provide bidirectional compatibility between different versions of schemas. Especially for records, for example:
>
> {code:title=First schema}
> {
>   "type": "record",
>   "name": "TestRecord",
>   "fields": [
>     {
>       "name": "MyData",
>       "type": {
>         "type": "record",
>         "name": "SubData",
>         "fields": [
>           { "name": "Version1", "type": "string" }
>         ]
>       }
>     },
>     { "name": "OtherData", "type": "string" }
>   ]
> }
> {code}
>
> {code:title=Second schema}
> {
>   "type": "record",
>   "name": "TestRecord",
>   "fields": [
>     {
>       "name": "MyData",
>       "type": {
>         "type": "record",
>         "name": "SubData",
>         "fields": [
>           { "name": "Version1", "type": "string" },
>           { "name": "Version2", "type": "string" }
>         ]
>       }
>     },
>     { "name": "OtherData", "type": "string" }
>   ]
> }
> {code}
>
> Say node A knows only the first schema and node B knows the second schema, which has more fields. Any data generated by node B can be resolved by the first schema because the additional field is marked as skipped. But data generated by node A cannot be resolved by the second schema and throws an exception: *Don't know how to handle excess fields for reader.* This is because data is resolved exactly according to the auto-generated codec_traits, which try to read the excess field. The problem is that we cannot simply ignore the excess field in the record, since the data after the troublesome record also needs to be resolved. This problem had us stuck for a very long time.
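The default-value rule Scott describes can be sketched without the Avro API. This is an illustrative model only (plain Java maps; the `resolve` helper and field names are hypothetical): a field present in the reader schema but absent from the written data takes the reader's declared default, which is why the second schema's Version2 field needs one.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch, NOT the Avro API: how a reader schema's declared
// defaults fill in fields the writer schema never wrote.
public class DefaultResolutionSketch {

    // Start from the reader's defaults, then overlay whatever the writer
    // actually wrote; written values always win over defaults.
    static Map<String, Object> resolve(Map<String, Object> written,
                                       Map<String, Object> readerDefaults) {
        Map<String, Object> resolved = new HashMap<>(readerDefaults);
        resolved.putAll(written);
        return resolved;
    }

    public static void main(String[] args) {
        // Writer (first schema) only knows Version1.
        Map<String, Object> written = new HashMap<>();
        written.put("Version1", "v1");

        // Reader (second schema) declares defaults, including for Version2.
        Map<String, Object> readerDefaults = new HashMap<>();
        readerDefaults.put("Version1", "");
        readerDefaults.put("Version2", "");

        Map<String, Object> resolved = resolve(written, readerDefaults);
        System.out.println(resolved.get("Version1")); // the written value
        System.out.println(resolved.get("Version2")); // the reader-side default
    }
}
```

Without a default for Version2, there is simply nothing to put in the reader's view of data written with the first schema, which is exactly the failure the exception reports.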
[jira] [Updated] (AVRO-1311) Java: Upgrade snappy-java dependency to 1.0.5
[ https://issues.apache.org/jira/browse/AVRO-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1311:
---
Summary: Java: Upgrade snappy-java dependency to 1.0.5 (was: Upgrade Snappy-Java dependency to support building on Mac + Java 7)

> Java: Upgrade snappy-java dependency to 1.0.5
> ---------------------------------------------
>
> Key: AVRO-1311
> URL: https://issues.apache.org/jira/browse/AVRO-1311
> Project: Avro
> Issue Type: Bug
> Affects Versions: 1.7.4
> Reporter: Scott Carey
> Assignee: Scott Carey
> Attachments: AVRO-1311.patch
>
> snappy-java 1.0.4 does not work with Mac + Java 7. 1.0.5-M4 is on Maven, but it does not appear that there will be a final release of it. 1.1.0 is at -M3 status and is being developed now. Both of these work locally for me; when the dust settles we need to pick one before the next release.
[jira] [Resolved] (AVRO-1311) Java: Upgrade snappy-java dependency to 1.0.5
[ https://issues.apache.org/jira/browse/AVRO-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey resolved AVRO-1311.
---
Resolution: Fixed
Fix Version/s: 1.7.5

Committed @ r1484656.

> Java: Upgrade snappy-java dependency to 1.0.5
> ---------------------------------------------
>
> Key: AVRO-1311
> URL: https://issues.apache.org/jira/browse/AVRO-1311
> Project: Avro
> Issue Type: Bug
> Affects Versions: 1.7.4
> Reporter: Scott Carey
> Assignee: Scott Carey
> Fix For: 1.7.5
> Attachments: AVRO-1311.patch
[jira] [Updated] (AVRO-1334) Java: update dependencies for 1.7.5
[ https://issues.apache.org/jira/browse/AVRO-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1334:
---
Attachment: AVRO-1334.patch

This patch updates versions of plugins and many dependencies. Of note: Netty 3.6.6 caused a deadlock for me every time in the TestNettyServerWithCallbacks unit tests (Mac, Java 7). All versions of 3.4.x and 3.5.x, including the current one, hang about 15% of the time in TestNettyTransceiverWhenServerStops, so I upgraded to the latest in the 3.5.x series. I also cleaned up version consistency in a few places, and the JUnit/Hamcrest relationship has changed a little. The newer Maven plugin versions triggered deprecation warnings, so I made the trivial updates for those.

> Java: update dependencies for 1.7.5
> -----------------------------------
>
> Key: AVRO-1334
> URL: https://issues.apache.org/jira/browse/AVRO-1334
> Project: Avro
> Issue Type: Bug
> Components: java
> Reporter: Scott Carey
> Assignee: Scott Carey
> Priority: Minor
> Fix For: 1.7.5
> Attachments: AVRO-1334.patch
[jira] [Updated] (AVRO-1334) Java: update dependencies for 1.7.5
[ https://issues.apache.org/jira/browse/AVRO-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1334:
---
Status: Patch Available (was: Open)

> Java: update dependencies for 1.7.5
> -----------------------------------
>
> Key: AVRO-1334
> URL: https://issues.apache.org/jira/browse/AVRO-1334
> Project: Avro
> Issue Type: Bug
> Components: java
> Reporter: Scott Carey
> Assignee: Scott Carey
> Priority: Minor
> Fix For: 1.7.5
> Attachments: AVRO-1334.patch
[jira] [Commented] (AVRO-1334) Java: update dependencies for 1.7.5
[ https://issues.apache.org/jira/browse/AVRO-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662664#comment-13662664 ] Scott Carey commented on AVRO-1334:
---
And lastly, TestIDL had to be changed, since the Jackson upgrade slightly changed the whitespace in pretty-printing (an empty array is now [] instead of [\n]), so I made the test insensitive to whitespace.

> Java: update dependencies for 1.7.5
> -----------------------------------
>
> Key: AVRO-1334
> URL: https://issues.apache.org/jira/browse/AVRO-1334
> Project: Avro
> Issue Type: Bug
> Components: java
> Reporter: Scott Carey
> Assignee: Scott Carey
> Priority: Minor
> Fix For: 1.7.5
> Attachments: AVRO-1334.patch
[jira] [Commented] (AVRO-1334) Java: update dependencies for 1.7.5
[ https://issues.apache.org/jira/browse/AVRO-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13662670#comment-13662670 ] Scott Carey commented on AVRO-1334:
---
Dependencies I did not update, deferring to the expertise of others (a.k.a. I have no idea what the best thing to do is):
* hadoop1
* hadoop2
* thrift
* protobuf

> Java: update dependencies for 1.7.5
> -----------------------------------
>
> Key: AVRO-1334
> URL: https://issues.apache.org/jira/browse/AVRO-1334
> Project: Avro
> Issue Type: Bug
> Components: java
> Reporter: Scott Carey
> Assignee: Scott Carey
> Priority: Minor
> Fix For: 1.7.5
> Attachments: AVRO-1334.patch
[jira] [Commented] (AVRO-1245) Add Merging Functionality to Generated Builders
[ https://issues.apache.org/jira/browse/AVRO-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659307#comment-13659307 ] Scott Carey commented on AVRO-1245:
---
Another possibility is to do it all at builder construction time -- there is less to do when you are guaranteed a clean slate, and the existing record must be walked at least once for a deep copy anyway.

{code}
boolean replaceNullsWithDefaults = true;
boolean replaceEmptyStringsWithDefaults = true;
User.newBuilder(thirdPartyRecord, replaceNullsWithDefaults, replaceEmptyStringsWithDefaults);
{code}

This bears resemblance to one of our other ideas -- that there is no 'read' and 'write', only 'from' and 'to' -- a deep copy is like serializing 'from' one object 'to' another (rather than to binary, etc.). Replacing values is a special case of schema resolution when translating data from one object to another.

> Add Merging Functionality to Generated Builders
> -----------------------------------------------
>
> Key: AVRO-1245
> URL: https://issues.apache.org/jira/browse/AVRO-1245
> Project: Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.7.3
> Environment: Linux Mint 32-bit, Java 7, Avro 1.7.3
> Reporter: Sharmarke Aden
> Priority: Minor
>
> Suppose I have a record with the following schema and default values:
>
> {code}
> {
>   "type": "record",
>   "namespace": "test",
>   "name": "User",
>   "fields": [
>     { "name": "user", "type": ["null", "string"], "default": null },
>     {
>       "name": "privacy",
>       "type": [
>         { "type": "enum", "name": "Privacy", "namespace": "test", "symbols": ["Public", "Private"] },
>         "null"
>       ],
>       "default": "Private"
>     }
>   ]
> }
> {code}
>
> Now suppose I have a record supplied to me by a third party whose privacy field value is null. Currently, if you call Builder.newBuilder(thirdPartyRecord), it simply creates a new record with the same values as the source record (privacy is null in the newly created builder). It's very important that the privacy value be set, so ideally I would like to perform a merge to mitigate any issues with default values being absent in the source record. I would like to propose that a new enhancement be added to the Builder to support merging a source record into a new record. Perhaps something like this:
>
> {code}
> // recordWithoutDefaults record passed in.
> User.Builder builder = User.newBuilder();
> // ignore null values in the source record if the schema has a default
> // value for the field
> boolean ignoreNull = true;
> // ignore empty string values in the source record for string field
> // types with default field values
> boolean ignoreEmptyString = true;
> // while this is simple and useful in my use case, perhaps there's a
> // better/refined way of supporting various merging models
> builder.merge(recordWithoutDefaults, ignoreNull, ignoreEmptyString);
> {code}
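A rough model of the proposed merge semantics, using plain Java maps. This is a hypothetical sketch, not a real Avro Builder API: the `merge` helper and the two boolean flags follow the behavior described in the proposal (fall back to the schema default when the source value is null, or an empty string, and the corresponding flag is set).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the proposed merge semantics, NOT an Avro API.
public class BuilderMergeSketch {

    // For each field the schema knows about, take the source value unless it
    // is null/empty-string and the corresponding "ignore" flag says to fall
    // back to the schema default instead.
    static Map<String, Object> merge(Map<String, Object> source,
                                     Map<String, Object> defaults,
                                     boolean ignoreNull,
                                     boolean ignoreEmptyString) {
        Map<String, Object> merged = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : defaults.entrySet()) {
            Object v = source.get(e.getKey());
            boolean useDefault =
                (v == null && ignoreNull) ||
                ("".equals(v) && ignoreEmptyString);
            merged.put(e.getKey(), useDefault ? e.getValue() : v);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Schema defaults, mirroring the User schema in the issue.
        Map<String, Object> defaults = new LinkedHashMap<>();
        defaults.put("user", null);
        defaults.put("privacy", "Private");

        // Third-party record with privacy left null.
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("user", "alice");
        source.put("privacy", null);

        Map<String, Object> merged = merge(source, defaults, true, true);
        System.out.println(merged.get("user"));    // the source value survives
        System.out.println(merged.get("privacy")); // the default fills the null
    }
}
```

Doing this at builder-construction time, as suggested above, means the default fallback happens during the deep copy that newBuilder performs anyway.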
[jira] [Commented] (AVRO-1315) Java: Schema Validation utilities
[ https://issues.apache.org/jira/browse/AVRO-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659893#comment-13659893 ] Scott Carey commented on AVRO-1315:
---
Christophe: I'll add some factory methods for creating composite validators, or similar, to address your use case of composing custom validations with these. Tom: I'll add tests that explicitly exercise the positive case -- the positive cases are currently covered only on the path to failure. Re: serialVersionUID -- any Serializable class without one can be considered a bug, and Eclipse gives me a warning without it. It's unlikely that a user will use Java serialization for these exceptions, but there is a chance.

> Java: Schema Validation utilities
> ---------------------------------
>
> Key: AVRO-1315
> URL: https://issues.apache.org/jira/browse/AVRO-1315
> Project: Avro
> Issue Type: New Feature
> Components: java
> Reporter: Scott Carey
> Assignee: Scott Carey
> Fix For: 1.7.5
> Attachments: AVRO-1315.patch
>
> As part of AVRO-1124 we needed Schema Validation utilities. I have separated those out of that ticket as a stand-alone item.
[jira] [Commented] (AVRO-1325) Enhanced Schema Builder API
[ https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659923#comment-13659923 ] Scott Carey commented on AVRO-1325:
---
The field syntax could have shortcuts too -- since the FieldsBuilder currently has only two methods (name() and endRecord()), we could add a few shortcut methods for the most common use cases.

{quote}Why have type() and type(Schema)?{quote}

{code}
Schema person = new Schema.Parser().parse(new File("Person.avsc")); // or look up from a schema repo
Schema job = SchemaBuilder.record("Job").fields()
    .name("title").type().stringType().noDefault()
    .name("who").type(person).noDefault()
    .endRecord();
Schema meeting = SchemaBuilder.record("Meeting").fields()
    .name("location").type().stringType().noDefault()
    .name("attendees").type().array().items(person).noDefault()
    .endRecord();
{code}

{quote}Error type is missing, but this can be easily added.{quote}

I wondered what to do here, since Error is for Protocols only; perhaps add it when we add a ProtocolBuilder or extend SchemaBuilder to support protocols?

> Enhanced Schema Builder API
> ---------------------------
>
> Key: AVRO-1325
> URL: https://issues.apache.org/jira/browse/AVRO-1325
> Project: Avro
> Issue Type: Bug
> Reporter: Scott Carey
> Assignee: Scott Carey
> Fix For: 1.7.5
> Attachments: AVRO-1325.patch, AVRO-1325-preliminary.patch, AVRO-1325-properties.patch, AVRO-1325-v2.patch
>
> The schema builder from AVRO-1274 has a few key limitations. I have proposed changes to make before it is released and the public API is locked in.
[jira] [Commented] (AVRO-1325) Enhanced Schema Builder API
[ https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659927#comment-13659927 ] Scott Carey commented on AVRO-1325:
---
That brings up one more thing about this design: I had in mind later supporting Protocols, and the nested type-parameterized context (Completion<R>) allows a ProtocolBuilder to nest a SchemaBuilder -- a TypeBuilder<Protocol> would share API and code.

> Enhanced Schema Builder API
> ---------------------------
>
> Key: AVRO-1325
> URL: https://issues.apache.org/jira/browse/AVRO-1325
> Project: Avro
> Issue Type: Bug
> Reporter: Scott Carey
> Assignee: Scott Carey
> Fix For: 1.7.5
> Attachments: AVRO-1325.patch, AVRO-1325-preliminary.patch, AVRO-1325-properties.patch, AVRO-1325-v2.patch
>
> The schema builder from AVRO-1274 has a few key limitations. I have proposed changes to make before it is released and the public API is locked in.
[jira] [Commented] (AVRO-1315) Java: Schema Validation utilities
[ https://issues.apache.org/jira/browse/AVRO-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659943#comment-13659943 ] Scott Carey commented on AVRO-1315:
---
The issue is that the default value differs across JVMs/compilers, and that we break compatibility if there are any changes, even ones that have no effect -- such as adding an additional constructor. I felt that we would be more likely to cause the default value to change than to make a change that influences serialization on such a trivial exception type. It is low risk either way; I'd rather focus our energies elsewhere. I'll remove it from the next patch.

> Java: Schema Validation utilities
> ---------------------------------
>
> Key: AVRO-1315
> URL: https://issues.apache.org/jira/browse/AVRO-1315
> Project: Avro
> Issue Type: New Feature
> Components: java
> Reporter: Scott Carey
> Assignee: Scott Carey
> Fix For: 1.7.5
> Attachments: AVRO-1315.patch
>
> As part of AVRO-1124 we needed Schema Validation utilities. I have separated those out of that ticket as a stand-alone item.
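The serialVersionUID trade-off being discussed can be seen directly with standard Java serialization. A minimal sketch (the exception class name here is illustrative, not the one in the patch): pinning the UID explicitly means that behavior-neutral class changes, which would otherwise alter the JVM-computed default UID, cannot break deserialization of previously serialized instances.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Without an explicit serialVersionUID, the JVM derives one from class
// details (fields, methods, constructors), so even adding a constructor can
// change it and break serialization compatibility. Pinning it avoids that.
public class ValidationExceptionDemo extends Exception {
    private static final long serialVersionUID = 1L; // pinned explicitly

    public ValidationExceptionDemo(String message) {
        super(message);
    }

    public static void main(String[] args) throws Exception {
        // Round-trip the exception through Java serialization.
        ValidationExceptionDemo ex = new ValidationExceptionDemo("schema mismatch");
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(ex);
        }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            ValidationExceptionDemo back = (ValidationExceptionDemo) ois.readObject();
            System.out.println(back.getMessage()); // prints "schema mismatch"
        }
    }
}
```

With the UID pinned, the serialized bytes above remain readable even if the class later gains a new constructor; with a computed default UID they might not.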
[jira] [Commented] (AVRO-1325) Enhanced Schema Builder API
[ https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659945#comment-13659945 ] Scott Carey commented on AVRO-1325:
---
Another option would be to have record() and recordSimple() on this API -- the latter could return a record builder with simpler syntax but lacking support for some things.

> Enhanced Schema Builder API
> ---------------------------
>
> Key: AVRO-1325
> URL: https://issues.apache.org/jira/browse/AVRO-1325
> Project: Avro
> Issue Type: Bug
> Reporter: Scott Carey
> Assignee: Scott Carey
> Fix For: 1.7.5
> Attachments: AVRO-1325.patch, AVRO-1325-preliminary.patch, AVRO-1325-properties.patch, AVRO-1325-v2.patch
>
> The schema builder from AVRO-1274 has a few key limitations. I have proposed changes to make before it is released and the public API is locked in.
[jira] [Updated] (AVRO-1325) Enhanced Schema Builder API
[ https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1325:
---
Attachment: AVRO-1325.patch

Updated patch contains:
* Completed functionality.
* Cleaned up API:
** intWith() is now intBuilder()
** added nullable() and optional() shortcut builders
** reduced the number of methods named 'type' for the field builder (default values are set afterwards)
* Very large increase in javadoc.
* Unit test coverage at 99.9% instruction coverage.

> Enhanced Schema Builder API
> ---------------------------
>
> Key: AVRO-1325
> URL: https://issues.apache.org/jira/browse/AVRO-1325
> Project: Avro
> Issue Type: Bug
> Reporter: Scott Carey
> Assignee: Scott Carey
> Fix For: 1.7.5
> Attachments: AVRO-1325.patch, AVRO-1325-preliminary.patch
>
> The schema builder from AVRO-1274 has a few key limitations. I have proposed changes to make before it is released and the public API is locked in.
[jira] [Updated] (AVRO-1325) Enhanced Schema Builder API
[ https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1325:
---
Status: Patch Available (was: Open)

> Enhanced Schema Builder API
> ---------------------------
>
> Key: AVRO-1325
> URL: https://issues.apache.org/jira/browse/AVRO-1325
> Project: Avro
> Issue Type: Bug
> Reporter: Scott Carey
> Assignee: Scott Carey
> Fix For: 1.7.5
> Attachments: AVRO-1325.patch, AVRO-1325-preliminary.patch
>
> The schema builder from AVRO-1274 has a few key limitations. I have proposed changes to make before it is released and the public API is locked in.
[jira] [Updated] (AVRO-1325) Enhanced Schema Builder API
[ https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1325:
---
Attachment: AVRO-1325-v2.patch

This patch includes an additional 180 lines of javadoc introduction to SchemaBuilder.

> Enhanced Schema Builder API
> ---------------------------
>
> Key: AVRO-1325
> URL: https://issues.apache.org/jira/browse/AVRO-1325
> Project: Avro
> Issue Type: Bug
> Reporter: Scott Carey
> Assignee: Scott Carey
> Fix For: 1.7.5
> Attachments: AVRO-1325.patch, AVRO-1325-preliminary.patch, AVRO-1325-properties.patch, AVRO-1325-v2.patch
>
> The schema builder from AVRO-1274 has a few key limitations. I have proposed changes to make before it is released and the public API is locked in.
[jira] [Commented] (AVRO-1325) Enhanced Schema Builder API
[ https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658758#comment-13658758 ] Scott Carey commented on AVRO-1325:
---
My most recent patch deals with the annoying 'everything has a property' case as follows:

{code}
Schema schema = SchemaBuilder.record("Rec").prop("recProp", "r").fields()
    .name("locations").prop("fieldProp", "f").map().prop("mapProp", "m").values()
        .stringBuilder().prop("valProp", "v").endString()
    .endRecord();
{code}

The example from the first comment, based on the schema in the Avro spec page:

{code}
Schema schema = SchemaBuilder.record("HandshakeRequest").namespace("org.apache.avro.ipc").fields()
    .name("clientHash").type().fixed("MD5").size(16).noDefault() // namespace is inherited
    .name("clientProtocol").type().nullable().stringBuilder()    // nullable() is a union of the type and null
        .prop("avro.java.string", "String").endString().noDefault()
    .name("serverHash").type("MD5").noDefault()                  // reference by name
    .name("meta").type().optional().map().prop("avro.java.string", "String").values().bytesType()
        // optional is a union of null and the type, with a null default
    .endRecord();
{code}

> Enhanced Schema Builder API
> ---------------------------
>
> Key: AVRO-1325
> URL: https://issues.apache.org/jira/browse/AVRO-1325
> Project: Avro
> Issue Type: Bug
> Reporter: Scott Carey
> Assignee: Scott Carey
> Fix For: 1.7.5
> Attachments: AVRO-1325.patch, AVRO-1325-preliminary.patch, AVRO-1325-properties.patch, AVRO-1325-v2.patch
>
> The schema builder from AVRO-1274 has a few key limitations. I have proposed changes to make before it is released and the public API is locked in.
[jira] [Updated] (AVRO-1314) Java: Add @threadSafe annotation to maven plugins
[ https://issues.apache.org/jira/browse/AVRO-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1314: -- Priority: Minor (was: Major) Issue Type: Improvement (was: Bug) Java: Add @threadSafe annotation to maven plugins - Key: AVRO-1314 URL: https://issues.apache.org/jira/browse/AVRO-1314 Project: Avro Issue Type: Improvement Components: java Reporter: Scott Carey Assignee: Scott Carey Priority: Minor Fix For: 1.7.5 Attachments: AVRO-1314.patch Our plugins are thread-safe; mark them as such so that warnings will not be printed when running parallel maven builds. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
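For Maven plugins written with javadoc-tag annotations (as Avro's mojos were at the time), the marking is a single {{@threadSafe}} tag on the mojo class. A minimal sketch — the class and goal names here are hypothetical, not the actual Avro mojo:

```java
/**
 * Generates Java sources from Avro schema files.
 *
 * @goal schema
 * @phase generate-sources
 * @threadSafe
 */
class HypotheticalSchemaMojo {
    // With @threadSafe present, Maven 3 stops printing the
    // "not marked @threadSafe" warning during parallel (-T) builds.
}
```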
[jira] [Updated] (AVRO-1314) Java: Add @threadSafe annotation to maven plugins
[ https://issues.apache.org/jira/browse/AVRO-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1314: -- Resolution: Fixed Status: Resolved (was: Patch Available) I committed this @ r1483078 Java: Add @threadSafe annotation to maven plugins - Key: AVRO-1314 URL: https://issues.apache.org/jira/browse/AVRO-1314 Project: Avro Issue Type: Improvement Components: java Reporter: Scott Carey Assignee: Scott Carey Priority: Minor Fix For: 1.7.5 Attachments: AVRO-1314.patch Our plugins are thread-safe; mark them as such so that warnings will not be printed when running parallel maven builds. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1310) Avro Maven project can't be built from scratch
[ https://issues.apache.org/jira/browse/AVRO-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657175#comment-13657175 ] Scott Carey commented on AVRO-1310: --- Martin: Are you using M2E or maven:eclipse? With M2E, as long as Eclipse is not rebuilding, I can use the command line to do all but clean. Both Eclipse and Maven share the same output for compiled class files, so they can step on each other's toes, but it is manageable. This is the same experience I have with every Maven project, Avro or otherwise, with M2E. I haven't used the maven eclipse plugin in years, so I don't know much about that. Avro Maven project can't be built from scratch -- Key: AVRO-1310 URL: https://issues.apache.org/jira/browse/AVRO-1310 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.7.4 Environment: Maven on Eclipse Reporter: Nir Zamir Priority: Minor When getting the Java 'trunk' from SVN and trying to use Maven Install ('mvn install') there are errors. Most of the errors are in tests so I tried skipping the tests but it still fails. See more details in my post on Avro Users: http://apache-avro.679487.n3.nabble.com/help-with-Avro-compilation-td4026946.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1325) Enhanced Schema Builder API
[ https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654852#comment-13654852 ] Scott Carey commented on AVRO-1325: --- I'll need to clean up the doc a little w.r.t. complexity in a couple places. BaseTypeBuilder has three overloads for type(), FieldBuilder has three more due to defaults. What do you think would be confusing to the casual user? What could not be satisfied with javadoc? I expect users to have IDEs set up with Maven so that javadoc is available at autocomplete/suggestion. Are you concerned that will not be the case for most users? We could rename the string name reference variants for FieldBuilder to typeReference(). I named all of them type() because when writing the doc, it was easy to say: {code} /** * Builds a Field in the context of a {@link FieldAssembler}. * * Usage is to first configure any of the optional parameters and then to call one * of the type methods to complete the field. For example * <pre> * .namespace("org.apache.example").orderDescending().type() * </pre> * Optional parameters for a field are namespace, doc, order, and aliases. */ {code} We could change the name of the ones that select a name by reference to typeRef, or remove the ones that select default values and instead force the user to call an additional noDefaults() or withDefault() method afterwards to reduce the number of methods named 'type'. For a generic type builder, used by map and array, values() and items() return a builder that has three variants of type() on it; these could be rolled in to the map and array instead. Instead of: {code} map().values().intType() map().values().type("MD5") map().values().type(someSchema) {code} we could have: {code} map().values().intType() map().values("MD5") map().values(someSchema) {code} I did not do this because the current approach shares more code and is more consistent -- SchemaBuilder itself has the same API. 
{quote} What is the difference between intWith() and intType() {quote} I need feedback/suggestions on naming and API here. The javadoc for intType would say "select an int type without custom properties; a shortcut for intWith().endInt()". The javadoc for intWith would say "return an int type builder for creating an int with properties; if properties are not required, use the #intType() shortcut". intWith() exists only for the case where you need to add a property to the int, which is uncommon, so I wanted a shortcut for the common case. I did not want the context for optional properties to bleed into the following context after type selection, since that adds extra methods to the later context and applies to doc() and namespace() as well. After selecting a type, the context either returns to an earlier scope or to a field default selection. In the former case, properties are ambiguous with the outer scope (a field, array, map, or record context). In the latter case, having a prop() method in scope is not applicable to default value selection. I decided to have the methods available in any scope be unambiguous and correspond with the JSON declaration and the spec. This significantly reduces how many methods are available in each context compared to the current version in trunk and is intended to prevent user error. Alternatively we could rename intWith() to intBuilder() or intWithProps(), or change intType() to be the builder and have intSimple() for the shortcut, but I wanted the common case to be the most obvious. I think the naming and documentation here could certainly be improved. Making it clear to a user which intXYZ method to use (and likewise for the similar methods for all other primitive types) is critical. The vast majority of the time users will not need to set properties on primitive types. {quote} I think this is easy to add to the existing SchemaBuilder by adding an addProp(String key, String value) method to the builders (RecordBuilder, FieldBuilder, ArrayBuilder). 
For FieldBuilder we could also have an addPropToType(String key, String value) method to distinguish properties that are added to the underlying type from those added to the field itself. {quote} It is complicated and confusing to scope properties correctly. The context for setting a property is easily ambiguous in all cases where it is not contained in its own scope. For example: {code} {"type":"record", "name":"Rec", "recProp":"r", "fields": [ {"name":"locations", "fieldProp":"f", "type": {"type":"map", "mapProp":"m", "values":{"type":"string", "valProp":"v"}}} ]} {code} What would this look like with the API in trunk now if extended to support properties? I had trouble reasoning about making it obvious to the user when the field's type contexts bled into the record context. If you add the ability to chain the builder to build a nested type in the map, it gets even messier. The scoping and nesting allows for chaining, and thus propagation of default
[jira] [Commented] (AVRO-1325) Enhanced Schema Builder API
[ https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654909#comment-13654909 ] Scott Carey commented on AVRO-1325: --- {quote} What is the difference between intWith() and intType() {quote} Dealing with properties or not could be as follows for primitive types. JSON: {code} {"type":"map", "values":{"type":"string", "avro.java.string":"String"}} {"type":"map", "values":{"type":"string"}} // same as {"type":"map", "values":"string"} {code} trunk: {code} Schema strWithProps = Schema.create(Schema.Type.STRING); strWithProps.addProp("avro.java.string", "String"); SchemaBuilder.mapType(strWithProps); SchemaBuilder.mapType(SchemaBuilder.STRING); {code} Current proposal: {code} SchemaBuilder.map().values().stringWith().prop("avro.java.string", "String").endString(); SchemaBuilder.map().values().stringType(); {code} Alternative: {code} SchemaBuilder.map().values().stringType().prop("avro.java.string", "String").endString(); SchemaBuilder.map().values().stringType().endString(); {code} The benefit of the alternative is fewer methods on the type builder -- only one for each primitive type. The drawback is that the common case -- no props -- requires another method call to close the property-setting context. Another alternative is to significantly increase the number of methods available on the map for setting values, to shorten the common case: {code} SchemaBuilder.map().values().stringType().prop("avro.java.string", "String").endString(); SchemaBuilder.map().valuesString(); // looks a lot like {"type":"map", "values":"string"} {code} 'valuesString()' is a shortcut for 'values().stringType().endString()', and it also looks a lot like the JSON. I did not like this variation because it adds a lot of methods to the array and map cases and makes their APIs differ more. It is also less consistent with unions, and I wanted unions, maps, arrays, and fields to be similar in API look/feel where possible. Fewer public methods are also easier to document. :)
Enhanced Schema Builder API --- Key: AVRO-1325 URL: https://issues.apache.org/jira/browse/AVRO-1325 Project: Avro Issue Type: Bug Reporter: Scott Carey Assignee: Scott Carey Fix For: 1.7.5 Attachments: AVRO-1325-preliminary.patch The schema builder from AVRO-1274 has a few key limitations. I have proposed changes to make before it is released and the public API is locked in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1316) IDL code-generation generates too-long literals for very large schemas
[ https://issues.apache.org/jira/browse/AVRO-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13653062#comment-13653062 ] Scott Carey commented on AVRO-1316: --- We could make it so that 1.7.4 code can read classes generated with 1.7.5. If the method that takes the split strings and merges them into one with the string buffer before parsing lives inside the generated class rather than in Schema.Parser, the change would be two-way compatible. This is not quite as elegant, however, and I think the requirement to run code generated by 1.7.5 with 1.7.5 is reasonable. IDL code-generation generates too-long literals for very large schemas -- Key: AVRO-1316 URL: https://issues.apache.org/jira/browse/AVRO-1316 Project: Avro Issue Type: Bug Components: java Reporter: Jeremy Kahn Assignee: Jeremy Kahn Priority: Minor Labels: patch Fix For: 1.7.5 Attachments: AVRO-1316.patch, AVRO-1316.patch, AVRO-1316.patch, AVRO-1316.patch, AVRO-1316.patch, AVRO-1316.patch, AVRO-1316.patch When I work from a very large IDL schema, the Java code generated includes a schema JSON literal that exceeds the length of the maximum allowed literal string ([65535 characters|http://stackoverflow.com/questions/8323082/size-of-initialisation-string-in-java]). This creates weird Maven errors like: {{[ERROR] ...FooProtocol.java:[13,89] constant string too long}}. It might seem a little crazy, but a 64-kilobyte JSON protocol isn't outrageous at all for some of the more involved data structures, especially if we're including documentation strings etc. I believe the fix should be a bit more sensitivity to the length of the JSON literal (and a willingness to split it into more than one literal, joined by {{+}}), but I haven't figured out where that change needs to go. Has anyone else encountered this problem? -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
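The backward-compatible variant suggested in the comment above — concatenating the split literal pieces inside the generated class before handing a single string to Schema.Parser — could look roughly like this. The class name, field name, and schema content are illustrative, not the actual avro-compiler output:

```java
// Sketch of what a generated class could emit: the schema JSON split into
// pieces under the 64KB class-file limit, joined locally so an older
// Schema.Parser still receives one string.
class GeneratedProtocolSketch {
    static final String SCHEMA_JSON = join(
            "{\"type\":\"record\",\"name\":\"Foo\",",
            "\"fields\":[]}");

    // Merge the split literals with a StringBuilder before parsing.
    private static String join(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }
}
```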
[jira] [Commented] (AVRO-1316) IDL code-generation generates too-long literals for very large schemas
[ https://issues.apache.org/jira/browse/AVRO-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13651254#comment-13651254 ] Scott Carey commented on AVRO-1316: --- Looks good. +1 IDL code-generation generates too-long literals for very large schemas -- Key: AVRO-1316 URL: https://issues.apache.org/jira/browse/AVRO-1316 Project: Avro Issue Type: Bug Components: java Reporter: Jeremy Kahn Assignee: Jeremy Kahn Priority: Minor Labels: patch Fix For: 1.7.5 Attachments: AVRO-1316.patch, AVRO-1316.patch, AVRO-1316.patch, AVRO-1316.patch, AVRO-1316.patch, AVRO-1316.patch, AVRO-1316.patch When I work from a very large IDL schema, the Java code generated includes a schema JSON literal that exceeds the length of the maximum allowed literal string ([65535 characters|http://stackoverflow.com/questions/8323082/size-of-initialisation-string-in-java]). This creates weird Maven errors like: {{[ERROR] ...FooProtocol.java:[13,89] constant string too long}}. It might seem a little crazy, but a 64-kilobyte JSON protocol isn't outrageous at all for some of the more involved data structures, especially if we're including documentation strings etc. I believe the fix should be a bit more sensitivity to the length of the JSON literal (and a willingness to split it into more than one literal, joined by {{+}}), but I haven't figured out where that change needs to go. Has anyone else encountered this problem? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (AVRO-1325) Enhanced Schema Builder API
Scott Carey created AVRO-1325: - Summary: Enhanced Schema Builder API Key: AVRO-1325 URL: https://issues.apache.org/jira/browse/AVRO-1325 Project: Avro Issue Type: Bug Reporter: Scott Carey Assignee: Scott Carey Fix For: 1.7.5 The schema builder from AVRO-1274 has a few key limitations. I have proposed changes to make before it is released and the public API is locked in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1325) Enhanced Schema Builder API
[ https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651288#comment-13651288 ] Scott Carey commented on AVRO-1325: --- Below are the limitations that concern me from AVRO-1274, in approximate priority of my concern. # Arbitrary properties are not supported; for example, {"type":"string", "avro.java.string":"String"} cannot be built. # SchemaBuilder.INT and other constants are public. Unfortunately, these are mutable, and anyone could call addProp() on these, affecting others. # Scopes are confusing, it is not always obvious when a # Does not chain to nested types. Although there is limited chaining for record fields, nested calls to the builder are required, which prevents supporting namespace nesting or other passing of context from outer to inner scopes. I have a prototype patch that builds on the work in AVRO-1274. The major changes are to how scopes are handled for fields and unions, since adding property support is not trivial on top of AVRO-1274: there is much ambiguity in what a call to add a property would apply to (the field, or the type of the field?) 
The following schema: {code:json} {"type":"record","name":"HandshakeRequest","namespace":"org.apache.avro.ipc","fields":[ {"name":"clientHash","type":{"type":"fixed","name":"MD5","size":16}}, {"name":"clientProtocol","type":["null",{"type":"string","avro.java.string":"String"}]}, {"name":"serverHash","type":"MD5"}, {"name":"meta","type":["null",{"type":"map","values":"bytes","avro.java.string":"String"}]} ]} {code} looks like this in the builder: {code} Schema result = SchemaBuilder .recordType("HandshakeRequest").namespace("org.apache.avro.ipc").fields() .name("clientHash").type().fixed("MD5").size(16).noDefault() .name("clientProtocol").type().unionOf() .nullType().and() .stringWith().prop("avro.java.string", "String").endString().endUnion().noDefault() .name("serverHash").type("MD5") .name("meta").type().unionOf() .nullType().and() .map().prop("avro.java.string", "String").values().bytesType().endUnion().withDefault(null) .record(); {code} It supports the same feature set that JSON schemas do: * nesting of namespaces (MD5 above automatically picks up the org.apache.avro.ipc namespace) * reference of named types by name -- .type("MD5") above for serverHash And enforces other rules: * union defaults are required to correspond to the first type in the union * properties, doc(), namespace, and aliases work only in the contexts where they are supported. Supported features are scoped with many internal nested types; for example, the field assembler returned by the record builder's fields() method has only two methods -- name(String) and record() -- and name(String) returns a type builder for a field, which has prop(String, String) for the field and the available types, such as map(). A call to map() returns a map builder, which has prop(String, String) again but for the map, and values() ends the use of the map builder, changing scope to the nested type and returning down to the fields assembler when that is complete. h4. 
Remaining Work * Not all primitive types are supported yet (trivial) * Shortcut methods need to be added for common use cases such as an optional field. * Naming of some things needs review -- it would be easier if enum, int, long, default, etc. were not reserved Java keywords :) * Javadoc is nearly absent. * There is some room for pushing more common work into parent types. * Tests * Attempt to replace the Schema.Parser logic with it, at minimum to test for areas of improvement or missing features. * No protocol support yet (e.g. error, protocol, request, response). It probably makes sense to extend this to cover all Avro things, including fields and protocols. I want to checkpoint the work so far and gather feedback. Enhanced Schema Builder API --- Key: AVRO-1325 URL: https://issues.apache.org/jira/browse/AVRO-1325 Project: Avro Issue Type: Bug Reporter: Scott Carey Assignee: Scott Carey Fix For: 1.7.5 The schema builder from AVRO-1274 has a few key limitations. I have proposed changes to make before it is released and the public API is locked in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
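The scope-narrowing described in the comment above — each context exposing only the methods valid at that point, with an end-method returning control to the enclosing context — can be sketched generically. This toy builder illustrates the pattern only; it is not the proposed SchemaBuilder API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration of context-scoped fluent building: while inside MapCtx,
// only prop() and values() are visible; values() hands control back to the
// enclosing RecordCtx, so properties can never apply to the wrong scope.
class ScopedBuilderToy {
    static class RecordCtx {
        final Map<String, String> props = new LinkedHashMap<>();
        MapCtx map() { return new MapCtx(this); }
    }

    static class MapCtx {
        private final RecordCtx parent;
        final Map<String, String> props = new LinkedHashMap<>();
        MapCtx(RecordCtx parent) { this.parent = parent; }
        MapCtx prop(String k, String v) { props.put(k, v); return this; }
        RecordCtx values() { return parent; } // scope returns to the record
    }
}
```

The key design point is that each nested builder keeps a reference to its parent, so chaining never requires a separate nested builder call and context (such as a namespace) can flow from outer to inner scopes.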
[jira] [Updated] (AVRO-1325) Enhanced Schema Builder API
[ https://issues.apache.org/jira/browse/AVRO-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1325: -- Attachment: AVRO-1325-preliminary.patch Preliminary work in progress -- mostly complete but requires more doc, tests, and feedback. Enhanced Schema Builder API --- Key: AVRO-1325 URL: https://issues.apache.org/jira/browse/AVRO-1325 Project: Avro Issue Type: Bug Reporter: Scott Carey Assignee: Scott Carey Fix For: 1.7.5 Attachments: AVRO-1325-preliminary.patch The schema builder from AVRO-1274 has a few key limitations. I have proposed changes to make before it is released and the public API is locked in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1274) Add a schema builder API
[ https://issues.apache.org/jira/browse/AVRO-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647360#comment-13647360 ] Scott Carey commented on AVRO-1274: --- I am working on a modification to the builder that would make its use look like a json schema. {code} public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse( "{\"type\":\"record\",\"name\":\"HandshakeRequest\",\"namespace\":\"org.apache.avro.ipc\",\"fields\":[ {\"name\":\"clientHash\",\"type\":{\"type\":\"fixed\",\"name\":\"MD5\",\"size\":16}}, {\"name\":\"clientProtocol\",\"type\":[\"null\",{\"type\":\"string\",\"avro.java.string\":\"String\"}]}, {\"name\":\"serverHash\",\"type\":\"MD5\"}, {\"name\":\"meta\",\"type\":[\"null\",{\"type\":\"map\",\"values\":\"bytes\",\"avro.java.string\":\"String\"}]} ]}"); {code} becomes similar to: {code} public static final org.apache.avro.Schema SCHEMA$ = SchemaBuilder .typeRecord("HandshakeRequest").namespaceInherited("org.apache.avro.ipc").fields() // optional namespace inheritance .typeFixed("clientHash", MD5.SCHEMA$).field() // or typeFixed("clientHash", "MD5", 16) .typeUnion("clientProtocol").ofNull().andString().withProp("avro.java.string", "String").field() .typeFixed("serverHash", "MD5").field() // uses reference to already defined MD5 .typeUnion("meta").ofNull().andMap().withProp("avro.java.string", "String").valuesBytes().field() .record(); {code} we can also have shortcuts as before, for example optionalInt("x", -1) as a shortcut for typeUnion("x").ofInt(-1).andNull() nullableInt("maybe") as a shortcut for typeUnion("maybe").ofNull(null).andInt() requiredInt("yes") may not be necessary, its shortcut would be typeInt("yes").field(); It should be straightforward to implement the whole Schema.Parser with the above (and simplify the parser), which makes it easy to test very thoroughly; there is an intentional 1:1 mapping between the parser, spec, and the builder. 
Add a schema builder API Key: AVRO-1274 URL: https://issues.apache.org/jira/browse/AVRO-1274 Project: Avro Issue Type: New Feature Components: java Reporter: Tom White Assignee: Tom White Fix For: 1.7.5 Attachments: AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, TestDefaults.patch It would be nice to have a fluent API that made it easier to construct record schemas. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1316) IDL code-generation generates too-long literals for very large schemas
[ https://issues.apache.org/jira/browse/AVRO-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648047#comment-13648047 ] Scott Carey commented on AVRO-1316: --- The limit in the (Sun) java compiler is 64KB of encoded UTF-8 bytes, not 64K characters. If a multibyte UTF-8 character straddles that boundary, breaking there will fail. We probably want to break at a smaller boundary than 2^16. How about 2^14 (16KB)? IDL code-generation generates too-long literals for very large schemas -- Key: AVRO-1316 URL: https://issues.apache.org/jira/browse/AVRO-1316 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.7.5 Reporter: Jeremy Kahn Priority: Minor Labels: patch Attachments: AVRO-1316.patch When I work from a very large IDL schema, the Java code generated includes a schema JSON literal that exceeds the length of the maximum allowed literal string ([65535 characters|http://stackoverflow.com/questions/8323082/size-of-initialisation-string-in-java]). This creates weird Maven errors like: {{[ERROR] ...FooProtocol.java:[13,89] constant string too long}}. It might seem a little crazy, but a 64-kilobyte JSON protocol isn't outrageous at all for some of the more involved data structures, especially if we're including documentation strings etc. I believe the fix should be a bit more sensitivity to the length of the JSON literal (and a willingness to split it into more than one literal, joined by {{+}}), but I haven't figured out where that change needs to go. Has anyone else encountered this problem? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
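The byte-aware splitting discussed above — a 16KB budget measured in UTF-8 bytes, advancing only at code-point boundaries so no multibyte character is cut — can be sketched as follows. This is a hypothetical helper, not the actual avro-compiler code:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LiteralSplitter {
    // Split s into pieces whose UTF-8 encodings each fit within maxBytes,
    // stepping whole code points so surrogate pairs are never separated.
    static List<String> split(String s, int maxBytes) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        while (start < s.length()) {
            int end = start;
            int bytes = 0;
            while (end < s.length()) {
                int cp = s.codePointAt(end);
                int cpBytes = new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8).length;
                if (bytes + cpBytes > maxBytes) {
                    break; // this code point would overflow the budget
                }
                bytes += cpBytes;
                end += Character.charCount(cp);
            }
            if (end == start) {
                // maxBytes smaller than one code point: still make progress
                end += Character.charCount(s.codePointAt(start));
            }
            parts.add(s.substring(start, end));
            start = end;
        }
        return parts;
    }
}
```

In the generated code, the pieces would then be emitted as separate literals joined by {{+}} (or merged before parsing, per the earlier comment on AVRO-1316).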
[jira] [Commented] (AVRO-1274) Add a schema builder API
[ https://issues.apache.org/jira/browse/AVRO-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13648048#comment-13648048 ] Scott Carey commented on AVRO-1274: --- I am planning on constraining the lexical scope via many cascaded builders / assemblers so that the list to auto-complete at any time is small. I'll make a new JIRA for my proposed changes. Add a schema builder API Key: AVRO-1274 URL: https://issues.apache.org/jira/browse/AVRO-1274 Project: Avro Issue Type: New Feature Components: java Reporter: Tom White Assignee: Tom White Fix For: 1.7.5 Attachments: AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, TestDefaults.patch It would be nice to have a fluent API that made it easier to construct record schemas. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1311) Upgrade Snappy-Java dependency to support building on Mac + Java 7
[ https://issues.apache.org/jira/browse/AVRO-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646760#comment-13646760 ] Scott Carey commented on AVRO-1311: --- 1.0.5 may be released this week (based on -M4, see: https://github.com/xerial/snappy-java/issues/6) but we may want to test it more first. 1.0.5-M4 works for me (OSX 10.7.5, Java 7). Can some others test changing the snappy.version in lang/java/pom.xml to 1.0.5-M4? {code:xml} <snappy.version>1.0.5-M4</snappy.version> {code} Upgrade Snappy-Java dependency to support building on Mac + Java 7 -- Key: AVRO-1311 URL: https://issues.apache.org/jira/browse/AVRO-1311 Project: Avro Issue Type: Bug Affects Versions: 1.7.4 Reporter: Scott Carey Assignee: Scott Carey snappy-java 1.0.4 does not work with Mac + Java 7. 1.0.5-M4 is on maven, but it does not appear that there will be a final release of that. 1.1.0 is at -M3 status, and is being developed now. Both of these work locally for me, when the dust settles we need to pick one before the next release. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-607) SpecificData.getSchema not thread-safe
[ https://issues.apache.org/jira/browse/AVRO-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647037#comment-13647037 ] Scott Carey commented on AVRO-607: -- If I recall, that turns out to be very hard due to how the equals contract works with weak references. There is already a Java WeakHashMap, so making one with identity semantics wasn't too hard. We may need thousands of lines of code and might have to implement our own concurrent map implementation. I think I'd rather spend my efforts figuring out how to extract Google's implementation into another namespace in the build with shade, jarjar, or similar. SpecificData.getSchema not thread-safe -- Key: AVRO-607 URL: https://issues.apache.org/jira/browse/AVRO-607 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.3.3 Reporter: Stephen Tu Priority: Minor Attachments: AVRO-607.patch SpecificData.getSchema uses a WeakHashMap to cache schemas, but WeakHashMap is not thread-safe, and the method itself is not synchronized. Seems like this could lead to the data structure getting corrupted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1310) Avro Maven project can't be built from scratch
[ https://issues.apache.org/jira/browse/AVRO-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647146#comment-13647146 ] Scott Carey commented on AVRO-1310: --- Thanks for the update! Yes, the maven archetypes at the end have some extra dependencies. Let's not close this quite yet, but I'll lower the priority -- I won't worry about it for the next release. There may be something worth fixing in the archetype part of the build. If not, then we can close this. Avro Maven project can't be built from scratch -- Key: AVRO-1310 URL: https://issues.apache.org/jira/browse/AVRO-1310 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.7.4 Environment: Maven on Eclipse Reporter: Nir Zamir When getting the Java 'trunk' from SVN and trying to use Maven Install ('mvn install') there are errors. Most of the errors are in tests so I tried skipping the tests but it still fails. See more details in my post on Avro Users: http://apache-avro.679487.n3.nabble.com/help-with-Avro-compilation-td4026946.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (AVRO-1310) Avro Maven project can't be built from scratch
[ https://issues.apache.org/jira/browse/AVRO-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1310: -- Priority: Minor (was: Major) Avro Maven project can't be built from scratch -- Key: AVRO-1310 URL: https://issues.apache.org/jira/browse/AVRO-1310 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.7.4 Environment: Maven on Eclipse Reporter: Nir Zamir Priority: Minor When getting the Java 'trunk' from SVN and trying to use Maven Install ('mvn install') there are errors. Most of the errors are in tests so I tried skipping the tests but it still fails. See more details in my post on Avro Users: http://apache-avro.679487.n3.nabble.com/help-with-Avro-compilation-td4026946.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (AVRO-1313) Java: Add system property for disabling sun.misc.Unsafe
[ https://issues.apache.org/jira/browse/AVRO-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1313: -- Resolution: Fixed Status: Resolved (was: Patch Available) Committed in revision 1478244. Java: Add system property for disabling sun.misc.Unsafe --- Key: AVRO-1313 URL: https://issues.apache.org/jira/browse/AVRO-1313 Project: Avro Issue Type: Improvement Reporter: Scott Carey Assignee: Scott Carey Fix For: 1.7.5 Attachments: AVRO-1313.patch, AVRO-1313-v2.patch We should be able to disable use of sun.misc.Unsafe. I propose that if the system property avro.disable.unsafe is non-null, we use reflection rather than Unsafe. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
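As context for the fix above, the check AVRO-1313 proposes is small: treat any non-null value of the avro.disable.unsafe system property as a request to fall back to reflection. The sketch below is illustrative only; the class and method names are not Avro's.

```java
// Sketch of the switch proposed in AVRO-1313: if the system property
// "avro.disable.unsafe" is defined (any value), use reflection instead
// of sun.misc.Unsafe. Class and method names are hypothetical.
public class UnsafeSwitch {
    static boolean unsafeDisabled() {
        // The ticket proposes "non-null": merely defining the property
        // disables Unsafe, regardless of its value.
        return System.getProperty("avro.disable.unsafe") != null;
    }

    public static void main(String[] args) {
        System.out.println("unsafe disabled: " + unsafeDisabled());
    }
}
```

Running with -Davro.disable.unsafe (no value needed) would then select the reflection-based field accessors benchmarked later in this thread.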
[jira] [Commented] (AVRO-1274) Add a schema builder API
[ https://issues.apache.org/jira/browse/AVRO-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647196#comment-13647196 ] Scott Carey commented on AVRO-1274: --- We may have more work to do here. How would you use the builder to do the equivalent of: {code}
public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse(
  "{\"type\":\"record\",\"name\":\"HandshakeRequest\",\"namespace\":\"org.apache.avro.ipc\",\"fields\":["
  + "{\"name\":\"clientHash\",\"type\":{\"type\":\"fixed\",\"name\":\"MD5\",\"size\":16}},"
  + "{\"name\":\"clientProtocol\",\"type\":[\"null\",{\"type\":\"string\",\"avro.java.string\":\"String\"}]},"
  + "{\"name\":\"serverHash\",\"type\":\"MD5\"},"
  + "{\"name\":\"meta\",\"type\":[\"null\",{\"type\":\"map\",\"values\":\"bytes\",\"avro.java.string\":\"String\"}]}"
  + "]}");
{code} ? I am trying to suggest that we replace literal strings with the builder in AVRO-1316 but cannot seem to replicate the above with the builder. The clientProtocol and meta fields are the problem. It does not seem possible to create a union of null and 'more' without a default. Additionally, unionType is confusing. Is this how it would be done? If so, I do not see how to add types to the union if I start with: {code}
unionType("clientProtocol", SchemaBuilder.NULL)
{code} Then how do I add extra types? Or is the type passed in expected to _be_ a union? If so, the field should be named unionSchema and the javadoc needs to be clear. This builder API makes it hard to create union fields without defaults. Perhaps it is simply a documentation issue and the doc for unionType() needs an example. Should we open a new ticket for these concerns or re-open this one? I suspect it is largely documentation but am not sure. 
Add a schema builder API Key: AVRO-1274 URL: https://issues.apache.org/jira/browse/AVRO-1274 Project: Avro Issue Type: New Feature Components: java Reporter: Tom White Assignee: Tom White Fix For: 1.7.5 Attachments: AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, TestDefaults.patch It would be nice to have a fluent API that made it easier to construct record schemas. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1316) IDL code-generation generates too-long literals for very large schemas
[ https://issues.apache.org/jira/browse/AVRO-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647205#comment-13647205 ] Scott Carey commented on AVRO-1316: --- I have not, but my schemas are only ~12K. I assume the problem is in the creation of the SCHEMA$ static field? We could break the string up into 4k chunks. However, it will be more efficient, and the resulting class file significantly smaller, if we use the Schema API programmatically. This isn't too hard. We go from the below (edited from one line to many for readability): {code}
public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse(
  "{\"type\":\"record\",\"name\":\"HandshakeRequest\",\"namespace\":\"org.apache.avro.ipc\",\"fields\":["
  + "{\"name\":\"clientHash\",\"type\":{\"type\":\"fixed\",\"name\":\"MD5\",\"size\":16}},"
  + "{\"name\":\"clientProtocol\",\"type\":[\"null\",{\"type\":\"string\",\"avro.java.string\":\"String\"}]},"
  + "{\"name\":\"serverHash\",\"type\":\"MD5\"},"
  + "{\"name\":\"meta\",\"type\":[\"null\",{\"type\":\"map\",\"values\":\"bytes\",\"avro.java.string\":\"String\"}]}"
  + "]}");
{code} to use the new SchemaBuilder: {code}
public static final org.apache.avro.Schema SCHEMA$;
static {
  SCHEMA$ = SchemaBuilder
    .recordType("HandshakeRequest")
    .namespace("org.apache.avro.ipc")
    .requiredFixed("clientHash", MD5.SCHEMA$)
    .unionType("clientProtocol",
        SchemaBuilder.unionType(
            SchemaBuilder.NULL,
            SchemaBuilder.STRING)
        .build())
    .addProp("avro.java.string", "String")
    .requiredFixed("serverHash", MD5.SCHEMA$)
    .unionType("meta",
        SchemaBuilder.unionType(
            SchemaBuilder.NULL,
            SchemaBuilder.mapType(SchemaBuilder.BYTES)
                .addProp("avro.java.string", "String")
                .build())
        .build())
    .build();
}
{code} IDL code-generation generates too-long literals for very large schemas -- Key: AVRO-1316 URL: https://issues.apache.org/jira/browse/AVRO-1316 Project: Avro Issue Type: Bug Components: java Reporter: Jeremy Kahn Priority: Minor When I work from a very large IDL schema, the Java code generated includes a schema JSON literal that exceeds the length of 
the maximum allowed literal string ([65535 characters|http://stackoverflow.com/questions/8323082/size-of-initialisation-string-in-java]). This creates weird Maven errors like: {{[ERROR] ...FooProtocol.java:[13,89] constant string too long}}. It might seem a little crazy, but a 64-kilobyte JSON protocol isn't outrageous at all for some of the more involved data structures, especially if we're including documentation strings etc. I believe the fix should be a bit more sensitivity to the length of the JSON literal (and a willingness to split it into more than one literal, joined by {{+}}), but I haven't figured out where that change needs to go. Has anyone else encountered this problem? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
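The splitting fix suggested above can be sketched briefly: break the schema JSON into chunks safely below the 65535-character constant limit and emit them joined by {{+}}, which the compiler folds back into one string. This is a hedged illustration of where the change might go, not the actual Avro compiler code; all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the chunking idea: split a long schema JSON literal into
// pieces below the class-file string limit and emit them joined by '+'.
public class LiteralChunker {
    static final int MAX_CHUNK = 65000; // margin below the 65535 limit

    static List<String> chunk(String s, int max) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < s.length(); i += max) {
            parts.add(s.substring(i, Math.min(s.length(), i + max)));
        }
        return parts;
    }

    // Emit: new Schema.Parser().parse("chunk1" + "chunk2" + ...)
    // Note: escaping here is simplified (quotes only); real codegen must
    // also escape backslashes and avoid splitting an escape sequence.
    static String emitParseExpression(String schemaJson) {
        StringBuilder sb =
            new StringBuilder("new org.apache.avro.Schema.Parser().parse(");
        List<String> parts = chunk(schemaJson, MAX_CHUNK);
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) sb.append(" + ");
            sb.append('"').append(parts.get(i).replace("\"", "\\\"")).append('"');
        }
        return sb.append(')').toString();
    }
}
```

Chunking keeps the literal-based generation working, though as noted in the comment above, building the Schema programmatically would produce smaller class files.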
[jira] [Commented] (AVRO-1274) Add a schema builder API
[ https://issues.apache.org/jira/browse/AVRO-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13647221#comment-13647221 ] Scott Carey commented on AVRO-1274: --- I think the answer to my question would be: {code}
public static final org.apache.avro.Schema SCHEMA$;
static {
  SCHEMA$ = SchemaBuilder
    .recordType("HandshakeRequest")
    .namespace("org.apache.avro.ipc")
    .requiredFixed("clientHash", MD5.SCHEMA$)
    .unionType("clientProtocol",
        SchemaBuilder.unionType(
            SchemaBuilder.NULL,
            SchemaBuilder.STRING)
        .build())
    .addFieldProp("avro.java.string", "String")
    .requiredFixed("serverHash", MD5.SCHEMA$)
    .unionType("meta",
        SchemaBuilder.unionType(
            SchemaBuilder.NULL,
            SchemaBuilder.mapType(SchemaBuilder.BYTES)
                .addFieldProp("avro.java.string", "String")
                .build())
        .build())
    .build();
}
{code} but I am not sure. Also, addFieldProp() does not exist. What is odd is that there are two unionType() methods, one takes varargs and the other does not. I suspect that the intention was for both to use varargs so that the nested union building is not required by the user. It would be much simpler if unions without defaults had a shortcut: {code}
public static final org.apache.avro.Schema SCHEMA$;
static {
  SCHEMA$ = SchemaBuilder
    .recordType("HandshakeRequest")
    .namespace("org.apache.avro.ipc")
    .requiredFixed("clientHash", MD5.SCHEMA$)
    .nullableString("clientProtocol")
    .addFieldProp("avro.java.string", "String")
    .requiredFixed("serverHash", MD5.SCHEMA$)
    .nullableMap(SchemaBuilder.BYTES)
    .addFieldProp("avro.java.string", "String")
    .build();
}
{code} Building unions in general feels clunky as well since you have to break chaining and use SchemaBuilder again. Instead of taking a varargs list of schemas in the union, the type returned could be a UnionBuilder. 
So instead of: {code}
public static final org.apache.avro.Schema SCHEMA$;
static {
  SCHEMA$ = SchemaBuilder
    .recordType("Test")
    .namespace("org.apache.avro")
    .unionString("stringField", "defaultVal",
        SchemaBuilder.INT,
        SchemaBuilder.arrayType(SchemaBuilder.INT).build(),
        SchemaBuilder.mapType(SchemaBuilder.unionType(
            SchemaBuilder.INT,
            SchemaBuilder.LONG)))
    .build();
}
{code} we could write something more like: {code}
public static final org.apache.avro.Schema SCHEMA$;
static {
  SCHEMA$ = SchemaBuilder
    .recordType("Test")
    .namespace("org.apache.avro")
    .unionString("stringFieldName", "defaultVal")
      .andInt()
      .andArrayOf().int()
      .andMapOf().unionInt().andLong()
    .build();
}
{code} Add a schema builder API Key: AVRO-1274 URL: https://issues.apache.org/jira/browse/AVRO-1274 Project: Avro Issue Type: New Feature Components: java Reporter: Tom White Assignee: Tom White Fix For: 1.7.5 Attachments: AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, AVRO-1274.patch, TestDefaults.patch It would be nice to have a fluent API that made it easier to construct record schemas. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
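The chained-union proposal above hinges on one design choice: each and*() method returns the builder itself, so adding a branch never breaks the chain. A minimal self-contained toy illustrating that pattern (this is an illustration only, not Avro's SchemaBuilder; all names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the chained-union idea: and*() methods append a branch
// and return the builder, so callers never have to break the chain to
// nest another SchemaBuilder call.
public class UnionSketch {
    private final List<String> branches = new ArrayList<>();

    public static UnionSketch unionOf(String first) {
        UnionSketch u = new UnionSketch();
        u.branches.add(first);
        return u;
    }

    public UnionSketch andInt()  { branches.add("int");  return this; }
    public UnionSketch andLong() { branches.add("long"); return this; }

    // Render the accumulated branches as a union, in insertion order.
    public String build() { return "[" + String.join(", ", branches) + "]"; }

    public static void main(String[] args) {
        System.out.println(unionOf("null").andInt().andLong().build());
    }
}
```

The SchemaBuilder API that eventually shipped in Avro went through further revisions, so treat this purely as a sketch of the chaining style being discussed.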
[jira] [Updated] (AVRO-1282) Make use of the sun.misc.Unsafe class during serialization if a JDK supports it
[ https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1282: -- Resolution: Fixed Fix Version/s: 1.8.0 1.7.5 Status: Resolved (was: Patch Available) Committed @r1477712 Make use of the sun.misc.Unsafe class during serialization if a JDK supports it --- Key: AVRO-1282 URL: https://issues.apache.org/jira/browse/AVRO-1282 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.7.4 Reporter: Leo Romanoff Priority: Minor Fix For: 1.7.5, 1.8.0 Attachments: AVRO-1282-s1.patch, AVRO-1282-s2.patch, AVRO-1282-s3.patch, AVRO-1282-s5.patch, AVRO-1282-s6.patch, AVRO-1282-s7.patch, avro-1282-v1.patch, avro-1282-v2.patch, avro-1282-v3.patch, avro-1282-v4.patch, avro-1282-v5.patch, avro-1282-v6.patch, avro-1282-v7.patch, avro-1282-v8.patch, AVRO-1282-v9.patch, TestUnsafeUtil.java Unsafe can be used to significantly speed up serialization process, if a JDK implementation supports java.misc.Unsafe properly. Most JDKs running on PCs support it. Some platforms like Android lack a proper support for Unsafe yet. There are two possibilities to use Unsafe for serialization: 1) Very quick access to the fields of objects. It is way faster than with the reflection-based approach using Field.get/set 2) Input and Output streams can be using Unsafe to perform very quick input/output. 3) More over, Unsafe makes it possible to serialize to/deserialize from off-heap memory directly and very quickly, without any intermediate buffers allocated on heap. There is virtually no overhead compared to the usual byte arrays. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (AVRO-1313) Java: Add system property for disabling sun.misc.Unsafe
[ https://issues.apache.org/jira/browse/AVRO-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1313: -- Fix Version/s: 1.7.5 Java: Add system property for disabling sun.misc.Unsafe --- Key: AVRO-1313 URL: https://issues.apache.org/jira/browse/AVRO-1313 Project: Avro Issue Type: Bug Reporter: Scott Carey Fix For: 1.7.5 We should be able to disable use of sun.misc.Unsafe. I propose that if the system property avro.disable.unsafe is non-null, we use reflection rather than Unsafe. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (AVRO-1313) Java: Add system property for disabling sun.misc.Unsafe
[ https://issues.apache.org/jira/browse/AVRO-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1313: -- Issue Type: Improvement (was: Bug) Java: Add system property for disabling sun.misc.Unsafe --- Key: AVRO-1313 URL: https://issues.apache.org/jira/browse/AVRO-1313 Project: Avro Issue Type: Improvement Reporter: Scott Carey Fix For: 1.7.5 We should be able to disable use of sun.misc.Unsafe. I propose that if the system property avro.disable.unsafe is non-null, we use reflection rather than Unsafe. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (AVRO-1313) Java: Add system property for disabling sun.misc.Unsafe
[ https://issues.apache.org/jira/browse/AVRO-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1313: -- Attachment: AVRO-1313.patch This patch adds the check for avro.disable.unsafe. When -Davro.disable.unsafe is added to the command line, performance drops as expected for field access, but array performance is still fast: Unsafe On: {noformat}
test name                                     time   M entries/sec  M bytes/sec  bytes/cycle
ReflectRecordRead:                         7405 ms       2.251        87.343       808498
ReflectRecordWrite:                        4786 ms       3.482       135.121       808498
ReflectBigRecordRead:                      7478 ms       1.337        82.089       767380
ReflectBigRecordWrite:                     4984 ms       2.006       123.153       767380
ReflectFloatRead:                          6927 ms       0.000       115.486          104
ReflectFloatWrite:                         1087 ms       0.001       735.371          104
ReflectDoubleRead:                         8678 ms       0.000       184.369          204
ReflectDoubleWrite:                        2398 ms       0.000       666.980          204
ReflectIntArrayRead:                      11756 ms       1.418        58.503       859709
ReflectIntArrayWrite:                      3798 ms       4.388       181.070       859709
ReflectLongArrayRead:                      6542 ms       1.274        98.481       805344
ReflectLongArrayWrite:                     2189 ms       3.806       294.278       805344
ReflectDoubleArrayRead:                    6316 ms       1.583       103.625       818144
ReflectDoubleArrayWrite:                   1589 ms       6.292       411.827       818144
ReflectFloatArrayRead:                    13986 ms       1.430        48.400       846172
ReflectFloatArrayWrite:                    2953 ms       6.771       229.186       846172
ReflectNestedFloatArrayRead:              16618 ms       1.203        40.733       846172
ReflectNestedFloatArrayWrite:              4841 ms       4.131       139.820       846172
ReflectNestedObjectArrayRead:             12905 ms       0.310        39.989       645104
ReflectNestedObjectArrayWrite:             6868 ms       0.582        75.139       645104
ReflectNestedLargeFloatArrayRead:         10141 ms       0.329        85.781      1087381
ReflectNestedLargeFloatArrayWrite:         2049 ms       1.626       424.432      1087381
ReflectNestedLargeFloatArrayBlockedRead:  10501 ms       0.317        83.899      1101357
ReflectNestedLargeFloatArrayBlockedWrite:  5554 ms       0.600       158.634      1101357
{noformat} Unsafe Off: {noformat}
test name                                     time   M entries/sec  M bytes/sec  bytes/cycle
ReflectRecordRead:                        13282 ms       1.255        48.694       808498
ReflectRecordWrite:                        8981 ms       1.856        72.011       808498
ReflectBigRecordRead:                     17118 ms       0.584        35.863       767380
ReflectBigRecordWrite:                    13178 ms       0.759        46.584       767380
ReflectFloatRead:                          6713 ms       0.000       119.160          104
ReflectFloatWrite:                         2444 ms       0.000       327.229          104
ReflectDoubleRead:                         8094 ms       0.000       197.677          204
ReflectDoubleWrite:                        2133 ms       0.000       749.844          204
ReflectIntArrayRead:                      12127 ms       1.374        56.712       859709
ReflectIntArrayWrite:                      3832 ms       4.349       179.463       859709
ReflectLongArrayRead:                      6312 ms       1.320       102.059       805344
ReflectLongArrayWrite:                     2548 ms       3.269       252.785       805344
ReflectDoubleArrayRead:                    7460 ms       1.340        87.726       818144
ReflectDoubleArrayWrite:                   2048 ms       4.882       319.526       818144
ReflectFloatArrayRead:                    11761 ms       1.700        57.554       846172
ReflectFloatArrayWrite:                    3370 ms       5.935       200.871       846172
ReflectNestedFloatArrayRead:              15946 ms       1.254        42.450       846172
ReflectNestedFloatArrayWrite:              6429 ms       3.111       105.291       846172
ReflectNestedObjectArrayRead:             17478 ms       0.229        29.527       645104
ReflectNestedObjectArrayWrite:            12148 ms       0.329        42.480       645104
ReflectNestedLargeFloatArrayRead:          9012 ms       0.370        96.524      1087381
[jira] [Commented] (AVRO-1218) Avro 1.7.3 fails to build
[ https://issues.apache.org/jira/browse/AVRO-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645753#comment-13645753 ] Scott Carey commented on AVRO-1218: --- Is this still an issue? I can build on OSX if I update snappy-java (see AVRO-1311). Avro 1.7.3 fails to build -- Key: AVRO-1218 URL: https://issues.apache.org/jira/browse/AVRO-1218 Project: Avro Issue Type: Bug Components: build Affects Versions: 1.7.3 Environment: OS X 10.8.2 Reporter: Russell Jurney Priority: Blocker Labels: avro, build, for, pig, piggybank, wont Attachments: build.log I am trying to build Avro 1.7.3 from source as a workaround for issues in PIG-3015. It does not build :( Errors attached. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1261) Honor schema defaults with the Constructor in addition to the builders.
[ https://issues.apache.org/jira/browse/AVRO-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645789#comment-13645789 ] Scott Carey commented on AVRO-1261: --- +1 Honor schema defaults with the Constructor in addition to the builders. --- Key: AVRO-1261 URL: https://issues.apache.org/jira/browse/AVRO-1261 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.7.4 Reporter: Christopher Conner Assignee: Doug Cutting Priority: Minor Fix For: 1.7.5 Attachments: AVRO-1261.patch As I understand it, currently if you want to utilize defaults in a schema, ie: {code}
{
  "namespace": "com.chris.test",
  "type": "record",
  "name": "CHRISTEST",
  "doc": "Chris Test",
  "fields": [
    {"name": "firstname", "type": "string", "default": "Chris"},
    {"name": "lastname", "type": "string", "default": "Conner"},
    {"name": "username", "type": "string", "default": "cconner"}
  ]
}
{code} Then I have to use the builders to create my objects. IE: {code}
public class ChrisAvroTest {
  public static void main(String[] args) throws Exception {
    CHRISTEST person = CHRISTEST.newBuilder().build();
    System.out.println("person: " + person);
  }
}
{code} Is my understanding correct? Is it possible to make the default constructor honor the defaults as well? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1245) Add Merging Functionality to Generated Builders
[ https://issues.apache.org/jira/browse/AVRO-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645792#comment-13645792 ] Scott Carey commented on AVRO-1245: --- What about a more fluent API? {code}
User.newBuilder(thirdPartyRecord).replaceNullsWithDefaults().replaceEmptyStringsWithDefaults();
{code} Add Merging Functionality to Generated Builders --- Key: AVRO-1245 URL: https://issues.apache.org/jira/browse/AVRO-1245 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.7.3 Environment: Linux Mint 32-bit, Java 7, Avro 1.7.3 Reporter: Sharmarke Aden Priority: Minor Suppose I have a record with the following schema and default values: {code}
{
  "type": "record",
  "namespace": "test",
  "name": "User",
  "fields": [
    { "name": "user", "type": ["null", "string"], "default": null },
    { "name": "privacy",
      "type": [
        { "type": "enum", "name": "Privacy", "namespace": "test",
          "symbols": ["Public", "Private"] },
        "null"
      ],
      "default": "Private" }
  ]
}
{code} Now suppose I have a record supplied to me by a third party whose privacy field value is null. Currently if you call Builder.newBuilder(thirdPartyRecord) it simply creates a new record with same values as the source record (privacy is null in the newly created builder). It's very important that the privacy value be set and so ideally I would like to perform a merge to mitigate any issues with default values being absent in the source record. I would like to propose that a new enhancement be added to the Builder to support merging of a source record to a new record. Perhaps something like this: {code} // recordWithoutDefaults record passed in. 
User.Builder builder = User.newBuilder();
// ignore null values in the source record if the schema has a default
// value for the field
boolean ignoreNull = true;
// ignore empty string values in the source record for string field
// types with default field values
boolean ignoreEmptyString = true;
// while this is simple and useful in my use-case perhaps there's a
// better/refined way of supporting various merging models
builder.merge(recordWithoutDefaults, ignoreNull, ignoreEmptyString);
{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1245) Add Merging Functionality to Generated Builders
[ https://issues.apache.org/jira/browse/AVRO-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645795#comment-13645795 ] Scott Carey commented on AVRO-1245: --- I suppose the merge idea would have better performance. Add Merging Functionality to Generated Builders --- Key: AVRO-1245 URL: https://issues.apache.org/jira/browse/AVRO-1245 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.7.3 Environment: Linux Mint 32-bit, Java 7, Avro 1.7.3 Reporter: Sharmarke Aden Priority: Minor Suppose I have a record with the following schema and default values: {code}
{
  "type": "record",
  "namespace": "test",
  "name": "User",
  "fields": [
    { "name": "user", "type": ["null", "string"], "default": null },
    { "name": "privacy",
      "type": [
        { "type": "enum", "name": "Privacy", "namespace": "test",
          "symbols": ["Public", "Private"] },
        "null"
      ],
      "default": "Private" }
  ]
}
{code} Now suppose I have a record supplied to me by a third party whose privacy field value is null. Currently if you call Builder.newBuilder(thirdPartyRecord) it simply creates a new record with same values as the source record (privacy is null in the newly created builder). It's very important that the privacy value be set and so ideally I would like to perform a merge to mitigate any issues with default values being absent in the source record. I would like to propose that a new enhancement be added to the Builder to support merging of a source record to a new record. Perhaps something like this: {code} // recordWithoutDefaults record passed in. 
User.Builder builder = User.newBuilder();
// ignore null values in the source record if the schema has a default
// value for the field
boolean ignoreNull = true;
// ignore empty string values in the source record for string field
// types with default field values
boolean ignoreEmptyString = true;
// while this is simple and useful in my use-case perhaps there's a
// better/refined way of supporting various merging models
builder.merge(recordWithoutDefaults, ignoreNull, ignoreEmptyString);
{code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-607) SpecificData.getSchema not thread-safe
[ https://issues.apache.org/jira/browse/AVRO-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645811#comment-13645811 ] Scott Carey commented on AVRO-607: -- I quite like Guava. Having a concurrent weak hash map is great, the Immutable collections are very useful, and several other collection types are massive time savers (Multiset, Multimap and BiMap). However, items get deprecated and disappear within 2 years in Guava, so we would have to avoid the newest APIs and quickly move off of deprecated ones to prevent users who also use it from coming into conflict. It is manageable, but it is a dependency that is very likely to be used by our users, and if we are on version 11 while a user is on 13, we could be in a position where neither version works for both of us simultaneously. I also worry about our place as a library far down the stack for some users. We could complicate our build to shade in only the classes we use under a different namespace to avoid such problems (this may be useful for other dependencies as well). SpecificData.getSchema not thread-safe -- Key: AVRO-607 URL: https://issues.apache.org/jira/browse/AVRO-607 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.3.3 Reporter: Stephen Tu Priority: Minor Attachments: AVRO-607.patch SpecificData.getSchema uses a WeakHashMap to cache schemas, but WeakHashMap is not thread-safe, and the method itself is not synchronized. Seems like this could lead to the data structure getting corrupted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-607) SpecificData.getSchema not thread-safe
[ https://issues.apache.org/jira/browse/AVRO-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645819#comment-13645819 ] Scott Carey commented on AVRO-607: -- Alternative to this patch, we could synchronize the method, or we can use a ThreadLocal<WeakHashMap<Type, Schema>>. A cache with a Type or Class key (that is not weak) that becomes static can lead to classloader leaks. In Avro code, a weak concurrent hash map is in high demand. SpecificData.getSchema not thread-safe -- Key: AVRO-607 URL: https://issues.apache.org/jira/browse/AVRO-607 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.3.3 Reporter: Stephen Tu Priority: Minor Attachments: AVRO-607.patch SpecificData.getSchema uses a WeakHashMap to cache schemas, but WeakHashMap is not thread-safe, and the method itself is not synchronized. Seems like this could lead to the data structure getting corrupted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
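The ThreadLocal alternative mentioned in the comment above can be sketched briefly: give each thread its own WeakHashMap, so no synchronization is needed and Class/Type keys stay weakly held, avoiding classloader leaks. The trade-off is that each thread recomputes and caches its own copy of every schema. This is a hedged sketch, not Avro's SpecificData code; Schema is stood in by String to keep it self-contained.

```java
import java.lang.reflect.Type;
import java.util.WeakHashMap;
import java.util.function.Function;

// Sketch of a ThreadLocal<WeakHashMap<Type, Schema>> cache: thread-confined,
// so the non-thread-safe WeakHashMap is never shared, and keys stay weak.
public class ThreadLocalSchemaCache {
    private static final ThreadLocal<WeakHashMap<Type, String>> CACHE =
        ThreadLocal.withInitial(WeakHashMap::new);

    // Look up the cached schema for a type, computing it on first use
    // (per thread). "compute" stands in for the real schema derivation.
    static String getSchema(Type type, Function<Type, String> compute) {
        return CACHE.get().computeIfAbsent(type, compute);
    }
}
```

A shared concurrent weak-keyed map (e.g. via Guava's MapMaker, discussed earlier in this thread) avoids the per-thread duplication, at the cost of the dependency concerns raised above.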
[jira] [Commented] (AVRO-1044) avro-maven-plugin requires dependency resolution which breaks multi-module projects
[ https://issues.apache.org/jira/browse/AVRO-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645871#comment-13645871 ] Scott Carey commented on AVRO-1044: --- Is the problem you are having with the 'idl' mojo? It is the only mojo that requires dependency resolution, as it declares '@requiresDependencyResolution runtime' for some reason. That would explain why you have issues and I do not. I am not sure why this mojo requires all runtime dependencies to be in scope. avro-maven-plugin requires dependency resolution which breaks multi-module projects --- Key: AVRO-1044 URL: https://issues.apache.org/jira/browse/AVRO-1044 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.6.2 Reporter: Arvind Prabhakar Priority: Critical Use of avro-maven-plugin breaks multimodule projects since it forces the dependency resolution of all of the dependencies, some of which may be from within the reactor and not yet installed in the local cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (AVRO-1044) avro-maven-plugin requires dependency resolution which breaks multi-module projects
[ https://issues.apache.org/jira/browse/AVRO-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645873#comment-13645873 ] Scott Carey commented on AVRO-1044: --- This is due to AVRO-971. This feature should be optional since it breaks builds. avro-maven-plugin requires dependency resolution which breaks multi-module projects --- Key: AVRO-1044 URL: https://issues.apache.org/jira/browse/AVRO-1044 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.6.2 Reporter: Arvind Prabhakar Priority: Critical Use of avro-maven-plugin breaks multimodule projects since it forces the dependency resolution of all of the dependencies, some of which may be from within the reactor and not yet installed in the local cache. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (AVRO-1314) Java: Add @threadSafe annotation to maven plugins
Scott Carey created AVRO-1314: - Summary: Java: Add @threadSafe annotation to maven plugins Key: AVRO-1314 URL: https://issues.apache.org/jira/browse/AVRO-1314 Project: Avro Issue Type: Bug Components: java Reporter: Scott Carey Assignee: Scott Carey Our plugins are thread-safe; mark them as such so that warnings will not be printed when running parallel maven builds.
[jira] [Updated] (AVRO-1314) Java: Add @threadSafe annotation to maven plugins
[ https://issues.apache.org/jira/browse/AVRO-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1314: -- Attachment: AVRO-1313.patch Trivial patch that sets the Mojos to be tagged @threadSafe.
[jira] [Updated] (AVRO-1314) Java: Add @threadSafe annotation to maven plugins
[ https://issues.apache.org/jira/browse/AVRO-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1314: -- Fix Version/s: 1.7.5 Status: Patch Available (was: Open) I'll commit this soon unless there are objections.
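For reference, the change is just a javadoc tag on each mojo class: the maven-plugin-plugin copies @threadSafe into the plugin descriptor, and Maven 3.x then skips the "not marked @threadSafe" warning in parallel builds. A hypothetical mojo of the kind being patched (goal and class name are illustrative):

```java
/**
 * Compiles Avro schema files into Java sources.
 *
 * The @threadSafe tag below asserts that this mojo may run concurrently
 * with other mojos in a parallel (-T) Maven build; it is metadata only
 * and changes no runtime behavior of the mojo itself.
 *
 * @goal schema
 * @threadSafe
 */
public class SchemaMojo extends AbstractMojo { /* ... */ }
```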
[jira] [Updated] (AVRO-1313) Java: Add system property for disabling sun.misc.Unsafe
[ https://issues.apache.org/jira/browse/AVRO-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1313: -- Status: Patch Available (was: Open) Ready for review. Java: Add system property for disabling sun.misc.Unsafe --- Key: AVRO-1313 URL: https://issues.apache.org/jira/browse/AVRO-1313 Project: Avro Issue Type: Improvement Reporter: Scott Carey Fix For: 1.7.5 Attachments: AVRO-1313.patch We should be able to disable use of sun.misc.Unsafe. I propose that if the system property avro.disable.unsafe is non-null, we use reflection rather than Unsafe.
[jira] [Updated] (AVRO-1313) Java: Add system property for disabling sun.misc.Unsafe
[ https://issues.apache.org/jira/browse/AVRO-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1313: -- Assignee: Scott Carey
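The proposal amounts to a one-line strategy switch at load time. A sketch -- only the avro.disable.unsafe property name comes from the ticket; the class and enum below are hypothetical:

```java
// Sketch (not Avro's actual code): choose a field-access strategy based on
// the proposed avro.disable.unsafe system property. Per the proposal, any
// non-null value -- even the empty string -- disables the Unsafe-based path.
public class AccessorChoice {
    enum Strategy { UNSAFE, REFLECTION }

    static Strategy choose() {
        return System.getProperty("avro.disable.unsafe") != null
                ? Strategy.REFLECTION
                : Strategy.UNSAFE;
    }

    public static void main(String[] args) {
        // Default JVM: property unset, Unsafe path selected.
        System.out.println(choose());
        // Setting the property to anything flips to reflection.
        System.setProperty("avro.disable.unsafe", "true");
        System.out.println(choose());
    }
}
```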
[jira] [Created] (AVRO-1315) Java: Schema Validation utilities
Scott Carey created AVRO-1315: - Summary: Java: Schema Validation utilities Key: AVRO-1315 URL: https://issues.apache.org/jira/browse/AVRO-1315 Project: Avro Issue Type: New Feature Components: java Reporter: Scott Carey Assignee: Scott Carey Fix For: 1.7.5 As part of AVRO-1124 we needed Schema Validation utilities. I have separated those out of that ticket as a stand-alone item.
[jira] [Commented] (AVRO-1315) Java: Schema Validation utilities
[ https://issues.apache.org/jira/browse/AVRO-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645908#comment-13645908 ] Scott Carey commented on AVRO-1315: --- This incorporates the following:
* An additional public method on Symbol.java to detect whether a Symbol tree has an error in it. In the case of a Symbol returned by schema resolution, this indicates that the schemas are not compatible.
* An interface o.a.a.SchemaValidator that checks a schema against an Iterable of other schemas. The notion of compatibility is left to the implementation. Schemas in the Iterable are returned from most recent to oldest, if chronological order is applicable.
* An interface o.a.a.SchemaValidationStrategy that validates one schema against another. The notion of compatibility is left to the implementation.
* A concrete SchemaValidator -- ValidateAll. This takes a SchemaValidationStrategy as a constructor parameter and, when its validate() method is called, uses the strategy for each item in the Iterable, in order.
* A concrete SchemaValidator -- ValidateLatest. This takes a SchemaValidationStrategy as a constructor parameter and, when its validate() method is called, uses the strategy for only the first item in the Iterable.
* A SchemaValidationBuilder for constructing SchemaValidators, with private implementations of SchemaValidationStrategy that can be configured:
** Validate that the schema can read all others.
** Validate that all others can read the schema.
** Validate that the schema and all others are mutually compatible (can read each other).
** Validate that the schema can read the latest.
** Validate that the latest can read the schema.
** Validate that the latest and the schema are mutually compatible.
A few questions for discussion: I am tempted to hide the concrete implementations and not expose them publicly.
In the forthcoming patch, all are hidden so that only the interfaces and builder are public and need to be supported public APIs. Alternatively we can expose some of the SchemaValidator and SchemaValidationStrategy implementations as public. I am tempted to start private and see how this implementation works out.
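The interfaces and ValidateAll described above might look roughly like this -- a self-contained sketch with String standing in for org.apache.avro.Schema, so the real signatures may differ:

```java
// Hedged sketch of the design discussed in this thread; not the actual
// AVRO-1315 patch. String stands in for org.apache.avro.Schema.

// Pair-wise validation: throws if toValidate is incompatible with existing.
// The notion of compatibility is left to the implementation.
interface SchemaValidationStrategy {
    void validate(String toValidate, String existing) throws Exception;
}

// Validation against a history of schemas, most recent first.
interface SchemaValidator {
    void validate(String toValidate, Iterable<String> schemasInOrder) throws Exception;
}

// "ValidateAll": apply the pair-wise strategy to every prior schema, in order.
class ValidateAll implements SchemaValidator {
    private final SchemaValidationStrategy strategy;

    ValidateAll(SchemaValidationStrategy strategy) {
        this.strategy = strategy;
    }

    @Override
    public void validate(String toValidate, Iterable<String> schemasInOrder) throws Exception {
        for (String existing : schemasInOrder) {
            strategy.validate(toValidate, existing);
        }
    }
}
```

A "ValidateLatest" variant would simply stop after the first element of the Iterable.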
[jira] [Commented] (AVRO-1315) Java: Schema Validation utilities
[ https://issues.apache.org/jira/browse/AVRO-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646108#comment-13646108 ] Scott Carey commented on AVRO-1315: --- Another option is to only have SchemaValidationStrategy -- the pair-wise validation -- in org.apache.avro. The validation over a list of 'previous' schemas can stay in the schema-repo being developed in AVRO-1124. This would reduce how much code is in the core avro package but might be more universally usable.
[jira] [Updated] (AVRO-1315) Java: Schema Validation utilities
[ https://issues.apache.org/jira/browse/AVRO-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1315: -- Attachment: AVRO-1315.patch Patch implementing the design discussed above.
[jira] [Updated] (AVRO-1313) Java: Add system property for disabling sun.misc.Unsafe
[ https://issues.apache.org/jira/browse/AVRO-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1313: -- Attachment: AVRO-1313-v2.patch Even better, the check at runtime when loading should check that it will work. This changes the runtime test to have fields of all types and validates that it works with all of them before loading an implementation. Both Unsafe and Reflect cases of these code paths are covered in the Unit tests.
[jira] [Comment Edited] (AVRO-1313) Java: Add system property for disabling sun.misc.Unsafe
[ https://issues.apache.org/jira/browse/AVRO-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646145#comment-13646145 ] Scott Carey edited comment on AVRO-1313 at 4/30/13 11:21 PM: - Even better, the check at runtime when loading should check that it will work. This patch (-v2) changes the runtime test to have fields of all types and validates that it works with all of them before loading an implementation. Both Unsafe and Reflect cases of these code paths are covered in the Unit tests.
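The load-time check described here can be sketched as follows. This is not the patch's code -- plain reflection stands in for sun.misc.Unsafe so the example runs anywhere -- but the shape is the same: probe an object with representative field types and fall back on any failure.

```java
import java.lang.reflect.Field;

// Sketch of load-time capability probing: exercise the "fast" mechanism on a
// probe object whose fields cover the types we care about, and report failure
// so the caller can load the fallback implementation instead. Class and
// method names are hypothetical.
public class CapabilityProbe {
    // One field per representative type, as the -v2 patch's runtime test does.
    static class Probe { int i = 1; double d = 2.0; Object o = "x"; }

    static boolean fastPathWorks() {
        try {
            Probe p = new Probe();
            for (Field f : Probe.class.getDeclaredFields()) {
                f.setAccessible(true); // the step that may fail on restricted JVMs
                f.get(p);              // validate that every field type actually reads
            }
            return true;
        } catch (Throwable t) {        // any failure at all means: use the fallback
            return false;
        }
    }
}
```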
[jira] [Commented] (AVRO-1124) RESTful service for holding schemas
[ https://issues.apache.org/jira/browse/AVRO-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646203#comment-13646203 ] Scott Carey commented on AVRO-1124: ---
* Schema metadata: there is one race condition to consider -- currently subject.register(foo) is idempotent and never fails unless there is a schema validation failure. Two users simultaneously registering the same schema end up with the same schema/id pair -- both fail, or both succeed and get the same result. If we tag metadata along with it, then two concurrent registrations with the same schema but different metadata might occur. The actions are still idempotent and the two users get the same result, but only one will have set the metadata it expected. I will still have register() never fail outside of validation, but the schema metadata is not guaranteed to be what the user requested when there is a race condition -- the same thing happens with subject creation now. If metadata is immutable, it can be cached as part of the SchemaEntry. If it is not, it will need to be uncached or have a TTL; I would like to avoid the latter due to complexity.
* In a subject, schema/id pairs are only added. The caching layer is free to assume that once an id/schema relation exists, it exists forever; there is no propagation of updates. This is the sane thing to do -- once a datum has been written with an id, the schema tied to that key should be kept forever. If a schema could be removed, we would need to check the repository for every record or have a TTL in the cache. It would be easier to support 'deactivating' a schema/id pair so that it is not returned when scanning all the active schemas in a subject, or used in validation, but can still be found by looking it up. Can you describe the use case for deleting a schema? Under what conditions would you want to do so?
* I have opened https://issues.apache.org/jira/browse/AVRO-1315 to cover the avro schema validation components that live outside of the repo projects. Please provide feedback, Thanks! RESTful service for holding schemas --- Key: AVRO-1124 URL: https://issues.apache.org/jira/browse/AVRO-1124 Project: Avro Issue Type: New Feature Reporter: Jay Kreps Assignee: Jay Kreps Attachments: AVRO-1124-can-read-with.patch, AVRO-1124-draft.patch, AVRO-1124.patch, AVRO-1124.patch, AVRO-1124-validators-preliminary.patch Motivation: It is nice to be able to pass around data in serialized form but still know the exact schema that was used to serialize it. The overhead of storing the schema with each record is too high unless the individual records are very large. There are workarounds for some common cases: in the case of files a schema can be stored once with a file of many records amortizing the per-record cost, and in the case of RPC the schema can be negotiated ahead of time and used for many requests. For other uses, though it is nice to be able to pass a reference to a given schema using a small id and allow this to be looked up. Since only a small number of schemas are likely to be active for a given data source, these can easily be cached, so the number of remote lookups is very small (one per active schema version). Basically this would consist of two things: 1. A simple REST service that stores and retrieves schemas 2. Some helper java code for fetching and caching schemas for people using the registry We have used something like this at LinkedIn for a few years now, and it would be nice to standardize this facility to be able to build up common tooling around it. This proposal will be based on what we have, but we can change it as ideas come up. The facilities this provides are super simple, basically you can register a schema which gives back a unique id for it or you can query for a schema. There is almost no code, and nothing very complex. 
The contract is that before emitting/storing a record you must first publish its schema to the registry or know that it has already been published (by checking your cache of published schemas). When reading you check your cache and if you don't find the id/schema pair there you query the registry to look it up. I will explain some of the nuances in more detail below. An added benefit of such a repository is that it makes a few other things possible: 1. A graphical browser of the various data types that are currently used and all their previous forms. 2. Automatic enforcement of compatibility rules. Data is always compatible in the sense that the reader will always deserialize it (since they are using the same schema as the writer) but this does not mean it is compatible with the expectations of the reader. For example if an int field is
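The append-only contract described above is what makes the client-side cache trivial: an id/schema pair never changes once registered, so entries need no invalidation or TTL. A sketch -- all names are hypothetical, and the registry lookup here is just a function where the real client would call the REST service:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of the caching contract: id -> schema mappings are append-only and
// immutable, so a cached entry can be trusted forever and only misses go to
// the remote registry (one lookup per active schema version).
public class SchemaCache {
    private final Map<Integer, String> cache = new ConcurrentHashMap<>();
    private final Function<Integer, String> registryLookup; // stand-in for the REST call
    int remoteLookups = 0; // exposed here only to demonstrate the behavior

    SchemaCache(Function<Integer, String> registryLookup) {
        this.registryLookup = registryLookup;
    }

    String schemaFor(int id) {
        // Hit: return the cached schema with no remote call.
        // Miss: fetch once from the registry and cache forever.
        return cache.computeIfAbsent(id, k -> {
            remoteLookups++;
            return registryLookup.apply(k);
        });
    }
}
```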
[jira] [Commented] (AVRO-1310) Avro Maven project can't be built from scratch
[ https://issues.apache.org/jira/browse/AVRO-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644651#comment-13644651 ] Scott Carey commented on AVRO-1310: --- I purged my repo and tried 'mvn install' from trunk, and that worked fine. I had one modification for Mac in the lang/java pom.xml due to snappy-java issues on Mac (https://github.com/ptaoussanis/carmine/issues/5): updating to version 1.0.5-M4. Which three tests fail in step 1? Test failures in the avro compiler will prevent the remainder from building. What happens if you skip tests: 'mvn clean install -DskipTests' from trunk? Avro Maven project can't be built from scratch -- Key: AVRO-1310 URL: https://issues.apache.org/jira/browse/AVRO-1310 Project: Avro Issue Type: Bug Components: java Affects Versions: 1.7.4 Environment: Maven on Eclipse Reporter: Nir Zamir When getting the Java 'trunk' from SVN and trying to use Maven Install ('mvn install') there are errors. Most of the errors are in tests so I tried skipping the tests but it still fails. See more details in my post on Avro Users: http://apache-avro.679487.n3.nabble.com/help-with-Avro-compilation-td4026946.html
[jira] [Commented] (AVRO-1310) Avro Maven project can't be built from scratch
[ https://issues.apache.org/jira/browse/AVRO-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644661#comment-13644661 ] Scott Carey commented on AVRO-1310: --- The end result you are having: Could not find artifact org.apache.avro:avro-ipc:jar:tests:1.7.5-SNAPSHOT is due to either:
* not having the avro-ipc test jar deployed in your local repo after being built from an install that includes the test phase of avro-ipc, or
* not having it available in the maven reactor as part of the build.
What is the output of 'mvn --version'? Mine is: {noformat} Apache Maven 3.0.3 (r1075438; 2011-02-28 09:31:09-0800) Maven home: /usr/share/maven Java version: 1.7.0_13, vendor: Oracle Corporation Java home: /Library/Java/JavaVirtualMachines/jdk1.7.0_13.jdk/Contents/Home/jre Default locale: en_US, platform encoding: UTF-8 OS name: mac os x, version: 10.7.5, arch: x86_64, family: mac {noformat}
[jira] [Updated] (AVRO-1282) Make use of the sun.misc.Unsafe class during serialization if a JDK supports it
[ https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1282: -- Attachment: AVRO-1282-s6.patch This patch (AVRO-1282-s6.patch) includes the following:
* The Reflect API now uses sun.misc.Unsafe to do Field reflection, for significantly improved performance. (Leo)
* The Reflect API avoids boxing for reading and writing primitive arrays. (Leo, Scott)
* Resolution of whether it is safe to use Unsafe is done statically; only one implementation is loaded. The Unsafe implementation is tested to ensure that all features function properly at load time (for example, to handle Android or other JVMs with partial Unsafe support). A unit test is added that uses a classloader that fails to load Unsafe to test this. (Leo, Scott)
* EncoderFactory is fixed to properly configure the BlockingBinaryEncoder. (Scott)
* Reflection now supports reading the blocked encoding into native arrays. (Scott)
* A dozen new tests added to Perf.java to cover the ReflectDatum{Reader,Writer}, including with the blocked binary encoding. (Leo, Scott)
* Additional unit tests in TestReflect to cover encoding and decoding primitive arrays, including with blocked binary encoding. (Scott)
Performance on Reflect Perf.java tests is approximately 2.5x to 29x faster, usually between 3x and 5x faster.
Before:
{noformat}
test name                                  time      M entries/sec  M bytes/sec  bytes/cycle
ReflectRecordRead:                         15067 ms  1.106          42.927       808498
ReflectRecordWrite:                        10903 ms  1.529          59.319       808498
ReflectBigRecordRead:                      19285 ms  0.519          31.832       767380
ReflectBigRecordWrite:                     13958 ms  0.716          43.979       767380
ReflectFloatRead:                          29594 ms  0.000          27.032       104
ReflectFloatWrite:                         33327 ms  0.000          24.004       104
ReflectDoubleRead:                         27442 ms  0.000          58.303       204
ReflectDoubleWrite:                        31529 ms  0.000          50.746       204
ReflectIntArrayRead:                       38088 ms  0.438          18.057       859709
ReflectIntArrayWrite:                      23342 ms  0.714          29.464       859709
ReflectLongArrayRead:                      18476 ms  0.451          34.869       805344
ReflectLongArrayWrite:                     12715 ms  0.655          50.667       805344
ReflectDoubleArrayRead:                    19411 ms  0.515          33.718       818144
ReflectDoubleArrayWrite:                   13825 ms  0.723          47.340       818144
ReflectFloatArrayRead:                     39502 ms  0.506          17.137       846172
ReflectFloatArrayWrite:                    27492 ms  0.727          24.623       846172
ReflectNestedFloatArrayRead:               41225 ms  0.485          16.420       846172
ReflectNestedFloatArrayWrite:              30229 ms  0.662          22.393       846172
ReflectNestedObjectArrayRead:              31679 ms  0.126          16.291       645104
ReflectNestedObjectArrayWrite:             17206 ms  0.232          29.994       645104
ReflectNestedLargeFloatArrayRead:          33099 ms  0.101          26.282       1087381
ReflectNestedLargeFloatArrayWrite:         35159 ms  0.095          24.742       1087381
ReflectNestedLargeFloatArrayBlockedRead:   33326 ms  0.100          26.302       1095674
ReflectNestedLargeFloatArrayBlockedWrite:  36921 ms  0.090          23.741       1095674
{noformat}
After:
{noformat}
test name                                  time      M entries/sec  M bytes/sec  bytes/cycle
ReflectRecordRead:                         6058 ms   2.751          106.754      808498
ReflectRecordWrite:                        3750 ms   4.444          172.470      808498
ReflectBigRecordRead:                      6767 ms   1.478          90.709       767380
ReflectBigRecordWrite:                     4433 ms   2.255          138.466      767380
ReflectFloatRead:                          6155 ms   0.000          129.970      104
ReflectFloatWrite:                         1083 ms   0.001          738.434      104
ReflectDoubleRead:                         6610 ms   0.000          242.028      204
ReflectDoubleWrite:                        1968 ms   0.000          812.864      204
ReflectIntArrayRead:                       9462 ms   1.761          72.683       859709
ReflectIntArrayWrite:                      2468 ms   6.751          278.584      859709
ReflectLongArrayRead:                      5556 ms   1.500          115.941      805344
{noformat}
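The "Unsafe for Field reflection" item above boils down to reading fields by memory offset instead of through java.lang.reflect.Field on every record. A hedged, self-contained sketch (not the patch's actual code; class and field names are made up) with a reflection fallback for JVMs without usable Unsafe:

```java
import java.lang.reflect.Field;

// Sketch of offset-based field access: resolve the field's offset once, then
// read by offset with no boxing and no per-read access checks. Real code
// would cache the Unsafe instance and offsets statically.
public class UnsafeRead {
    static class Rec { long v = 42L; }

    static long readV(Rec r) throws Exception {
        try {
            // Obtain the singleton Unsafe instance reflectively.
            Field theUnsafe = Class.forName("sun.misc.Unsafe").getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            sun.misc.Unsafe u = (sun.misc.Unsafe) theUnsafe.get(null);
            // Computed once per class in real code, not per read.
            long offset = u.objectFieldOffset(Rec.class.getDeclaredField("v"));
            return u.getLong(r, offset); // direct read by offset
        } catch (Throwable t) {
            // Fallback: plain reflection, as on Android or restricted JVMs.
            Field f = Rec.class.getDeclaredField("v");
            f.setAccessible(true);
            return f.getLong(r);
        }
    }
}
```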
[jira] [Created] (AVRO-1311) Upgrade Snappy-Java dependency to support building on Mac + Java 7
Scott Carey created AVRO-1311: - Summary: Upgrade Snappy-Java dependency to support building on Mac + Java 7 Key: AVRO-1311 URL: https://issues.apache.org/jira/browse/AVRO-1311 Project: Avro Issue Type: Bug Affects Versions: 1.7.4 Reporter: Scott Carey Assignee: Scott Carey snappy-java 1.0.4 does not work with Mac + Java 7. 1.0.5-M4 is on maven, but it does not appear that there will be a final release of that. 1.1.0 is at -M3 status, and is being developed now. Both of these work locally for me; when the dust settles we need to pick one before the next release.
[jira] [Commented] (AVRO-1282) Make use of the sun.misc.Unsafe class during serialization if a JDK supports it
[ https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645054#comment-13645054 ] Scott Carey commented on AVRO-1282: --- I think I simply missed that. I changed it at some point in the process and did not revert. I'll change it back to use INSTANCE with get(). There are a lot of static caches here, and the ones that have Class objects in them without weak references are prone to trigger classloader leaking. We can fix that elsewhere. Make use of the sun.misc.Unsafe class during serialization if a JDK supports it --- Key: AVRO-1282 URL: https://issues.apache.org/jira/browse/AVRO-1282 Project: Avro Issue Type: Improvement Components: java Affects Versions: 1.7.4 Reporter: Leo Romanoff Priority: Minor Attachments: AVRO-1282-s1.patch, AVRO-1282-s2.patch, AVRO-1282-s3.patch, AVRO-1282-s5.patch, AVRO-1282-s6.patch, avro-1282-v1.patch, avro-1282-v2.patch, avro-1282-v3.patch, avro-1282-v4.patch, avro-1282-v5.patch, avro-1282-v6.patch, avro-1282-v7.patch, avro-1282-v8.patch, AVRO-1282-v9.patch, TestUnsafeUtil.java Unsafe can be used to significantly speed up the serialization process, if a JDK implementation supports sun.misc.Unsafe properly. Most JDKs running on PCs support it. Some platforms like Android lack proper support for Unsafe yet. There are three ways to use Unsafe for serialization: 1) Very quick access to the fields of objects. It is way faster than the reflection-based approach using Field.get/set. 2) Input and output streams can use Unsafe to perform very quick input/output. 3) Moreover, Unsafe makes it possible to serialize to/deserialize from off-heap memory directly and very quickly, without any intermediate buffers allocated on heap. There is virtually no overhead compared to the usual byte arrays.
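A minimal sketch of the weak-reference fix hinted at above: keying a static cache weakly by Class, so cached entries do not pin classes (and their classloaders) in memory once the application drops them. Names are illustrative, not Avro's.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Sketch: a static per-Class cache with weakly held keys. With a plain
// HashMap, every cached Class (and transitively its ClassLoader) would be
// strongly reachable forever -- the classloader leak described above.
// Note: the cached values must not strongly reference the key Class, or the
// weak keys never clear; the String value here is safe.
public class PerClassCache {
    private static final Map<Class<?>, String> CACHE =
            Collections.synchronizedMap(new WeakHashMap<>());

    static String describe(Class<?> c) {
        // Compute once per class; entry disappears when the Class is collected.
        return CACHE.computeIfAbsent(c, k -> "accessors-for-" + k.getName());
    }
}
```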
[jira] [Updated] (AVRO-1282) Make use of the sun.misc.Unsafe class during serialization if a JDK supports it
[ https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated AVRO-1282: -- Attachment: AVRO-1282-s7.patch Minor difference to the -s7 patch: Returns INSTANCE to ReflectData, as AVRO-1283 is the ticket for such changes.
[jira] [Commented] (AVRO-1282) Make use of the sun.misc.Unsafe class during serialization if a JDK supports it
[ https://issues.apache.org/jira/browse/AVRO-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13643923#comment-13643923 ] Scott Carey commented on AVRO-1282: --- Leo: Using a variation of your GenericRecordAccessor was the plan, but after a little more work I'm not sure it will go much faster. Regarding caching by schema -- it caches first by class, then for each class it has both a lookup of Accessors by field name and one by schema. The lookup by schema returns a FieldAccessor[]; the one by name returns the FieldAccessor for that specific named field. All of the main remaining performance issues are now similar to the Generic and Specific code: we are traversing and inspecting objects and other data structures on the fly far too often -- e.g. Schema, Parser, and various instanceof checks -- and most of it could be precomputed if the code were structured radically differently. That will have to wait for another time. I've got some old prototypes around elsewhere for Generic/Specific, and after digging in this deep in Reflect for the first time I'm convinced they all share the same fundamental performance barriers now. Of course there is room for some tweaks here and there, but for major wins we need to make bigger changes. I am finalizing a patch that gets performance back up to the level you had it or better -- a little faster in most cases and a little slower in others. I've also rearranged and streamlined Perf.java, incorporating your more recent versions and refactoring to share more code and make it simpler. There is still one fundamental flaw: reading the blocked encoding is not supported (it would trigger an array bounds check exception). I have isolated the code that loops and writes arrays, so we can add that more easily from here.