[GitHub] [avro] paravoid opened a new pull request #1091: AVRO-1846: fix memory alignment issues (performance degredation & crashes)
paravoid opened a new pull request #1091: URL: https://github.com/apache/avro/pull/1091 This fixes some unaligned memory access issues, which cause an immediate crash with SIGBUS on certain architectures (e.g. sparcv9, as the Jira ticket indicates), as well as performance degredation in most others. This effectively fixes two issues with the existing code: - The refcount was kept in a (32-bit) `int`, and then access to the data is shifted by 32-bits as well; the commit included changes this to a `long`, which is equal to `sizeof(void *)` on all Unices (and e.g. 32-bit on x86, but 64-bit on x86_64 and sparcv9). - Record fields are accessed using an offset (`field_offset`), which was set to the previous field's size. If the previous field was e.g. an `int` (i.e. int32_t), or a boolean (int) etc., then access past it would be unaligned (i.e. `int` followed by a `long` => `int` at offset 0, `long` at offset 4). This commit rounds up the "field size" to a multiple of pointer size. In C terms, Avro kept records in memory as a packed structure, while this commit modifies this to add paddings so that fields always start on the right boundary. This has been tested (built & ran tests, including memcheck/Valgrind) on both x86_64 (amd64) and sparcv9 (sparc64) on Linux/GCC. I have unfortunately not tested it with any real world code, or benchmarked the performance benefits on popular 64-bit architectures (like x86_64 or aarch64). ### Jira - [x] My PR addresses [AVRO-1846](https://issues.apache.org/jira/browse/AVRO-1846) ### Tests - [x] My PR does not need testing for this extremely good reason: all code covered by existing tests ### Commits - [x] My commits all reference Jira issues in their subject lines, with the exception the first one, which is a tiny define rename not associated with any issue. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [avro] Fokko opened a new pull request #1090: AVRO-3043: Remove redundant casts
Fokko opened a new pull request #1090: URL: https://github.com/apache/avro/pull/1090 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Avro Jira](https://issues.apache.org/jira/browse/AVRO/) issues and references them in the PR title. For example, "AVRO-1234: My Avro PR" - https://issues.apache.org/jira/browse/AVRO-3043 - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](https://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain Javadoc that explain what it does This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (AVRO-3043) Remove redundant generic casts
Fokko Driesprong created AVRO-3043: -- Summary: Remove redundant generic casts Key: AVRO-3043 URL: https://issues.apache.org/jira/browse/AVRO-3043 Project: Apache Avro Issue Type: Improvement Reporter: Fokko Driesprong Assignee: Fokko Driesprong -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (AVRO-3042) Make MODEL$ final
Fokko Driesprong created AVRO-3042: -- Summary: Make MODEL$ final Key: AVRO-3042 URL: https://issues.apache.org/jira/browse/AVRO-3042 Project: Apache Avro Issue Type: Improvement Reporter: Fokko Driesprong Assignee: Fokko Driesprong -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [avro] Fokko opened a new pull request #1089: AVRO-3042: Make MODEL$ final
Fokko opened a new pull request #1089: URL: https://github.com/apache/avro/pull/1089 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Avro Jira](https://issues.apache.org/jira/browse/AVRO/) issues and references them in the PR title. For example, "AVRO-1234: My Avro PR" - https://issues.apache.org/jira/browse/AVRO-3042 - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](https://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain Javadoc that explain what it does This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (AVRO-3041) Reflect: Default converters are not tried for nullable unions
[ https://issues.apache.org/jira/browse/AVRO-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283355#comment-17283355 ] John Gonyo commented on AVRO-3041: -- See solution in https://github.com/apache/avro/pull/1088 > Reflect: Default converters are not tried for nullable unions > - > > Key: AVRO-3041 > URL: https://issues.apache.org/jira/browse/AVRO-3041 > Project: Apache Avro > Issue Type: Bug > Components: csharp >Reporter: John Gonyo >Priority: Major > > Default converters can be added for a given Avro type and a Property type. > For example, to map a long to a DateTimeOffset. However, if the class > property is nullable, the default converter is not used. > > Avro reflection should find the converter to use for nullable primitives. It > can also be helpful and provide automatic translation from non-nullable > converters to nullable ones. This allows a client to create one non-nullable > converter that can be used for nullable properties. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [avro] Gonyoda opened a new pull request #1088: AVRO-3041 : use default converters for unions/nullable properties
Gonyoda opened a new pull request #1088: URL: https://github.com/apache/avro/pull/1088 ### Jira - [ ] My PR addresses the following [Avro Jira](https://issues.apache.org/jira/browse/AVRO/AVRO-3041) issues and references them in the PR title. - https://issues.apache.org/jira/browse/AVRO-3041 ### Tests - [ ] My PR adds the following unit tests - SerializeNullableWithNullableConverter - SerializeNullableDefaultWithClass - SerializeNullableDefaultWithDelegate - SerializeNullableDefaultWithNullableDelegate - SerializeNullableDefaultWithDoubleNullableDelegate ### Commits - [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](https://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (AVRO-3041) Reflect: Default converters are not tried for nullable unions
John Gonyo created AVRO-3041: Summary: Reflect: Default converters are not tried for nullable unions Key: AVRO-3041 URL: https://issues.apache.org/jira/browse/AVRO-3041 Project: Apache Avro Issue Type: Bug Components: csharp Reporter: John Gonyo Default converters can be added for a given Avro type and a Property type. For example, to map a long to a DateTimeOffset. However, if the class property is nullable, the default converter is not used. Avro reflection should find the converter to use for nullable primitives. It can also be helpful and provide automatic translation from non-nullable converters to nullable ones. This allows a client to create one non-nullable converter that can be used for nullable properties. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AVRO-2838) Schema in generated Java class is different than the original one
[ https://issues.apache.org/jira/browse/AVRO-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283329#comment-17283329 ] William Daniels commented on AVRO-2838: --- +1 to this issue, we are seeing this with our environment. (left a comment with slightly more detail on [https://github.com/confluentinc/schema-registry/issues/1677] ). You are forced into one of a few options, realistically: * Your users create all schemas with `avro.java.string` instead of `string`, bad for multi-language support and portability * You use `CharSequence` for everything, bad due to having to null check everywhere, and charSequence can't be a key in a map * You force the usage of 'user.latest.version' (see GH issue above details of why that's not advised) * You introduce custom code during schema registration time to turn `string` into a `avro.java.string` to obscure the issue from the end user, bad due to potential bugs in pattern matching, and the schema the users crafted don't match what actually gets uploaded.. which is very confusing * You don't use code generation at all and a custom serde implementation (a la Ryan Skraba's approach) * you wrap the usage of the generated code somehow and obscure the `CharSequence` to the end user, doing your own null checking, and turning it into a string on the api surface. * etc There are probably other problems and various workarounds, but in the end, it seems reasonable to say that the generated schema used should match the schema it was generated from, especially with regards to primitive types, but in general also. > Schema in generated Java class is different than the original one > - > > Key: AVRO-2838 > URL: https://issues.apache.org/jira/browse/AVRO-2838 > Project: Apache Avro > Issue Type: Bug > Components: java >Reporter: Lukas Krecan >Priority: Major > Attachments: AVRO.patch > > > If you generate Java classes from schema, {{SCHEMA$}} variable differs from > the original schema. It causes issues like > [this|https://github.com/confluentinc/schema-registry/issues/868] and > [this|https://github.com/confluentinc/schema-registry/issues/1352] when using > Schema registry. > The issue happens when the schema in the registry is configured externally > and then you try to use generated Java class. The schema in the registry does > not match the schema in the class and thus the write is refused. > Technically it's easy to fix (see the attached patch) but I guess there will > be some backward compatibility concerns. > If it is not acceptable, would it be at least possible to add the > {{originalSchema}} context variable so we could solve it using custom > template. > The patch is not final, its purpose is just to convey the idea. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AVRO-3040) GenericWriter: Array requirement too restrictive
[ https://issues.apache.org/jira/browse/AVRO-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283249#comment-17283249 ] John Gonyo commented on AVRO-3040: -- See [https://github.com/apache/avro/pull/1087] for a solution > GenericWriter: Array requirement too restrictive > > > Key: AVRO-3040 > URL: https://issues.apache.org/jira/browse/AVRO-3040 > Project: Apache Avro > Issue Type: Improvement > Components: csharp >Reporter: John Gonyo >Priority: Major > > GenericWriter expects schema array properties to be an actual Array type, > while Reflect and Specific writers expect array fields to be of IList. > For convenience, it would nice to support List, IList or Array types for > array schema fields. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [avro] Gonyoda opened a new pull request #1087: AVRO-3040: support IList in generic reader/writer
Gonyoda opened a new pull request #1087: URL: https://github.com/apache/avro/pull/1087 ### Jira - [ ] My PR addresses the following [Avro Jira](https://issues.apache.org/jira/browse/AVRO/AVRO-3040) issues and references them in the PR title. For example, "AVRO-1234: My Avro PR" - https://issues.apache.org/jira/browse/AVRO-3040 ### Tests - [ ] My PR adds the following unit tests - [ ] TestRecordAllow_List - [ ] TestRecordAllow_IList ### Commits - [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](https://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (AVRO-3040) GenericWriter: Array requirement too restrictive
John Gonyo created AVRO-3040: Summary: GenericWriter: Array requirement too restrictive Key: AVRO-3040 URL: https://issues.apache.org/jira/browse/AVRO-3040 Project: Apache Avro Issue Type: Improvement Components: csharp Reporter: John Gonyo GenericWriter expects schema array properties to be an actual Array type, while Reflect and Specific writers expect array fields to be of IList. For convenience, it would nice to support List, IList or Array types for array schema fields. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AVRO-3039) ClassCache: Cached class map key is too broad
[ https://issues.apache.org/jira/browse/AVRO-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283119#comment-17283119 ] John Gonyo commented on AVRO-3039: -- See [https://github.com/apache/avro/pull/1086] for fix > ClassCache: Cached class map key is too broad > - > > Key: AVRO-3039 > URL: https://issues.apache.org/jira/browse/AVRO-3039 > Project: Apache Avro > Issue Type: Bug > Components: csharp >Reporter: John Gonyo >Priority: Major > > Currently ClassMap's cache is keyed by the schema full name. This restricts > clients to providing one Plain Old C# Object (POCO) model per schema. In > some cases a client might wish to provide multiple POCO per schema name, > perhaps to support multiple versions of the same schema name in the same > runtime. > A fix is to use the full RecordSchema as the key instead of just the name. > Additionally, there's a bug in ReflectDefaultReader that can access the > incorrect class member property. Only the reader schema should be used to > determine the correct DotNetClass for reading. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [avro] Gonyoda opened a new pull request #1086: AVRO-3039 : support multiple versions of the same schema name
Gonyoda opened a new pull request #1086: URL: https://github.com/apache/avro/pull/1086 ### Jira - [ ] My PR addresses the following [Avro Jira](https://issues.apache.org/jira/browse/AVRO/AVRO-3039) issues and references them in the PR title. For example, "AVRO-1234: My Avro PR" - https://issues.apache.org/jira/browse/AVRO-3039 ### Tests - [ ] My PR adds the following unit tests - [ ] TestMultiModelPerSchemaName ### Commits - [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](https://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain Javadoc that explain what it does This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (AVRO-3039) ClassCache: Cached class map key is too broad
John Gonyo created AVRO-3039: Summary: ClassCache: Cached class map key is too broad Key: AVRO-3039 URL: https://issues.apache.org/jira/browse/AVRO-3039 Project: Apache Avro Issue Type: Bug Components: csharp Reporter: John Gonyo Currently ClassMap's cache is keyed by the schema full name. This restricts clients to providing one Plain Old C# Object (POCO) model per schema. In some cases a client might wish to provide multiple POCO per schema name, perhaps to support multiple versions of the same schema name in the same runtime. A fix is to use the full RecordSchema as the key instead of just the name. Additionally, there's a bug in ReflectDefaultReader that can access the incorrect class member property. Only the reader schema should be used to determine the correct DotNetClass for reading. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AVRO-3038) ClassCache: Nullable primitive types and arrays not registered correctly to the class map
[ https://issues.apache.org/jira/browse/AVRO-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283090#comment-17283090 ] John Gonyo commented on AVRO-3038: -- See [https://github.com/apache/avro/pull/1085] for fix > ClassCache: Nullable primitive types and arrays not registered correctly to > the class map > - > > Key: AVRO-3038 > URL: https://issues.apache.org/jira/browse/AVRO-3038 > Project: Apache Avro > Issue Type: Bug > Components: csharp >Affects Versions: 1.10.1 >Reporter: John Gonyo >Priority: Major > > When a schema contains a union of a nullable enum, or an array, the > corresponding NamedSchema or array type are not registered. > > ClassCache's Union can be updated to check for and correctly register these > nullable types. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [avro] Gonyoda opened a new pull request #1085: AVRO-3038: enable support for nullable array/enum
Gonyoda opened a new pull request #1085: URL: https://github.com/apache/avro/pull/1085 ### Jira - [ ] My PR addresses the following [Avro Jira](https://issues.apache.org/jira/browse/AVRO/AVRO-3038) issues and references them in the PR title. - https://issues.apache.org/jira/browse/AVRO-3038 ### Tests - [ ] My PR adds the following unit tests: - [ ] NullableArrayTest - [ ] NullableEnumTest ### Commits - [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](https://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain Javadoc that explain what it does This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (AVRO-3038) ClassCache: Nullable primitive types and arrays not registered correctly to the class map
John Gonyo created AVRO-3038: Summary: ClassCache: Nullable primitive types and arrays not registered correctly to the class map Key: AVRO-3038 URL: https://issues.apache.org/jira/browse/AVRO-3038 Project: Apache Avro Issue Type: Bug Components: csharp Affects Versions: 1.10.1 Reporter: John Gonyo When a schema contains a union of a nullable enum, or an array, the corresponding NamedSchema or array type are not registered. ClassCache's Union can be updated to check for and correctly register these nullable types. -- This message was sent by Atlassian Jira (v8.3.4#803005)