[GitHub] [avro] paravoid opened a new pull request #1091: AVRO-1846: fix memory alignment issues (performance degredation & crashes)

2021-02-11 Thread GitBox


paravoid opened a new pull request #1091:
URL: https://github.com/apache/avro/pull/1091


   This fixes some unaligned memory access issues, which cause an immediate 
crash with SIGBUS on certain architectures (e.g. sparcv9, as the Jira ticket 
indicates), as well as performance degredation in most others. This effectively 
fixes two issues with the existing code:
   - The refcount was kept in a (32-bit) `int`, and then access to the data is 
shifted by 32-bits as well; the commit included changes this to a `long`, which 
is equal to `sizeof(void *)` on all Unices (and e.g. 32-bit on x86, but 64-bit 
on x86_64 and sparcv9).
   - Record fields are accessed using an offset (`field_offset`), which was set 
to the previous field's size. If the previous field was e.g. an `int` (i.e. 
int32_t), or a boolean (int) etc., then access past it would be unaligned (i.e. 
`int` followed by a `long` => `int` at offset 0, `long` at offset 4). This 
commit rounds up the "field size" to a multiple of pointer size. In C terms, 
Avro kept records in memory as a packed structure, while this commit modifies 
this to add paddings so that fields always start on the right boundary.
   
   This has been tested (built & ran tests, including memcheck/Valgrind) on 
both x86_64 (amd64) and sparcv9 (sparc64) on Linux/GCC. I have unfortunately 
not tested it with any real world code, or benchmarked the performance benefits 
on popular 64-bit architectures (like x86_64 or aarch64).
   
   ### Jira
   
   - [x] My PR addresses 
[AVRO-1846](https://issues.apache.org/jira/browse/AVRO-1846)
   
   ### Tests
   
   - [x] My PR does not need testing for this extremely good reason: all code 
covered by existing tests
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, with the 
exception the first one, which is a tiny define rename not associated with any 
issue.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [avro] Fokko opened a new pull request #1090: AVRO-3043: Remove redundant casts

2021-02-11 Thread GitBox


Fokko opened a new pull request #1090:
URL: https://github.com/apache/avro/pull/1090


   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Avro 
Jira](https://issues.apache.org/jira/browse/AVRO/) issues and references them 
in the PR title. For example, "AVRO-1234: My Avro PR"
 - https://issues.apache.org/jira/browse/AVRO-3043
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines. In 
addition, my commits follow the guidelines from "[How to write a good git 
commit message](https://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain Javadoc that 
explain what it does
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (AVRO-3043) Remove redundant generic casts

2021-02-11 Thread Fokko Driesprong (Jira)
Fokko Driesprong created AVRO-3043:
--

 Summary: Remove redundant generic casts
 Key: AVRO-3043
 URL: https://issues.apache.org/jira/browse/AVRO-3043
 Project: Apache Avro
  Issue Type: Improvement
Reporter: Fokko Driesprong
Assignee: Fokko Driesprong






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (AVRO-3042) Make MODEL$ final

2021-02-11 Thread Fokko Driesprong (Jira)
Fokko Driesprong created AVRO-3042:
--

 Summary: Make MODEL$ final
 Key: AVRO-3042
 URL: https://issues.apache.org/jira/browse/AVRO-3042
 Project: Apache Avro
  Issue Type: Improvement
Reporter: Fokko Driesprong
Assignee: Fokko Driesprong






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [avro] Fokko opened a new pull request #1089: AVRO-3042: Make MODEL$ final

2021-02-11 Thread GitBox


Fokko opened a new pull request #1089:
URL: https://github.com/apache/avro/pull/1089


   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Avro 
Jira](https://issues.apache.org/jira/browse/AVRO/) issues and references them 
in the PR title. For example, "AVRO-1234: My Avro PR"
 - https://issues.apache.org/jira/browse/AVRO-3042
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines. In 
addition, my commits follow the guidelines from "[How to write a good git 
commit message](https://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain Javadoc that 
explain what it does
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (AVRO-3041) Reflect: Default converters are not tried for nullable unions

2021-02-11 Thread John Gonyo (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283355#comment-17283355
 ] 

John Gonyo commented on AVRO-3041:
--

See solution in https://github.com/apache/avro/pull/1088

> Reflect: Default converters are not tried for nullable unions
> -
>
> Key: AVRO-3041
> URL: https://issues.apache.org/jira/browse/AVRO-3041
> Project: Apache Avro
>  Issue Type: Bug
>  Components: csharp
>Reporter: John Gonyo
>Priority: Major
>
> Default converters can be added for a given Avro type and a Property type.  
> For example, to map a long to a DateTimeOffset.  However, if the class 
> property is nullable, the default converter is not used.
>  
> Avro reflection should find the converter to use for nullable primitives.  It 
> can also be helpful and provide automatic translation from non-nullable 
> converters to nullable ones.  This allows a client to create one non-nullable 
> converter that can be used for nullable properties.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [avro] Gonyoda opened a new pull request #1088: AVRO-3041 : use default converters for unions/nullable properties

2021-02-11 Thread GitBox


Gonyoda opened a new pull request #1088:
URL: https://github.com/apache/avro/pull/1088


   ### Jira
   
   - [ ] My PR addresses the following [Avro 
Jira](https://issues.apache.org/jira/browse/AVRO/AVRO-3041) issues and 
references them in the PR title.
 - https://issues.apache.org/jira/browse/AVRO-3041
   
   ### Tests
   
   - [ ] My PR adds the following unit tests
   - SerializeNullableWithNullableConverter
   - SerializeNullableDefaultWithClass
   - SerializeNullableDefaultWithDelegate
   - SerializeNullableDefaultWithNullableDelegate
   - SerializeNullableDefaultWithDoubleNullableDelegate
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines. In 
addition, my commits follow the guidelines from "[How to write a good git 
commit message](https://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (AVRO-3041) Reflect: Default converters are not tried for nullable unions

2021-02-11 Thread John Gonyo (Jira)
John Gonyo created AVRO-3041:


 Summary: Reflect: Default converters are not tried for nullable 
unions
 Key: AVRO-3041
 URL: https://issues.apache.org/jira/browse/AVRO-3041
 Project: Apache Avro
  Issue Type: Bug
  Components: csharp
Reporter: John Gonyo


Default converters can be added for a given Avro type and a Property type.  For 
example, to map a long to a DateTimeOffset.  However, if the class property is 
nullable, the default converter is not used.

 

Avro reflection should find the converter to use for nullable primitives.  It 
can also be helpful and provide automatic translation from non-nullable 
converters to nullable ones.  This allows a client to create one non-nullable 
converter that can be used for nullable properties.

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AVRO-2838) Schema in generated Java class is different than the original one

2021-02-11 Thread William Daniels (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283329#comment-17283329
 ] 

William Daniels commented on AVRO-2838:
---

+1 to this issue, we are seeing this with our environment. (left a comment with 
slightly more detail on 
[https://github.com/confluentinc/schema-registry/issues/1677] ). You are forced 
into one of a few options, realistically:

* Your users create all schemas with `avro.java.string` instead of `string`, 
bad for multi-language support and portability
* You use `CharSequence` for everything, bad due to having to null check 
everywhere, and charSequence can't be a key in a map
* You force the usage of 'user.latest.version' (see GH issue above details of 
why that's not advised)
* You introduce custom code during schema registration time to turn `string` 
into a `avro.java.string` to obscure the issue from the end user, bad due to 
potential bugs in pattern matching, and the schema the users crafted don't 
match what actually gets uploaded.. which is very confusing
* You don't use code generation at all and a custom serde implementation (a la 
Ryan Skraba's approach)
* you wrap the usage of the generated code somehow and obscure the 
`CharSequence` to the end user, doing your own null checking, and turning it 
into a string on the api surface. 
* etc

There are probably other problems and various workarounds, but in the end, it 
seems reasonable to say that the generated schema used should match the schema 
it was generated from, especially with regards to primitive types, but in 
general also.

> Schema in generated Java class is different than the original one
> -
>
> Key: AVRO-2838
> URL: https://issues.apache.org/jira/browse/AVRO-2838
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Reporter: Lukas Krecan
>Priority: Major
> Attachments: AVRO.patch
>
>
> If you generate Java classes from schema, {{SCHEMA$}} variable differs from 
> the original schema. It causes issues like 
> [this|https://github.com/confluentinc/schema-registry/issues/868] and 
> [this|https://github.com/confluentinc/schema-registry/issues/1352] when using 
> Schema registry.
> The issue happens when the schema in the registry is configured externally 
> and then you try to use generated Java class. The schema in the registry does 
> not match the schema in the class and thus the write is refused.
> Technically it's easy to fix (see the attached patch) but I guess there will 
> be some backward compatibility concerns. 
> If it is not acceptable, would it be at least possible to add the 
> {{originalSchema}} context variable so we could solve it using custom 
> template.
> The patch is not final, its purpose is just to convey the idea.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AVRO-3040) GenericWriter: Array requirement too restrictive

2021-02-11 Thread John Gonyo (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283249#comment-17283249
 ] 

John Gonyo commented on AVRO-3040:
--

See [https://github.com/apache/avro/pull/1087] for a solution

 

> GenericWriter: Array requirement too restrictive
> 
>
> Key: AVRO-3040
> URL: https://issues.apache.org/jira/browse/AVRO-3040
> Project: Apache Avro
>  Issue Type: Improvement
>  Components: csharp
>Reporter: John Gonyo
>Priority: Major
>
> GenericWriter expects schema array properties to be an actual Array type, 
> while Reflect and Specific writers expect array fields to be of IList. 
> For convenience, it would nice to support List, IList or Array types for 
> array schema fields.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [avro] Gonyoda opened a new pull request #1087: AVRO-3040: support IList in generic reader/writer

2021-02-11 Thread GitBox


Gonyoda opened a new pull request #1087:
URL: https://github.com/apache/avro/pull/1087


   ### Jira
   
   - [ ] My PR addresses the following [Avro 
Jira](https://issues.apache.org/jira/browse/AVRO/AVRO-3040) issues and 
references them in the PR title. For example, "AVRO-1234: My Avro PR"
 - https://issues.apache.org/jira/browse/AVRO-3040
   
   ### Tests
   
   - [ ] My PR adds the following unit tests
   - [ ] TestRecordAllow_List
   - [ ] TestRecordAllow_IList
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines. In 
addition, my commits follow the guidelines from "[How to write a good git 
commit message](https://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (AVRO-3040) GenericWriter: Array requirement too restrictive

2021-02-11 Thread John Gonyo (Jira)
John Gonyo created AVRO-3040:


 Summary: GenericWriter: Array requirement too restrictive
 Key: AVRO-3040
 URL: https://issues.apache.org/jira/browse/AVRO-3040
 Project: Apache Avro
  Issue Type: Improvement
  Components: csharp
Reporter: John Gonyo


GenericWriter expects schema array properties to be an actual Array type, while 
Reflect and Specific writers expect array fields to be of IList. 

For convenience, it would nice to support List, IList or Array types for array 
schema fields.

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AVRO-3039) ClassCache: Cached class map key is too broad

2021-02-11 Thread John Gonyo (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283119#comment-17283119
 ] 

John Gonyo commented on AVRO-3039:
--

See [https://github.com/apache/avro/pull/1086] for fix

> ClassCache: Cached class map key is too broad
> -
>
> Key: AVRO-3039
> URL: https://issues.apache.org/jira/browse/AVRO-3039
> Project: Apache Avro
>  Issue Type: Bug
>  Components: csharp
>Reporter: John Gonyo
>Priority: Major
>
> Currently ClassMap's cache is keyed by the schema full name.  This restricts 
> clients to providing one Plain Old C# Object (POCO) model per schema.  In 
> some cases a client might wish to provide multiple POCO per schema name, 
> perhaps to support multiple versions of the same schema name in the same 
> runtime.
> A fix is to use the full RecordSchema as the key instead of just the name.
> Additionally, there's a bug in ReflectDefaultReader that can access the 
> incorrect class member property.  Only the reader schema should be used to 
> determine the correct DotNetClass for reading.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [avro] Gonyoda opened a new pull request #1086: AVRO-3039 : support multiple versions of the same schema name

2021-02-11 Thread GitBox


Gonyoda opened a new pull request #1086:
URL: https://github.com/apache/avro/pull/1086


   ### Jira
   
   - [ ] My PR addresses the following [Avro 
Jira](https://issues.apache.org/jira/browse/AVRO/AVRO-3039) issues and 
references them in the PR title. For example, "AVRO-1234: My Avro PR"
 - https://issues.apache.org/jira/browse/AVRO-3039
   
   ### Tests
   
   - [ ] My PR adds the following unit tests
   - [ ] TestMultiModelPerSchemaName
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines. In 
addition, my commits follow the guidelines from "[How to write a good git 
commit message](https://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain Javadoc that 
explain what it does
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (AVRO-3039) ClassCache: Cached class map key is too broad

2021-02-11 Thread John Gonyo (Jira)
John Gonyo created AVRO-3039:


 Summary: ClassCache: Cached class map key is too broad
 Key: AVRO-3039
 URL: https://issues.apache.org/jira/browse/AVRO-3039
 Project: Apache Avro
  Issue Type: Bug
  Components: csharp
Reporter: John Gonyo


Currently ClassMap's cache is keyed by the schema full name.  This restricts 
clients to providing one Plain Old C# Object (POCO) model per schema.  In some 
cases a client might wish to provide multiple POCO per schema name, perhaps to 
support multiple versions of the same schema name in the same runtime.

A fix is to use the full RecordSchema as the key instead of just the name.

Additionally, there's a bug in ReflectDefaultReader that can access the 
incorrect class member property.  Only the reader schema should be used to 
determine the correct DotNetClass for reading.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AVRO-3038) ClassCache: Nullable primitive types and arrays not registered correctly to the class map

2021-02-11 Thread John Gonyo (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283090#comment-17283090
 ] 

John Gonyo commented on AVRO-3038:
--

See [https://github.com/apache/avro/pull/1085] for fix

> ClassCache: Nullable primitive types and arrays not registered correctly to 
> the class map
> -
>
> Key: AVRO-3038
> URL: https://issues.apache.org/jira/browse/AVRO-3038
> Project: Apache Avro
>  Issue Type: Bug
>  Components: csharp
>Affects Versions: 1.10.1
>Reporter: John Gonyo
>Priority: Major
>
> When a schema contains a union of a nullable enum, or an array, the 
> corresponding NamedSchema or array type are not registered.
>  
> ClassCache's Union can be updated to check for and correctly register these 
> nullable types.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [avro] Gonyoda opened a new pull request #1085: AVRO-3038: enable support for nullable array/enum

2021-02-11 Thread GitBox


Gonyoda opened a new pull request #1085:
URL: https://github.com/apache/avro/pull/1085


   ### Jira
   
   - [ ] My PR addresses the following [Avro 
Jira](https://issues.apache.org/jira/browse/AVRO/AVRO-3038) issues and 
references them in the PR title.
 - https://issues.apache.org/jira/browse/AVRO-3038
   
   ### Tests
   
   - [ ] My PR adds the following unit tests:
   - [ ] NullableArrayTest
   - [ ] NullableEnumTest
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines. In 
addition, my commits follow the guidelines from "[How to write a good git 
commit message](https://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain Javadoc that 
explain what it does
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (AVRO-3038) ClassCache: Nullable primitive types and arrays not registered correctly to the class map

2021-02-11 Thread John Gonyo (Jira)
John Gonyo created AVRO-3038:


 Summary: ClassCache: Nullable primitive types and arrays not 
registered correctly to the class map
 Key: AVRO-3038
 URL: https://issues.apache.org/jira/browse/AVRO-3038
 Project: Apache Avro
  Issue Type: Bug
  Components: csharp
Affects Versions: 1.10.1
Reporter: John Gonyo


When a schema contains a union of a nullable enum, or an array, the 
corresponding NamedSchema or array type are not registered.

 

ClassCache's Union can be updated to check for and correctly register these 
nullable types.

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)