Re: Unified Python Support

2020-01-06 Thread Driesprong, Fokko
Makes sense Michael.

I'm still working on this one:
https://issues.apache.org/jira/browse/AVRO-2663

The current fix that I came up with does not address the root cause, which
is very complex. I'd like to fix this, and then start the release for
1.9.2. Hopefully, I have some time this weekend. Would that work for you?

One more thing: for Airflow we had to rename the package from airflow
to apache-airflow. We've dropped the old one by
throwing an Error when you try to import the package. Would this be
something that we would like to do for Avro in the future? For example,
releasing avro-python3 version 1.10.0 with the sole message of having to
import the avro package? I would like to get your opinion on this.
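
The retirement approach described above could look roughly like the
following. This is only a sketch under stated assumptions: the module name
avro_python3_stub, the STUB_INIT text and the helper
import_retired_package are all hypothetical, and nothing here is taken
from the Airflow or Avro releases. It builds a throwaway package whose
__init__.py does nothing but raise ImportError, then shows what a user
would see on import.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical body of the retired package's __init__.py: importing the
# package does nothing except raise an error pointing at the replacement.
STUB_INIT = (
    "raise ImportError(\n"
    "    'This package has been retired; install the avro distribution '\n"
    "    'and use import avro instead.'\n"
    ")\n"
)


def import_retired_package(package_name):
    """Create a throwaway stub package, import it, and return the error text."""
    with tempfile.TemporaryDirectory() as tmp:
        package_dir = Path(tmp) / package_name
        package_dir.mkdir()
        (package_dir / "__init__.py").write_text(STUB_INIT)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module(package_name)
        except ImportError as exc:
            return str(exc)
        finally:
            sys.path.remove(tmp)
    return None


# The name below is made up so that it cannot clash with a real package.
print(import_retired_package("avro_python3_stub"))
```

A final release built this way keeps the old name installable while making
the migration step explicit at import time rather than failing silently.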

Cheers, Fokko

Op ma 6 jan. 2020 om 13:32 schreef Michael A. Smith :

> I'd suggest that we do at least one release that has support for both
> python 2 and 3 in the same codebase. This may open doors for folks trying
> to transition from both avro-python3 (lang/py3) to avro (lang/py) as well
> as those trying to go from python 2 to 3 with lang/py.
>
> After that we should officially close out support for python 2.
>
> Please let me know how I can help with the release process. Should we have
> a release soon?
>
> On Sun, Jan 5, 2020 at 13:31 Driesprong, Fokko 
> wrote:
>
> > Thanks for bringing this up Michael and an awesome job on the Python
> part.
> >
> > I'd suggest stopping releasing the avro-python3, and continue only
> > releasing the avro package itself: https://pypi.org/project/avro/
> >
> > This will stop the releases of avro-python3, and in time we can also
> remove
> > it from the git repository. The big question is, are we still going to
> > support Python2 for a while, it is still part of the CI. Supporting only
> > higher versions of Python, such as 3.6, allows us to use new features,
> such
> > as type annotations.
> >
> > Cheers, Fokko
> >
> > Op zo 5 jan. 2020 om 18:44 schreef Michael A. Smith <
> mich...@smith-li.com
> > >:
> >
> > > Hi! Given that Python has ended support for python 2 as of the first, I
> > > went ahead and merged the PR. Test coverage is pretty good, so I'm
> fairly
> > > confident; however this is a big change, involving nearly every module
> in
> > > the python part of the project.
> > >
> > > So I'm wondering how this works when it comes to releasing. There
> aren't
> > > any API changes in the literal implementation. So in that light there
> > isn't
> > > any need to treat this version specially. But Python itself is markedly
> > > different between 2 and 3 in some relevant areas.
> > >
> > > Do we need to do anything different for the next release of the lang/py
> > > codebase?
> > >
> > > Thanks for your guidance!
> > >
> > > On Fri, Dec 20, 2019 at 11:43 Ryan Skraba  wrote:
> > >
> > > > Hello!  I wanted to make sure to thank you for doing all this
> > > > python2/3 work!  I've learned a lot by watching and reading the
> Python
> > > > PRs coming through.
> > > >
> > > > I did a rough pass through the types of changes and cleanup, and I'm
> > > > pretty happy :D  I'll try to get more thorough pass done, but
> (indeed)
> > > > I probably won't have a lot of time between now and the new year.
> > > >
> > > > All my best, Ryan
> > > >
> > > > On Fri, Dec 20, 2019 at 12:36 AM Michael A. Smith <
> > mich...@smith-li.com>
> > > > wrote:
> > > > >
> > > > > Hi, I've finished building out a unified python approach in
> lang/py.
> > > > > It passes our full Yetus tests in cpython 2.7 and 3.5. I also
> tested
> > > > > it and passed locally in 3.6, 3.7 and 3.8 as well as pypy 7.2.0 for
> > > > > both 2.7 and 3.6.
> > > > >
> > > > > The pull request is here: https://github.com/apache/avro/pull/744
> > > > >
> > > > > I know many people are on holiday or unavailable in the near
> future,
> > > > > but I would really appreciate some eyes on this if you can find the
> > > > > time. The tests give me some confidence, but the change was a
> > > > > significant lift, as Python3 and Python2 handle bytes and unicode
> > > > > strings in substantially different ways.
> > > > >
> > > > > This gives us a path forward to unifying our python support (I
> mean,
> > > > > dropping lang/py3 and focusing on one API in one place) as well as
> > > > > managing the sunset of python 2 support altogether.
> > > > >
> > > > > Thank you for your help with this project, either way!
> > > > >
> > > > > - Michael
> > > >
> > >
> >
>


[jira] [Updated] (AVRO-2675) Add lang/py/.tox to the RAT exclusion list

2020-01-06 Thread Kengo Seki (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kengo Seki updated AVRO-2675:
-
Component/s: python

> Add lang/py/.tox to the RAT exclusion list
> --
>
> Key: AVRO-2675
> URL: https://issues.apache.org/jira/browse/AVRO-2675
> Project: Apache Avro
>  Issue Type: Bug
>  Components: python
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Minor
>
> After merging AVRO-2656, {{./build.sh clean test}} in the top-level directory 
> fails with a RAT verification error. It seems that {{lang/py/.tox}} should be 
> added to the exclusion list for RAT.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (AVRO-2676) Document that lang/py3 will be deprecated

2020-01-06 Thread Kengo Seki (Jira)
Kengo Seki created AVRO-2676:


 Summary: Document that lang/py3 will be deprecated
 Key: AVRO-2676
 URL: https://issues.apache.org/jira/browse/AVRO-2676
 Project: Apache Avro
  Issue Type: Improvement
  Components: doc, python
Reporter: Kengo Seki
Assignee: Kengo Seki


AVRO-2441 added a quick-start guide for avro-python3, but avro-python3 will be 
deprecated in the near future as a result of merging AVRO-2656. So we should 
either tell users not to use it anymore or simply remove that document.





[jira] [Created] (AVRO-2675) Add lang/py/.tox to the RAT exclusion list

2020-01-06 Thread Kengo Seki (Jira)
Kengo Seki created AVRO-2675:


 Summary: Add lang/py/.tox to the RAT exclusion list
 Key: AVRO-2675
 URL: https://issues.apache.org/jira/browse/AVRO-2675
 Project: Apache Avro
  Issue Type: Bug
Reporter: Kengo Seki
Assignee: Kengo Seki


After merging AVRO-2656, {{./build.sh clean test}} in the top-level directory 
fails with a RAT verification error. It seems that {{lang/py/.tox}} should be 
added to the exclusion list for RAT.





Re: Unified Python Support

2020-01-06 Thread Sean Busbey
Yeah we should have a release; we are overdue.

On Mon, Jan 6, 2020, 06:32 Michael A. Smith  wrote:

> I'd suggest that we do at least one release that has support for both
> python 2 and 3 in the same codebase. This may open doors for folks trying
> to transition from both avro-python3 (lang/py3) to avro (lang/py) as well
> as those trying to go from python 2 to 3 with lang/py.
>
> After that we should officially close out support for python 2.
>
> Please let me know how I can help with the release process. Should we have
> a release soon?
>
> On Sun, Jan 5, 2020 at 13:31 Driesprong, Fokko 
> wrote:
>
> > Thanks for bringing this up Michael and an awesome job on the Python
> part.
> >
> > I'd suggest stopping releasing the avro-python3, and continue only
> > releasing the avro package itself: https://pypi.org/project/avro/
> >
> > This will stop the releases of avro-python3, and in time we can also
> remove
> > it from the git repository. The big question is, are we still going to
> > support Python2 for a while, it is still part of the CI. Supporting only
> > higher versions of Python, such as 3.6, allows us to use new features,
> such
> > as type annotations.
> >
> > Cheers, Fokko
> >
> > Op zo 5 jan. 2020 om 18:44 schreef Michael A. Smith <
> mich...@smith-li.com
> > >:
> >
> > > Hi! Given that Python has ended support for python 2 as of the first, I
> > > went ahead and merged the PR. Test coverage is pretty good, so I'm
> fairly
> > > confident; however this is a big change, involving nearly every module
> in
> > > the python part of the project.
> > >
> > > So I'm wondering how this works when it comes to releasing. There
> aren't
> > > any API changes in the literal implementation. So in that light there
> > isn't
> > > any need to treat this version specially. But Python itself is markedly
> > > different between 2 and 3 in some relevant areas.
> > >
> > > Do we need to do anything different for the next release of the lang/py
> > > codebase?
> > >
> > > Thanks for your guidance!
> > >
> > > On Fri, Dec 20, 2019 at 11:43 Ryan Skraba  wrote:
> > >
> > > > Hello!  I wanted to make sure to thank you for doing all this
> > > > python2/3 work!  I've learned a lot by watching and reading the
> Python
> > > > PRs coming through.
> > > >
> > > > I did a rough pass through the types of changes and cleanup, and I'm
> > > > pretty happy :D  I'll try to get more thorough pass done, but
> (indeed)
> > > > I probably won't have a lot of time between now and the new year.
> > > >
> > > > All my best, Ryan
> > > >
> > > > On Fri, Dec 20, 2019 at 12:36 AM Michael A. Smith <
> > mich...@smith-li.com>
> > > > wrote:
> > > > >
> > > > > Hi, I've finished building out a unified python approach in
> lang/py.
> > > > > It passes our full Yetus tests in cpython 2.7 and 3.5. I also
> tested
> > > > > it and passed locally in 3.6, 3.7 and 3.8 as well as pypy 7.2.0 for
> > > > > both 2.7 and 3.6.
> > > > >
> > > > > The pull request is here: https://github.com/apache/avro/pull/744
> > > > >
> > > > > I know many people are on holiday or unavailable in the near
> future,
> > > > > but I would really appreciate some eyes on this if you can find the
> > > > > time. The tests give me some confidence, but the change was a
> > > > > significant lift, as Python3 and Python2 handle bytes and unicode
> > > > > strings in substantially different ways.
> > > > >
> > > > > This gives us a path forward to unifying our python support (I
> mean,
> > > > > dropping lang/py3 and focusing on one API in one place) as well as
> > > > > managing the sunset of python 2 support altogether.
> > > > >
> > > > > Thank you for your help with this project, either way!
> > > > >
> > > > > - Michael
> > > >
> > >
> >
>


[jira] [Comment Edited] (AVRO-2070) Tolerate any Number when writing primitive values in Java in GenericDatumWriter

2020-01-06 Thread jason mathews (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008942#comment-17008942
 ] 

jason mathews edited comment on AVRO-2070 at 1/6/20 4:12 PM:
-

This issue should be categorized as a Bug, not an Improvement.

I'm running into this issue and have to maintain a custom GenericDatumWriter 
class to allow mixed numeric types; this fix would eliminate the need for it.

Writing any other Java numeric type (Short, Integer, Long, Float) to a field 
whose schema type is Double results in a ClassCastException.

Java Example: 
{code:java}
Schema doubleType = Schema.create(Schema.Type.DOUBLE);
Schema.Field field = new Schema.Field("d", doubleType);
List<Schema.Field> fields = Collections.singletonList(field);
Schema schema = Schema.createRecord("test", "doc", "", false, fields);

// serialize
GenericDatumWriter<GenericData.Record> datumWriter = new GenericDatumWriter<>(schema);
ByteArrayOutputStream bos = new ByteArrayOutputStream();
try (DataFileWriter<GenericData.Record> dataWriter = new DataFileWriter<>(datumWriter)) {
  dataWriter.create(schema, bos);
  GenericData.Record r = new GenericData.Record(schema);
  r.put("d", 123.456);
  dataWriter.append(r);

  r = new GenericData.Record(schema);
  r.put("d", 123); // try as Integer
  dataWriter.append(r); // throws exception
}
{code}
Output: 
{noformat}
Exception in thread "main" 
org.apache.avro.file.DataFileWriter$AppendWriteException:
 java.lang.ClassCastException: java.lang.Integer cannot be cast to 
java.lang.Double
{noformat}
The Python fastavro implementation, by contrast, has no such restriction: a 
field with a double schema type can hold a mix of floating-point and integer 
values.

Python Example: 
{code:python}
from fastavro import json_writer, json_reader, parse_schema

schema = {
    "namespace": "",
    "type": "record",
    "name": "record",
    "fields": [
        {"name": "d", "type": "double"}
    ]
}
parsed_schema = parse_schema(schema)
records = [
    {u'd': 1.2345},
    {u'd': 12345}
]
with open('test.avro', 'w') as out:
    #fastavro.schemaless_writer(out, parsed_schema, { u'd': 1.2345 } )
    #fastavro.schemaless_writer(out, parsed_schema, { u'd': 12345 } )
    json_writer(out, parsed_schema, records)
with open('test.avro', 'r') as fo:
    avro_reader = json_reader(fo, schema)
    for record in avro_reader:
        print(record)
"""
output:
{'d': 1.2345}
{'d': 12345}
"""
{code}
 


> Tolerate any Number when writing primitive values in Java in 
> GenericDatumWriter
> ---
>
> Key: AVRO-2070
> URL: https://issues.apache.org/jira/browse/AVRO-2070
> Project: Apache Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: 

[jira] [Commented] (AVRO-2070) Tolerate any Number when writing primitive values in Java in GenericDatumWriter

2020-01-06 Thread jason mathews (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17008942#comment-17008942
 ] 

jason mathews commented on AVRO-2070:
-

This issue should be categorized as a Bug, not an Improvement.

I'm running into this issue and have to maintain a custom GenericDatumWriter 
class to allow mixed numeric types; this fix would eliminate the need for it.

Writing any other Java numeric type (Short, Integer, Long, Float) to a field 
whose schema type is Double results in a ClassCastException.

Java Example:

{code:java}
Schema doubleType = Schema.create(Schema.Type.DOUBLE);
Schema.Field field = new Schema.Field("d", doubleType);
List<Schema.Field> fields = Collections.singletonList(field);
Schema schema = Schema.createRecord("test", "doc", "", false, fields);

// serialize
GenericDatumWriter<GenericData.Record> datumWriter = new GenericDatumWriter<>(schema);
ByteArrayOutputStream bos = new ByteArrayOutputStream();
try (DataFileWriter<GenericData.Record> dataWriter = new DataFileWriter<>(datumWriter)) {
  dataWriter.create(schema, bos);
  GenericData.Record r = new GenericData.Record(schema);
  r.put("d", 123.456);
  dataWriter.append(r);

  r = new GenericData.Record(schema);
  r.put("d", 123); // try as Integer
  dataWriter.append(r); // throws exception
}
{code}
Output:

{noformat}
Exception in thread "main" 
org.apache.avro.file.DataFileWriter$AppendWriteException:
 java.lang.ClassCastException: java.lang.Integer cannot be cast to 
java.lang.Double
{noformat}

The Python fastavro implementation, by contrast, has no such restriction: a 
field with a double schema type can hold a mix of floating-point and integer 
values.

Python Example:

{code:python}
from fastavro import json_writer, json_reader, parse_schema

schema = {
    "namespace": "",
    "type": "record",
    "name": "record",
    "fields": [
        {"name": "d", "type": "double"}
    ]
}
parsed_schema = parse_schema(schema)
records = [
    {u'd': 1.2345},
    {u'd': 12345}
]
with open('test.avro', 'w') as out:
    #fastavro.schemaless_writer(out, parsed_schema, { u'd': 1.2345 } )
    #fastavro.schemaless_writer(out, parsed_schema, { u'd': 12345 } )
    json_writer(out, parsed_schema, records)
with open('test.avro', 'r') as fo:
    avro_reader = json_reader(fo, schema)
    for record in avro_reader:
        print(record)
"""
output:
{'d': 1.2345}
{'d': 12345}
"""
{code}
 

> Tolerate any Number when writing primitive values in Java in 
> GenericDatumWriter
> ---
>
> Key: AVRO-2070
> URL: https://issues.apache.org/jira/browse/AVRO-2070
> Project: Apache Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: Daniil Gitelson
>Priority: Major
>
> Tolerating any Number (instead of the concrete Long, Double, Float) makes it 
> possible to use mutable Number implementations for performance reasons 
> (especially for primitive collection iterations).
> Currently, this works only for int:
> {code:java}
>   // Here it works
>   case INT: out.writeInt(((Number)datum).intValue()); break;
>   // This should be replaced with ((Number)datum).longValue() etc
>   case LONG:out.writeLong((Long)datum);   break;
>   case FLOAT:   out.writeFloat((Float)datum); break;
>   case DOUBLE:  out.writeDouble((Double)datum);   break;
> {code}
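
The coercion requested in the description can be illustrated outside Java
as well. The following is a Python analogy only, not the Java
GenericDatumWriter code: the helper name encode_double is hypothetical,
and the one fact relied on is that Avro writes a double as 8 bytes of
little-endian IEEE 754. The point is that any real number is converted at
the write boundary instead of being cast to one concrete type.

```python
import numbers
import struct


def encode_double(value):
    """Encode any real number as an Avro double (8 bytes, little-endian)."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, numbers.Real):
        raise TypeError("expected a real number, got %r" % (value,))
    # float() plays the role of Number.doubleValue() in the Java proposal.
    return struct.pack("<d", float(value))


# An int and the equivalent float produce identical bytes.
print(encode_double(123) == encode_double(123.0))
```

Converting at this single point keeps callers free to pass whichever
numeric type they already hold.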





[jira] [Work started] (AVRO-2663) Projection on nested records does not work

2020-01-06 Thread Fokko Driesprong (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AVRO-2663 started by Fokko Driesprong.
--
> Projection on nested records does not work
> --
>
> Key: AVRO-2663
> URL: https://issues.apache.org/jira/browse/AVRO-2663
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.9.1
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Critical
> Fix For: 1.9.2
>
>
> I've found that reading nested records with different read and write 
> schemas gives errors. A field that isn't in the read schema is still 
> read from the file, and therefore new fields potentially contain invalid data 
> because the two share a position in the values array.
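
For readers unfamiliar with projection, the intended behaviour can be
sketched as below. This is a simplified Python illustration, not the Java
reader and not the fix for this issue: the function project and the
dict-based schemas are assumptions made for the example. Fields are
matched by name, recursing into nested records, so a field that is absent
from the read schema is dropped rather than landing in another field's
slot.

```python
def project(datum, reader_schema):
    """Keep only the fields named in reader_schema, recursing into records."""
    projected = {}
    for field in reader_schema["fields"]:
        value = datum[field["name"]]
        field_type = field["type"]
        # Nested record: apply the same projection one level down.
        if isinstance(field_type, dict) and field_type.get("type") == "record":
            value = project(value, field_type)
        projected[field["name"]] = value
    return projected


# Data as written: the nested record carries fields "a" and "b".
written = {"id": 1, "inner": {"a": "x", "b": 2}}

# Read schema (hypothetical): only "b" is wanted from the nested record.
reader_schema = {
    "type": "record",
    "name": "Outer",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "inner", "type": {
            "type": "record",
            "name": "Inner",
            "fields": [{"name": "b", "type": "int"}],
        }},
    ],
}

print(project(written, reader_schema))
```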





[jira] [Updated] (AVRO-2663) Projection on nested records does not work

2020-01-06 Thread Fokko Driesprong (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fokko Driesprong updated AVRO-2663:
---
Priority: Critical  (was: Major)

> Projection on nested records does not work
> --
>
> Key: AVRO-2663
> URL: https://issues.apache.org/jira/browse/AVRO-2663
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.9.1
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Critical
> Fix For: 1.9.2
>
>
> I've found that reading nested records with different read and write 
> schemas gives errors. A field that isn't in the read schema is still 
> read from the file, and therefore new fields potentially contain invalid data 
> because the two share a position in the values array.





Re: Unified Python Support

2020-01-06 Thread Michael A. Smith
I'd suggest that we do at least one release that has support for both
python 2 and 3 in the same codebase. This may open doors for folks trying
to transition from both avro-python3 (lang/py3) to avro (lang/py) as well
as those trying to go from python 2 to 3 with lang/py.

After that we should officially close out support for python 2.

Please let me know how I can help with the release process. Should we have
a release soon?

On Sun, Jan 5, 2020 at 13:31 Driesprong, Fokko  wrote:

> Thanks for bringing this up Michael and an awesome job on the Python part.
>
> I'd suggest stopping releasing the avro-python3, and continue only
> releasing the avro package itself: https://pypi.org/project/avro/
>
> This will stop the releases of avro-python3, and in time we can also remove
> it from the git repository. The big question is, are we still going to
> support Python2 for a while, it is still part of the CI. Supporting only
> higher versions of Python, such as 3.6, allows us to use new features, such
> as type annotations.
>
> Cheers, Fokko
>
> Op zo 5 jan. 2020 om 18:44 schreef Michael A. Smith  >:
>
> > Hi! Given that Python has ended support for python 2 as of the first, I
> > went ahead and merged the PR. Test coverage is pretty good, so I'm fairly
> > confident; however this is a big change, involving nearly every module in
> > the python part of the project.
> >
> > So I'm wondering how this works when it comes to releasing. There aren't
> > any API changes in the literal implementation. So in that light there
> isn't
> > any need to treat this version specially. But Python itself is markedly
> > different between 2 and 3 in some relevant areas.
> >
> > Do we need to do anything different for the next release of the lang/py
> > codebase?
> >
> > Thanks for your guidance!
> >
> > On Fri, Dec 20, 2019 at 11:43 Ryan Skraba  wrote:
> >
> > > Hello!  I wanted to make sure to thank you for doing all this
> > > python2/3 work!  I've learned a lot by watching and reading the Python
> > > PRs coming through.
> > >
> > > I did a rough pass through the types of changes and cleanup, and I'm
> > > pretty happy :D  I'll try to get more thorough pass done, but (indeed)
> > > I probably won't have a lot of time between now and the new year.
> > >
> > > All my best, Ryan
> > >
> > > On Fri, Dec 20, 2019 at 12:36 AM Michael A. Smith <
> mich...@smith-li.com>
> > > wrote:
> > > >
> > > > Hi, I've finished building out a unified python approach in lang/py.
> > > > It passes our full Yetus tests in cpython 2.7 and 3.5. I also tested
> > > > it and passed locally in 3.6, 3.7 and 3.8 as well as pypy 7.2.0 for
> > > > both 2.7 and 3.6.
> > > >
> > > > The pull request is here: https://github.com/apache/avro/pull/744
> > > >
> > > > I know many people are on holiday or unavailable in the near future,
> > > > but I would really appreciate some eyes on this if you can find the
> > > > time. The tests give me some confidence, but the change was a
> > > > significant lift, as Python3 and Python2 handle bytes and unicode
> > > > strings in substantially different ways.
> > > >
> > > > This gives us a path forward to unifying our python support (I mean,
> > > > dropping lang/py3 and focusing on one API in one place) as well as
> > > > managing the sunset of python 2 support altogether.
> > > >
> > > > Thank you for your help with this project, either way!
> > > >
> > > > - Michael
> > >
> >
>
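
The remark in the quoted message about bytes and unicode strings can be made
concrete with Avro's own string encoding, which the specification defines
as a zig-zag varint length followed by UTF-8 bytes. The sketch below
assumes Python 3 and uses hypothetical helper names (encode_long,
encode_string); it is not the lang/py implementation. The explicit
encode() call is the kind of step that Python 2's str type allowed code to
skip.

```python
def encode_long(n):
    """Encode an integer as an Avro long (zig-zag, then base-128 varint)."""
    zigzag = (n << 1) ^ (n >> 63)
    out = bytearray()
    while zigzag & ~0x7F:
        # Low seven bits with the continuation bit set.
        out.append((zigzag & 0x7F) | 0x80)
        zigzag >>= 7
    out.append(zigzag)
    return bytes(out)


def encode_string(text):
    """Encode text as an Avro string: byte length, then the UTF-8 bytes."""
    data = text.encode("utf-8")  # str -> bytes must be explicit on Python 3
    return encode_long(len(data)) + data


print(encode_string("foo"))  # b'\x06foo'
```

Note that the length prefix counts encoded bytes, not characters, so
non-ASCII text is where a mix-up between the two types shows first.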