Re: Which cassandra client for Python should we use (in the context of Python 3.12) ?

2024-02-21 Thread Jarek Potiuk
That all sounds great! Thanks for all the information, Bret.

On Wed, Feb 21, 2024 at 8:57 PM Bret McGuire  wrote:

>To add some additional information to what's already on this thread:
> PYTHON-1378 is actively being looked into.  An initial look has suggested a
> likely cause; it's very likely this was an oversight stemming from the move
> to cibuildwheel.  Assuming I can confirm that, a fix will then be provided.
> All of that work will be managed on PYTHON-1378.
>
>Regarding the question about asyncore vs. asyncio: as Jarek correctly
> pointed out we have PYTHON-1375 to represent the work of moving to
> asyncio.  I'll also mention that we've begun defining what will be included
> in the next Python driver release.  Let's call it 3.30.0, although (as
> always) that's subject to change.  This release is currently slated to
> include three major changes:
>
>* Stabilize asyncio reactor and make it the default (PYTHON-1375)
>* Officially get off nose and move to pytest (PYTHON-1297)
>* Extend vector support to variable length types (PYTHON-1369)
>
>As mentioned above everything is subject to change but as of this
> writing the current plan is that PYTHON-1375 will be included in the next
> release.  This can be tracked via the "Fix version" on the various tickets
> above (yup, we already have a 3.30.0 release in JIRA).  You can also follow
> along on the Python driver mailing list; I'll likely be starting a more
> detailed discussion on some of these points there soon.
>
>Thanks!
>
>   - Bret -
>
>
> On Wed, Feb 21, 2024 at 7:58 AM Jarek Potiuk  wrote:
>
>> This is cool - thanks Jeff for this explanation, that helps us in making
>> informed decisions. Really appreciate it!
>>
>> Very encouraging for the future :) - I think then, if the donation is
>> on-going, choosing a cassandra-driver (which I understand will become
>> ASF-owned) is definitely a preference for us.
>>
>> And no - we do not have to release it now. We can definitely wait - we
>> can just exclude Python 3.12 support until the .whl has libev support (I
>> hope my issue will be handled soon by Datastax :). Then we can re-enable
>> Python 3.12 support and add instructions to our users to make sure libev is
>> included on Python 3.12. So it does not block us now, and we have
>> clear vision on the way forward.
>>
>> BTW. I looked at the links - they were mostly about Java Driver and
>> mention Python Driver as the next logical step (agree) - is there anything
>> happening currently with it ? There is a doc link that I have no access to,
>> but would be great to know when it might happen? I am just eager to see it
>> happen.
>>
>> J.
>>
>> On Wed, Feb 21, 2024 at 12:53 PM Jeff Jirsa  wrote:
>>
>>>
>>>
>>> On 2024/02/21 09:26:53 Jarek Potiuk wrote:
>>> > Hello dear Cassandra community,
>>> >
>>> > I am a fellow PMC member of Apache Airflow and recently we started to
>>> look
>>> > at the Cassandra provider of ours in the context of Python 3.12
>>> migration
>>> > and the integration raised my interest.
>>> >
>>> > TL;DR; I am quite confused, which client should we use to be
>>> future-proof
>>> > and I would appreciate the advice of the community on it, also I would
>>> like
>>> > to understand why there is no community-managed client, as seems that
>>> with
>>> > the current approach, any Python project (including ASF ones are pretty
>>> > much forced to use 3rd-party managed way to use Cassandra, which I find
>>> > rather strange.
>>> >
>>> > Context:
>>> >
>>> > So far in Apache Airflow we were using
>>> > https://github.com/datastax/python-driver/ to connect to Cassandra,
>>> but
>>> > when we worked on Python 3.12 compatibility.  While looking at it, I
>>> > discovered something strange
>>> >
>>>
>>> Mid-donated to the foundation:
>>>
>>> CEP:
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation
>>>
>>> [Private@]:
>>> https://lists.apache.org/thread/gor4b5l1hc4yokmcmpnhkfvg52w7rpp0
>>>
>>> Status in board report:
>>> https://apache.org/foundation/records/minutes/2023/board_minutes_2023_08_16.txt
>>>
>>> The Scylla version is a fork WITH ADDITIONS that work with
>>> implementation details of Scylladb not present in Apache Cassandra.
>>>
>>> Preference use "Datastax" driver under donation if at all possible, and
>>> get it fixed as rapidly as is practical, but given that Scylla has already
>>> fixed the issue in theirs and it's an apache licensed fork of the same
>>> code, if you have to ship something to remain functional, that seems like a
>>> reasonable fallback.
>>>
>>>
>>>
>>>
>>>
>>>


Re: Which cassandra client for Python should we use (in the context of Python 3.12) ?

2024-02-21 Thread Bret McGuire
   To add some additional information to what's already on this thread:
PYTHON-1378 is actively being looked into.  An initial look has suggested a
likely cause; it's very likely this was an oversight stemming from the move
to cibuildwheel.  Assuming I can confirm that, a fix will then be provided.
All of that work will be managed on PYTHON-1378.

   Regarding the question about asyncore vs. asyncio: as Jarek correctly
pointed out we have PYTHON-1375 to represent the work of moving to
asyncio.  I'll also mention that we've begun defining what will be included
in the next Python driver release.  Let's call it 3.30.0, although (as
always) that's subject to change.  This release is currently slated to
include three major changes:

   * Stabilize asyncio reactor and make it the default (PYTHON-1375)
   * Officially get off nose and move to pytest (PYTHON-1297)
   * Extend vector support to variable length types (PYTHON-1369)

   As mentioned above everything is subject to change but as of this
writing the current plan is that PYTHON-1375 will be included in the next
release.  This can be tracked via the "Fix version" on the various tickets
above (yup, we already have a 3.30.0 release in JIRA).  You can also follow
along on the Python driver mailing list; I'll likely be starting a more
detailed discussion on some of these points there soon.

   Thanks!

  - Bret -


On Wed, Feb 21, 2024 at 7:58 AM Jarek Potiuk  wrote:

> This is cool - thanks Jeff for this explanation, that helps us in making
> informed decisions. Really appreciate it!
>
> Very encouraging for the future :) - I think then, if the donation is
> on-going, choosing a cassandra-driver (which I understand will become
> ASF-owned) is definitely a preference for us.
>
> And no - we do not have to release it now. We can definitely wait - we can
> just exclude Python 3.12 support until the .whl has libev support (I hope
> my issue will be handled soon by Datastax :). Then we can re-enable Python
> 3.12 support and add instructions to our users to make sure libev is
> included on Python 3.12. So it does not block us now, and we have a
> clear vision of the way forward.
>
> BTW, I looked at the links - they were mostly about the Java Driver and
> mention the Python Driver as the next logical step (agree) - is there
> anything happening currently with it? There is a doc link that I have no
> access to, but it would be great to know when it might happen. I am just
> eager to see it happen.
>
> J.
>
> On Wed, Feb 21, 2024 at 12:53 PM Jeff Jirsa  wrote:
>
>>
>>
>> On 2024/02/21 09:26:53 Jarek Potiuk wrote:
>> > Hello dear Cassandra community,
>> >
>> > I am a fellow PMC member of Apache Airflow and recently we started to
>> look
>> > at the Cassandra provider of ours in the context of Python 3.12
>> migration
>> > and the integration raised my interest.
>> >
>> > TL;DR; I am quite confused, which client should we use to be
>> future-proof
>> > and I would appreciate the advice of the community on it, also I would
>> like
>> > to understand why there is no community-managed client, as seems that
>> with
>> > the current approach, any Python project (including ASF ones are pretty
>> > much forced to use 3rd-party managed way to use Cassandra, which I find
>> > rather strange.
>> >
>> > Context:
>> >
>> > So far in Apache Airflow we were using
>> > https://github.com/datastax/python-driver/ to connect to Cassandra, but
>> > when we worked on Python 3.12 compatibility.  While looking at it, I
>> > discovered something strange
>> >
>>
>> Mid-donated to the foundation:
>>
>> CEP:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation
>>
>> [Private@]:
>> https://lists.apache.org/thread/gor4b5l1hc4yokmcmpnhkfvg52w7rpp0
>>
>> Status in board report:
>> https://apache.org/foundation/records/minutes/2023/board_minutes_2023_08_16.txt
>>
>> The Scylla version is a fork WITH ADDITIONS that work with implementation
>> details of Scylladb not present in Apache Cassandra.
>>
>> Preference use "Datastax" driver under donation if at all possible, and
>> get it fixed as rapidly as is practical, but given that Scylla has already
>> fixed the issue in theirs and it's an apache licensed fork of the same
>> code, if you have to ship something to remain functional, that seems like a
>> reasonable fallback.
>>
>>
>>
>>
>>
>>


Re: Which cassandra client for Python should we use (in the context of Python 3.12) ?

2024-02-21 Thread Jarek Potiuk
This is cool - thanks Jeff for this explanation, that helps us in making
informed decisions. Really appreciate it!

Very encouraging for the future :) - I think then, if the donation is
on-going, choosing cassandra-driver (which I understand will become
ASF-owned) is definitely our preference.

And no - we do not have to release it now. We can definitely wait - we can
just exclude Python 3.12 support until the .whl has libev support (I hope
my issue will be handled soon by Datastax :). Then we can re-enable Python
3.12 support and add instructions to our users to make sure libev is
included on Python 3.12. So it does not block us now, and we have a
clear vision of the way forward.

BTW, I looked at the links - they were mostly about the Java Driver and
mention the Python Driver as the next logical step (agree) - is there
anything happening currently with it? There is a doc link that I have no
access to, but it would be great to know when it might happen. I am just
eager to see it happen.

J.

On Wed, Feb 21, 2024 at 12:53 PM Jeff Jirsa  wrote:

>
>
> On 2024/02/21 09:26:53 Jarek Potiuk wrote:
> > Hello dear Cassandra community,
> >
> > I am a fellow PMC member of Apache Airflow and recently we started to
> look
> > at the Cassandra provider of ours in the context of Python 3.12 migration
> > and the integration raised my interest.
> >
> > TL;DR; I am quite confused, which client should we use to be future-proof
> > and I would appreciate the advice of the community on it, also I would
> like
> > to understand why there is no community-managed client, as seems that
> with
> > the current approach, any Python project (including ASF ones are pretty
> > much forced to use 3rd-party managed way to use Cassandra, which I find
> > rather strange.
> >
> > Context:
> >
> > So far in Apache Airflow we were using
> > https://github.com/datastax/python-driver/ to connect to Cassandra, but
> > when we worked on Python 3.12 compatibility.  While looking at it, I
> > discovered something strange
> >
>
> Mid-donated to the foundation:
>
> CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation
>
> [Private@]:
> https://lists.apache.org/thread/gor4b5l1hc4yokmcmpnhkfvg52w7rpp0
>
> Status in board report:
> https://apache.org/foundation/records/minutes/2023/board_minutes_2023_08_16.txt
>
> The Scylla version is a fork WITH ADDITIONS that work with implementation
> details of ScyllaDB not present in Apache Cassandra.
>
> Preference: use the "Datastax" driver under donation if at all possible, and
> get it fixed as rapidly as is practical, but given that Scylla has already
> fixed the issue in theirs and it's an Apache-licensed fork of the same
> code, if you have to ship something to remain functional, that seems like a
> reasonable fallback.
>
>
>
>
>
>


Re: Which cassandra client for Python should we use (in the context of Python 3.12) ?

2024-02-21 Thread Jeff Jirsa



On 2024/02/21 09:26:53 Jarek Potiuk wrote:
> Hello dear Cassandra community,
> 
> I am a fellow PMC member of Apache Airflow and recently we started to look
> at the Cassandra provider of ours in the context of Python 3.12 migration
> and the integration raised my interest.
> 
> TL;DR; I am quite confused, which client should we use to be future-proof
> and I would appreciate the advice of the community on it, also I would like
> to understand why there is no community-managed client, as seems that with
> the current approach, any Python project (including ASF ones are pretty
> much forced to use 3rd-party managed way to use Cassandra, which I find
> rather strange.
> 
> Context:
> 
> So far in Apache Airflow we were using
> https://github.com/datastax/python-driver/ to connect to Cassandra, but
> when we worked on Python 3.12 compatibility.  While looking at it, I
> discovered something strange
> 

Mid-donated to the foundation: 

CEP: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation

[Private@]: https://lists.apache.org/thread/gor4b5l1hc4yokmcmpnhkfvg52w7rpp0

Status in board report: 
https://apache.org/foundation/records/minutes/2023/board_minutes_2023_08_16.txt

The Scylla version is a fork WITH ADDITIONS that work with implementation
details of ScyllaDB not present in Apache Cassandra.

Preference: use the "Datastax" driver under donation if at all possible, and get
it fixed as rapidly as is practical, but given that Scylla has already fixed the
issue in theirs and it's an Apache-licensed fork of the same code, if you have
to ship something to remain functional, that seems like a reasonable fallback.







Re: Which cassandra client for Python should we use (in the context of Python 3.12) ?

2024-02-21 Thread Jarek Potiuk
Ah. And also to add - I created this issue in the Datastax tracker asking to
add libev support to the compiled .whl package they release:

[6] cassandra-driver for Python 3.12 Linux is compiled without libev
support:
https://datastax-oss.atlassian.net/jira/software/c/projects/PYTHON/issues/PYTHON-1378

On Wed, Feb 21, 2024 at 10:26 AM Jarek Potiuk  wrote:

> Hello dear Cassandra community,
>
> I am a fellow PMC member of Apache Airflow and recently we started to look
> at the Cassandra provider of ours in the context of Python 3.12 migration
> and the integration raised my interest.
>
> TL;DR: I am quite confused about which client we should use to be
> future-proof, and I would appreciate the community's advice on it. I would
> also like to understand why there is no community-managed client: it seems
> that with the current approach, any Python project (including ASF ones) is
> pretty much forced to use a 3rd-party managed way to connect to Cassandra,
> which I find rather strange.
>
> Context:
>
> So far in Apache Airflow we have been using
> https://github.com/datastax/python-driver/ to connect to Cassandra. While
> working on Python 3.12 compatibility and looking at it more closely, I
> discovered something strange.
>
> This driver is published on PyPI as "Cassandra driver" [1], which raises a
> bit of a question about trademarks - so far I was convinced this driver was
> managed by the Cassandra community, but on closer inspection it turned
> out that it is - in fact - the Datastax driver. I find it pretty confusing to
> be honest, and with all the debate about ASF trademarks, this should IMHO
> raise a few eyebrows and a PMC reaction - if you ask me. As a PMC member of
> Apache Airflow I am responsible for raising trademark issues if I see them,
> and this one seems to be at odds with the ASF rules. And if I am confused by
> the PyPI naming, then I am pretty sure many of the users are as well.
>
> Note that I am not attacking anyone with that, I just noticed that this
> should likely be handled by the PMC somehow (or that would be my advice at
> least, as a fellow ASF member and PMC member of a friendly ASF project).
>
> But that's a bit tangential to the problem. Coming back to the main
> problem.
>
> I did quite some research and it turned out that the driver still uses the
> stdlib asyncore module by default (which is removed in Python 3.12), and even
> if theoretically we could use the libev reactor, it does not work out of the
> box with the released .whl even if the proper libraries are installed - you
> really have to take an sdist and build the package with gcc configured and
> libev4/libev-devel installed.
>
> Another option is to use the asyncio reactor [2] as far as I understand -
> but as I understand from the issue [3] - this support is still experimental
> and it's not ready for prime time.
>
> This is all captured in the PR [4] where I work on Python 3.12
> compatibility, and Cassandra is - literally - the last remaining provider
> for which we have to decide what to do.
>
> That makes it rather useless for us - because we would not only complicate
> our testing / tooling setup (we have ~90 providers and a pretty complicated
> system to manage dependencies already) but it would also require our users
> who want to use Python 3.12 to do the same, which is quite a
> blocker. And handling user issues in this case would become rather tiring.
>
> In the same PR, Israel Fruchter - who helped us with the Cassandra issue -
> suggested that another option is to use the ScyllaDB driver, which is
> 100% compatible and is published and released by Scylla [5]. I tested it and
> the .whl packages work nicely with libev installed - as expected (and
> initially Israel thought the Datastax driver would work similarly). From
> Israel's explanation, Datastax and Scylla are cooperating on the driver (in
> fact the Scylla one is a fork of the Datastax one) but there is no insight
> into who builds the packages and how (which also raised my eyebrow because
> it seems that - unlike in the ASF - the process of building and releasing the
> package is not transparent and verifiable).
>
> Now - we have two choices:
>
> 1) We can use "cassandra-driver" (which really is a "datastax driver") and
> disable the Cassandra provider for Airflow users on Python 3.12 until
> Datastax fixes the compatibility with Python 3.12.
>
> 2) We can switch to the Scylla driver and release the next provider with
> Python 3.12 support.
>
> So ... Having provided all the context, I have two questions:
>
> Q1: What would be the solution recommended by the community here? I
> understand the community has no impact on Datastax decisions and effort on
> releasing those drivers, so you can at most ask Datastax to fix the
> compatibility issue. As a user I have no insight into the relations
> between the Cassandra community, Datastax and Scylla, so I am reaching out
> here as the place that can advise me on which option is best.  (This I am
> asking as a confused user.)
>
> Q2: I find it pretty worrying that such an important 

Which cassandra client for Python should we use (in the context of Python 3.12) ?

2024-02-21 Thread Jarek Potiuk
Hello dear Cassandra community,

I am a fellow PMC member of Apache Airflow and recently we started to look
at the Cassandra provider of ours in the context of Python 3.12 migration
and the integration raised my interest.

TL;DR: I am quite confused about which client we should use to be
future-proof, and I would appreciate the community's advice on it. I would
also like to understand why there is no community-managed client: it seems
that with the current approach, any Python project (including ASF ones) is
pretty much forced to use a 3rd-party managed way to connect to Cassandra,
which I find rather strange.

Context:

So far in Apache Airflow we have been using
https://github.com/datastax/python-driver/ to connect to Cassandra. While
working on Python 3.12 compatibility and looking at it more closely, I
discovered something strange.

This driver is published on PyPI as "Cassandra driver" [1], which raises a
bit of a question about trademarks - so far I was convinced this driver was
managed by the Cassandra community, but on closer inspection it turned
out that it is - in fact - the Datastax driver. I find it pretty confusing to
be honest, and with all the debate about ASF trademarks, this should IMHO
raise a few eyebrows and a PMC reaction - if you ask me. As a PMC member of
Apache Airflow I am responsible for raising trademark issues if I see them,
and this one seems to be at odds with the ASF rules. And if I am confused by
the PyPI naming, then I am pretty sure many of the users are as well.

Note that I am not attacking anyone with that, I just noticed that this
should likely be handled by the PMC somehow (or that would be my advice at
least, as a fellow ASF member and PMC member of a friendly ASF project).

But that's a bit tangential to the problem. Coming back to the main problem.

I did quite some research and it turned out that the driver still uses the
stdlib asyncore module by default (which is removed in Python 3.12), and even
if theoretically we could use the libev reactor, it does not work out of the
box with the released .whl even if the proper libraries are installed - you
really have to take an sdist and build the package with gcc configured and
libev4/libev-devel installed.
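
A minimal sketch of how one might check this in a given environment, assuming
the reactor modules live under the driver's cassandra.io package
(libevreactor, asyncioreactor, asyncorereactor). The helper names importable
and report are made up for this sketch, and it only reports what can be
imported; it does not connect to anything.

```python
import importlib
import sys

# Reactor modules shipped inside the cassandra-driver package.
# The listing order here is only for display, not a recommendation.
REACTOR_MODULES = (
    "cassandra.io.libevreactor",
    "cassandra.io.asyncioreactor",
    "cassandra.io.asyncorereactor",
)


def importable(module_name):
    """Return True if the module can be imported in this environment."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # Covers a missing package, a missing C extension, and any
        # driver-specific error raised at import time.
        return False
    return True


def report():
    """Map each backend name to whether it is importable here."""
    status = {"asyncore (stdlib)": importable("asyncore")}
    for name in REACTOR_MODULES:
        status[name] = importable(name)
    return status


if __name__ == "__main__":
    print("Python", ".".join(str(part) for part in sys.version_info[:2]))
    for name, ok in report().items():
        print(f"{name}: {'available' if ok else 'not available'}")
```

If the libev entry shows as not available on Python 3.12 while the system
libev packages are installed, that would be consistent with the wheel having
been built without the libev extension.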

Another option is to use the asyncio reactor [2] as far as I understand -
but as I understand from the issue [3] - this support is still experimental
and it's not ready for prime time.
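
As a rough sketch of what choosing a reactor explicitly could look like: the
driver's Cluster accepts a connection_class argument, and the classes
LibevConnection and AsyncioConnection live in the reactor modules named
below. The helpers pick_connection_class and connect are hypothetical names
for this sketch, and the preference order (libev first, then asyncio) is an
assumption, not a recommendation from the driver maintainers.

```python
import importlib

# (module, class) pairs to try, in order. The order is an assumption.
CANDIDATES = (
    ("cassandra.io.libevreactor", "LibevConnection"),
    ("cassandra.io.asyncioreactor", "AsyncioConnection"),
)


def pick_connection_class(candidates=CANDIDATES):
    """Return the first connection class that can be imported, or None."""
    for module_name, class_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except Exception:
            # Missing package, missing C extension, or an import-time error.
            continue
        connection_class = getattr(module, class_name, None)
        if connection_class is not None:
            return connection_class
    return None


def connect(contact_points):
    """Open a session using whichever reactor was found (hypothetical helper)."""
    connection_class = pick_connection_class()
    if connection_class is None:
        raise RuntimeError("no usable reactor found for cassandra-driver")
    # Imported lazily: depending on the driver version, importing
    # cassandra.cluster may itself fail when no default reactor is available.
    from cassandra.cluster import Cluster

    cluster = Cluster(contact_points, connection_class=connection_class)
    return cluster.connect()
```

Calling connect(["127.0.0.1"]) would need a reachable Cassandra node, so only
pick_connection_class can be tried without one.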

This is all captured in the PR [4] where I work on Python 3.12
compatibility, and Cassandra is - literally - the last remaining provider
for which we have to decide what to do.

That makes it rather useless for us - because we would not only complicate
our testing / tooling setup (we have ~90 providers and a pretty complicated
system to manage dependencies already) but it would also require our users
who want to use Python 3.12 to do the same, which is quite a
blocker. And handling user issues in this case would become rather tiring.

In the same PR, Israel Fruchter - who helped us with the Cassandra issue -
suggested that another option is to use the ScyllaDB driver, which is
100% compatible and is published and released by Scylla [5]. I tested it and
the .whl packages work nicely with libev installed - as expected (and
initially Israel thought the Datastax driver would work similarly). From
Israel's explanation, Datastax and Scylla are cooperating on the driver (in
fact the Scylla one is a fork of the Datastax one) but there is no insight
into who builds the packages and how (which also raised my eyebrow because
it seems that - unlike in the ASF - the process of building and releasing the
package is not transparent and verifiable).

Now - we have two choices:

1) We can use "cassandra-driver" (which really is a "datastax driver") and
disable the Cassandra provider for Airflow users on Python 3.12 until
Datastax fixes the compatibility with Python 3.12.

2) We can switch to the Scylla driver and release the next provider with
Python 3.12 support.
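
The two choices can be sketched as dependency specifications. This is only an
illustration: the requirement strings use standard environment markers, the
names REQUIREMENTS and provider_available are made up for this sketch, and
scylla-driver is the PyPI name of the Scylla fork. It is not Airflow's actual
packaging configuration.

```python
import sys

# Option 1: keep cassandra-driver, but only install it on Python < 3.12.
# Option 2: depend on the Scylla fork on every Python version.
REQUIREMENTS = {
    1: 'cassandra-driver; python_version < "3.12"',
    2: "scylla-driver",
}


def provider_available(option, version_info=sys.version_info):
    """Return True if the Cassandra provider would be usable under this option."""
    if option == 1:
        # The provider is excluded on 3.12+ until the wheel gains libev support.
        return tuple(version_info[:2]) < (3, 12)
    if option == 2:
        return True
    raise ValueError(f"unknown option: {option!r}")
```

Under option 1, lifting the restriction later would amount to removing the
marker once a fixed wheel is released.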

So ... Having provided all the context, I have two questions:

Q1: What would be the solution recommended by the community here? I
understand the community has no impact on Datastax decisions and effort on
releasing those drivers, so you can at most ask Datastax to fix the
compatibility issue. As a user I have no insight into the relations
between the Cassandra community, Datastax and Scylla, so I am reaching out
here as the place that can advise me on which option is best.  (This I am
asking as a confused user.)

Q2: I find it pretty worrying that such an important interface (the data world
is driven by Python) is not under the community "umbrella" - it seems that a
very important thing for the users of Cassandra is managed and
controlled by 3rd parties, and the users (as in this case) are
pretty much left at the "mercy" (for lack of a better word) of those
3rd parties - they are the parties that decide whether Python 3.12
users are able to use Cassandra. If I had such a situation in Airflow, I
would be deeply worried in the PMC. Also what adds to that is the