Re: [DISCUSS] SQL support in Cassandra

2025-11-12 Thread Joel Shepherd

Just responding to one point among many good ones, below ...

On 11/5/2025 10:29 AM, David Capwell wrote:
SQL has been building its surface area for decades and trying to catch 
up is a significant effort and how to make things correct and 
performant becomes an issue.  In the latest spec there is now support 
for graph queries, so signing up to be compatible means we need to 
implement the below


SELECT *
FROM GRAPH_TABLE(my_graph
    MATCH (a IS person)-[e IS friends]->(b IS person WHERE b.name = 'Alice')
    WHERE a.name = 'Mary'
    COLUMNS (a.name AS person_a, b.name AS person_b)
);


It depends on your definition of compatible. One definition, or litmus 
test, could be that any application written against PostgreSQL can be 
pointed at Cassandra and (modulo swapping drivers, endpoints, config) 
"just work". I.e., Cassandra wouldn't be considered compatible until 
your example query and others just work on Cassandra.


Another definition is that any application written against CQL can be 
rewritten against Cassandra SQL (no extra surface area), and then can be 
pointed at a Postgres instance and "just work" ... though performance 
and scaling characteristics might be different.


The second definition is reasonable and an easier bar to clear.

Thanks -- Joel.


That above example is just a simple example; it gets far more 
complex and would be harder for C* to support.



I would be curious to see a gap analysis between CQL and SQL 
that include the differences in behaviors. I suspect that it will 
bring a few surprises and provide some more solid foundation to this 
discussion.


I think this is a good starting point.  There are some nice things in 
SQL missing in C* that could be implemented without a ton of risk, and 
opening up the discussion around these areas makes sense to me.


Off the top of my head, here are basic queries that work in SQL but 
not CQL, and that would be very low risk to support.


SELECT 1 -- simple query to test if the connection is still live

SELECT func(42) FROM system.peers; -- this has led someone I know to 
have to implement functions that return constants specifically to work 
around this limitation…





On Nov 5, 2025, at 9:15 AM, Jeff Jirsa  wrote:

CQL just to demonstrate it’s possible

Fat node style would indeed be faster, but I'm mostly proving that it's 
functional



On Nov 5, 2025, at 8:55 AM, Joseph Lynch  wrote:


I very much like Jeff, Josh et al.'s proposals around the pluggable 
stateless API layer. Also I agree with Chris I would prefer a 
simpler API not a more complex one for our applications to couple to 
e.g. the Java stdlib. This also sets up a really nice path 
where the community members can build the layers that make sense 
first out-of-tree, and as a project we can choose the successful 
ones to bring in-tree. Whichever API those layers couple to would be 
a new semi-public interface though which has to be weighed.


Jeff I am curious, in that prototype you are hacking are you 
interacting directly with the internode protocol and verb system or 
going through CQL? I imagine there could be some strengths to going 
straight to the internode?


-Joey

On Tue, Nov 4, 2025 at 3:49 PM Josh McKenzie  
wrote:



Again from

Right. I'm just zooming out a bit more and applying that same
logical pattern broadly to other API language domains, not just
SQL. But yes - your point definitely stands.

On Tue, Nov 4, 2025, at 6:42 PM, Patrick McFadin wrote:

I’m grooving on what “Cloud Native Jeff” is saying here and I
would like to see where this could go. If we use a well
established library like Calcite, then there is no API to
maintain. We might find parts of Cassandra along the way we
could alter to make it easier to integrate, but so far that’s
just a premature optimization.

Super interested to see the TPC-C when you have it, Jeff.

> On Nov 4, 2025, at 3:25 PM, Jeff Jirsa  wrote:
>
>
>
> On 2025/11/04 22:32:08 Josh McKenzie wrote:
>>
>> So I guess what I'm noodling on here is a superset of what
Patrick is w/a slight modification, where we double down on CQL
as being the "low level high performance" API for C*, and have
SQL and other APIs built on top of that.
>>
>
> Again from
https://lists.apache.org/thread/hdwf0g7pnnko7m84yxn87lybnlcdvn50
>
>> Or is it building a native SQL implementation stateless on
top of a backing ordered (ByteOrderedPartitioner),
transactional (accord), key-value cassandra cluster ? It’s an
extra hop, but trying to adjust the existing grammar / DDL to
fit into a language it always mimicked but never implemented
faithfully feels like a bumpy road, where there are many
successful existence proofs for building it stateless a layer
above.
>
> TiKV / TiDB, FoundationDB, etc, etc, etc.
>
> If you have a transactional, performant, ordered KV store,
you can built almos

Re: [DISCUSS] SQL support in Cassandra

2025-11-07 Thread Jordan West
A bit late to this convo but I generally support the POV Joey and Chris
shared. I think SQL can be interesting as a separate layer.

Stepping back I think there is a larger conversation: the initial email
implicitly positions Cassandra’s success as trying to compete directly
with Aurora/CRDB/etc on ease of adoption. I’m not personally sure that’s
the best long term strategy (but I think even that warrants a separate
discussion so I’ll pause here). I know this is one component of a larger
vision but it might be prudent to align on where the community wants to
position the database before we talk about how to get there or we risk
building a hodge-podge that isn’t great at anything.

Jordan

On Wed, Nov 5, 2025 at 10:48 Patrick McFadin  wrote:

> I agree on splitting this up. I'll do that today.
>
> Patrick
>
> On Wed, Nov 5, 2025 at 10:44 AM Dinesh Joshi  wrote:
>
>> There are two distinct conversations in this thread.
>>
>> 1. What does the evolution of CQL Syntax look like?
>> 2. What is the path to bring SQL to Cassandra?
>>
>> I suggest we fork 2 discuss threads to have a focused discussion on each
>> topic.
>>
>> Thanks,
>>
>> Dinesh
>>
>> On Wed, Nov 5, 2025 at 10:29 AM David Capwell  wrote:
>>
>>> My personal stance is that new work should look at existing syntax and
>>> ask the question “why are we different”, if the answer is “I prefer this”
>>> or “I didn’t have the time”, I want to push back against this and argue for
>>> SQL / Postgres w/e possible.  If the answer is “correctness” or
>>> “performance” I am far more open to do things our own way.
>>>
>>> Given the above, I don’t like having a requirement we must be SQL /
>>> Postgres compliant, but I do think its a good guide post to keep in mind
>>> when we are doing something new.
>>>
>>> I worry that we already struggle to implement the current surface area
>>> of CQL correctly and in a way that scales safely.
>>>
>>>
>>> This has been a big issue for me over the past few years, when we
>>> implement features correctness / semantics have not historically been given
>>> the thought I feel that they deserve; we have so many weird behaviors that
>>> leak into user land (batch / CAS failures come to mind as they are
>>> constantly making me sad… why is the “short” type variable length? WHY DO
>>> WE HAVE MEANINGLESS EMPTYNESS); we have gotten much better over the
>>> years though… not all negative here =)
>>>
>>> SQL has been building its surface area for decades and trying to catch
>>> up is a significant effort and how to make things correct and performant
>>> becomes an issue.  In the latest spec there is now support for graph
>>> queries, so signing up to be compatible means we need to implement the below
>>>
>>> SELECT *
>>> FROM GRAPH_TABLE(my_graph
>>> MATCH (a IS person)-[e IS friends]->(b IS person WHERE b.name = 'Alice')
>>> WHERE a.name = 'Mary'
>>> COLUMNS (a.name AS person_a, b.name AS person_b)
>>> );
>>>
>>> That above example is just a simple example; it gets far more complex
>>> and would be harder for C* to support.
>>>
>>>
>>> I would be curious to see a gap analysis between CQL and SQL
>>> that include the differences in behaviors. I suspect that it will bring a
>>> few surprises and provide some more solid foundation to this discussion.
>>>
>>>
>>> I think this is a good starting point.  There are some nice things in
>>> SQL missing in C* that could be implemented without a ton of risk, and
>>> opening up the discussion around these areas makes sense to me.
>>>
>>> Off the top of my head, here are basic queries that work in SQL but not
>>> CQL, and that would be very low risk to support.
>>>
>>> SELECT 1 -- simple query to test if the connection is still live
>>>
>>> SELECT func(42) FROM system.peers; -- this has led someone I know to
>>> have to implement functions that return constants specifically to work
>>> around this limitation…
>>>
>>>
>>>
>>> On Nov 5, 2025, at 9:15 AM, Jeff Jirsa  wrote:
>>>
>>> CQL just to demonstrate it’s possible
>>>
>>> Fat node style would indeed be faster but im mostly proving that its
>>> functional
>>>
>>> On Nov 5, 2025, at 8:55 AM, Joseph Lynch  wrote:
>>>
>>> 
>>> I very much like Jeff, Josh et al.'s proposals around the pluggable
>>> stateless API layer. Also I agree with Chris I would prefer a simpler API
>>> not a more complex one for our applications to couple to e.g. the Java
>>> stdlib. This also sets up a really nice path where the community members
>>> can build the layers that make sense first out-of-tree, and as a project we
>>> can choose the successful ones to bring in-tree. Whichever API those layers
>>> couple to would be a new semi-public interface though which has to be
>>> weighed.
>>>
>>> Jeff I am curious, in that prototype you are hacking are you interacting
>>> directly with the internode protocol and verb system or going through CQL?
>>> I imagine there could be some strengths to going straight to the internode?
>>>
>>> -Joey
>>>
>>> On Tue

Re: [DISCUSS] SQL support in Cassandra

2025-11-05 Thread Patrick McFadin
I agree on splitting this up. I'll do that today.

Patrick

On Wed, Nov 5, 2025 at 10:44 AM Dinesh Joshi  wrote:

> There are two distinct conversations in this thread.
>
> 1. What does the evolution of CQL Syntax look like?
> 2. What is the path to bring SQL to Cassandra?
>
> I suggest we fork 2 discuss threads to have a focused discussion on each
> topic.
>
> Thanks,
>
> Dinesh
>
> On Wed, Nov 5, 2025 at 10:29 AM David Capwell  wrote:
>
>> My personal stance is that new work should look at existing syntax and
>> ask the question “why are we different”, if the answer is “I prefer this”
>> or “I didn’t have the time”, I want to push back against this and argue for
>> SQL / Postgres w/e possible.  If the answer is “correctness” or
>> “performance” I am far more open to do things our own way.
>>
>> Given the above, I don’t like having a requirement we must be SQL /
>> Postgres compliant, but I do think its a good guide post to keep in mind
>> when we are doing something new.
>>
>> I worry that we already struggle to implement the current surface area of
>> CQL correctly and in a way that scales safely.
>>
>>
>> This has been a big issue for me over the past few years, when we
>> implement features correctness / semantics have not historically been given
>> the thought I feel that they deserve; we have so many weird behaviors that
>> leak into user land (batch / CAS failures come to mind as they are
>> constantly making me sad… why is the “short” type variable length? WHY DO
>> WE HAVE MEANINGLESS EMPTYNESS); we have gotten much better over the
>> years though… not all negative here =)
>>
>> SQL has been building its surface area for decades and trying to catch up
>> is a significant effort and how to make things correct and performant
>> becomes an issue.  In the latest spec there is now support for graph
>> queries, so signing up to be compatible means we need to implement the below
>>
>> SELECT *
>> FROM GRAPH_TABLE(my_graph
>> MATCH (a IS person)-[e IS friends]->(b IS person WHERE b.name = 'Alice')
>> WHERE a.name = 'Mary'
>> COLUMNS (a.name AS person_a, b.name AS person_b)
>> );
>>
>> That above example is just a simple example; it gets far more complex
>> and would be harder for C* to support.
>>
>>
>> I would be curious to see a gap analysis between CQL and SQL that include
>> the differences in behaviors. I suspect that it will bring a few surprises
>> and provide some more solid foundation to this discussion.
>>
>>
>> I think this is a good starting point.  There are some nice things in SQL
>> missing in C* that could be implemented without a ton of risk, and opening
>> up the discussion around these areas makes sense to me.
>>
>> Off the top of my head, here are basic queries that work in SQL but not
>> CQL, and that would be very low risk to support.
>>
>> SELECT 1 -- simple query to test if the connection is still live
>>
>> SELECT func(42) FROM system.peers; -- this has led someone I know to have
>> to implement functions that return constants specifically to work around
>> this limitation…
>>
>>
>>
>> On Nov 5, 2025, at 9:15 AM, Jeff Jirsa  wrote:
>>
>> CQL just to demonstrate it’s possible
>>
>> Fat node style would indeed be faster but im mostly proving that its
>> functional
>>
>> On Nov 5, 2025, at 8:55 AM, Joseph Lynch  wrote:
>>
>> 
>> I very much like Jeff, Josh et al.'s proposals around the pluggable
>> stateless API layer. Also I agree with Chris I would prefer a simpler API
>> not a more complex one for our applications to couple to e.g. the Java
>> stdlib. This also sets up a really nice path where the community members
>> can build the layers that make sense first out-of-tree, and as a project we
>> can choose the successful ones to bring in-tree. Whichever API those layers
>> couple to would be a new semi-public interface though which has to be
>> weighed.
>>
>> Jeff I am curious, in that prototype you are hacking are you interacting
>> directly with the internode protocol and verb system or going through CQL?
>> I imagine there could be some strengths to going straight to the internode?
>>
>> -Joey
>>
>> On Tue, Nov 4, 2025 at 3:49 PM Josh McKenzie 
>> wrote:
>>
>>> Again from
>>>
>>> Right. I'm just zooming out a bit more and applying that same logical
>>> pattern broadly to other API language domains, not just SQL. But yes - your
>>> point definitely stands.
>>>
>>> On Tue, Nov 4, 2025, at 6:42 PM, Patrick McFadin wrote:
>>>
>>> I’m grooving on what “Cloud Native Jeff” is saying here and I would like
>>> to see where this could go. If we use a well established library like
>>> Calcite, then there is no API to maintain. We might find parts of Cassandra
>>> along the way we could alter to make it easier to integrate, but so far
>>> that’s just a premature optimization.
>>>
>>> Super interested to see the TPC-C when you have it, Jeff.
>>>
>>> > On Nov 4, 2025, at 3:25 PM, Jeff Jirsa  wrote:
>>> >
>>> >
>>> >
>>> > On 2025/11/04 22:32:08 Josh McKenzie wr

Re: [DISCUSS] SQL support in Cassandra

2025-11-05 Thread Dinesh Joshi
There are two distinct conversations in this thread.

1. What does the evolution of CQL Syntax look like?
2. What is the path to bring SQL to Cassandra?

I suggest we fork 2 discuss threads to have a focused discussion on each
topic.

Thanks,

Dinesh

On Wed, Nov 5, 2025 at 10:29 AM David Capwell  wrote:

> My personal stance is that new work should look at existing syntax and ask
> the question “why are we different”, if the answer is “I prefer this” or “I
> didn’t have the time”, I want to push back against this and argue for SQL /
> Postgres w/e possible.  If the answer is “correctness” or “performance” I
> am far more open to do things our own way.
>
> Given the above, I don’t like having a requirement we must be SQL /
> Postgres compliant, but I do think its a good guide post to keep in mind
> when we are doing something new.
>
> I worry that we already struggle to implement the current surface area of
> CQL correctly and in a way that scales safely.
>
>
> This has been a big issue for me over the past few years, when we
> implement features correctness / semantics have not historically been given
> the thought I feel that they deserve; we have so many weird behaviors that
> leak into user land (batch / CAS failures come to mind as they are
> constantly making me sad… why is the “short” type variable length? WHY DO
> WE HAVE MEANINGLESS EMPTYNESS); we have gotten much better over the
> years though… not all negative here =)
>
> SQL has been building its surface area for decades and trying to catch up
> is a significant effort and how to make things correct and performant
> becomes an issue.  In the latest spec there is now support for graph
> queries, so signing up to be compatible means we need to implement the below
>
> SELECT *
> FROM GRAPH_TABLE(my_graph
> MATCH (a IS person)-[e IS friends]->(b IS person WHERE b.name = 'Alice')
> WHERE a.name = 'Mary'
> COLUMNS (a.name AS person_a, b.name AS person_b)
> );
>
> That above example is just a simple example; it gets far more complex
> and would be harder for C* to support.
>
>
> I would be curious to see a gap analysis between CQL and SQL that include
> the differences in behaviors. I suspect that it will bring a few surprises
> and provide some more solid foundation to this discussion.
>
>
> I think this is a good starting point.  There are some nice things in SQL
> missing in C* that could be implemented without a ton of risk, and opening
> up the discussion around these areas makes sense to me.
>
> Off the top of my head, here are basic queries that work in SQL but not
> CQL, and that would be very low risk to support.
>
> SELECT 1 -- simple query to test if the connection is still live
>
> SELECT func(42) FROM system.peers; -- this has led someone I know to have
> to implement functions that return constants specifically to work around
> this limitation…
>
>
>
> On Nov 5, 2025, at 9:15 AM, Jeff Jirsa  wrote:
>
> CQL just to demonstrate it’s possible
>
> Fat node style would indeed be faster but im mostly proving that its
> functional
>
> On Nov 5, 2025, at 8:55 AM, Joseph Lynch  wrote:
>
> 
> I very much like Jeff, Josh et al.'s proposals around the pluggable
> stateless API layer. Also I agree with Chris I would prefer a simpler API
> not a more complex one for our applications to couple to e.g. the Java
> stdlib. This also sets up a really nice path where the community members
> can build the layers that make sense first out-of-tree, and as a project we
> can choose the successful ones to bring in-tree. Whichever API those layers
> couple to would be a new semi-public interface though which has to be
> weighed.
>
> Jeff I am curious, in that prototype you are hacking are you interacting
> directly with the internode protocol and verb system or going through CQL?
> I imagine there could be some strengths to going straight to the internode?
>
> -Joey
>
> On Tue, Nov 4, 2025 at 3:49 PM Josh McKenzie  wrote:
>
>> Again from
>>
>> Right. I'm just zooming out a bit more and applying that same logical
>> pattern broadly to other API language domains, not just SQL. But yes - your
>> point definitely stands.
>>
>> On Tue, Nov 4, 2025, at 6:42 PM, Patrick McFadin wrote:
>>
>> I’m grooving on what “Cloud Native Jeff” is saying here and I would like
>> to see where this could go. If we use a well established library like
>> Calcite, then there is no API to maintain. We might find parts of Cassandra
>> along the way we could alter to make it easier to integrate, but so far
>> that’s just a premature optimization.
>>
>> Super interested to see the TPC-C when you have it, Jeff.
>>
>> > On Nov 4, 2025, at 3:25 PM, Jeff Jirsa  wrote:
>> >
>> >
>> >
>> > On 2025/11/04 22:32:08 Josh McKenzie wrote:
>> >>
>> >> So I guess what I'm noodling on here is a superset of what Patrick is
>> w/a slight modification, where we double down on CQL as being the "low
>> level high performance" API for C*, and have SQL and other APIs built on
>> top of

Re: [DISCUSS] SQL support in Cassandra

2025-11-05 Thread David Capwell
My personal stance is that new work should look at existing syntax and ask the 
question “why are we different?” If the answer is “I prefer this” or “I didn’t 
have the time”, I want to push back and argue for SQL / Postgres wherever 
possible.  If the answer is “correctness” or “performance”, I am far more 
open to doing things our own way.

Given the above, I don’t like having a requirement that we must be SQL / 
Postgres compliant, but I do think it’s a good guidepost to keep in mind when 
we are doing something new.

> I worry that we already struggle to implement the current surface area of CQL 
> correctly and in a way that scales safely.

This has been a big issue for me over the past few years: when we implement 
features, correctness / semantics have not historically been given the thought 
I feel they deserve. We have so many weird behaviors that leak into user 
land (batch / CAS failures come to mind as they are constantly making me sad… 
why is the “short” type variable length? WHY DO WE HAVE MEANINGLESS 
EMPTINESS). We have gotten much better over the years though… not all 
negative here =)

SQL has been building its surface area for decades, and trying to catch up is a 
significant effort; how to make things correct and performant becomes an 
issue.  In the latest spec there is now support for graph queries, so signing 
up to be compatible means we need to implement the below:

SELECT *
FROM GRAPH_TABLE(my_graph
MATCH (a IS person)-[e IS friends]->(b IS person WHERE b.name = 'Alice')
WHERE a.name = 'Mary'
COLUMNS (a.name AS person_a, b.name AS person_b)
);

That above example is just a simple example; it gets far more complex and 
would be harder for C* to support.


> I would be curious to see a gap analysis between CQL and SQL that include the 
> differences in behaviors. I suspect that it will bring a few surprises and 
> provide some more solid foundation to this discussion.

I think this is a good starting point.  There are some nice things in SQL 
missing in C* that could be implemented without a ton of risk, and opening up 
the discussion around these areas makes sense to me.

Off the top of my head, here are basic queries that work in SQL but not CQL, 
and that would be very low risk to support.

SELECT 1 -- simple query to test if the connection is still live

SELECT func(42) FROM system.peers; -- this has led someone I know to have to 
implement functions that return constants specifically to work around this 
limitation…
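
For illustration, the two query shapes above can be run against an ordinary SQL engine. This is only a sketch: SQLite (via Python's standard sqlite3 module) stands in for "SQL" here, the peers table is a made-up stand-in for system.peers, and abs() replaces the unspecified func(). It says nothing about how Cassandra would implement either query.

```python
import sqlite3

# SQLite is an assumption used as a generic SQL engine; this is not Cassandra/CQL.
conn = sqlite3.connect(":memory:")

# A FROM-less constant select, the usual "is the connection alive" probe.
ping = conn.execute("SELECT 1").fetchone()

# A function applied to a literal instead of a column.
# The peers table is hypothetical and only mimics the shape of system.peers.
conn.execute("CREATE TABLE peers (peer TEXT PRIMARY KEY)")
conn.execute("INSERT INTO peers VALUES ('10.0.0.1'), ('10.0.0.2')")
rows = conn.execute("SELECT abs(-42) FROM peers").fetchall()

print(ping)  # (1,)
print(rows)  # [(42,), (42,)]
```

The second query returns one constant per row of the table, which is the behavior the workaround functions mentioned above were written to emulate.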



> On Nov 5, 2025, at 9:15 AM, Jeff Jirsa  wrote:
> 
> CQL just to demonstrate it’s possible
> 
> Fat node style would indeed be faster but im mostly proving that its 
> functional
> 
>> On Nov 5, 2025, at 8:55 AM, Joseph Lynch  wrote:
>> 
>> 
>> I very much like Jeff, Josh et al.'s proposals around the pluggable 
>> stateless API layer. Also I agree with Chris I would prefer a simpler API 
>> not a more complex one for our applications to couple to e.g. the Java 
>> stdlib. This also sets up a really nice path where the community members can 
>> build the layers that make sense first out-of-tree, and as a project we can 
>> choose the successful ones to bring in-tree. Whichever API those layers 
>> couple to would be a new semi-public interface though which has to be 
>> weighed.
>> 
>> Jeff I am curious, in that prototype you are hacking are you interacting 
>> directly with the internode protocol and verb system or going through CQL? I 
>> imagine there could be some strengths to going straight to the internode?
>> 
>> -Joey
>> 
>> On Tue, Nov 4, 2025 at 3:49 PM Josh McKenzie wrote:
 Again from
>>> Right. I'm just zooming out a bit more and applying that same logical 
>>> pattern broadly to other API language domains, not just SQL. But yes - your 
>>> point definitely stands.
>>> 
>>> On Tue, Nov 4, 2025, at 6:42 PM, Patrick McFadin wrote:
 I’m grooving on what “Cloud Native Jeff” is saying here and I would like 
 to see where this could go. If we use a well established library like 
 Calcite, then there is no API to maintain. We might find parts of 
 Cassandra along the way we could alter to make it easier to integrate, but 
 so far that’s just a premature optimization.
 
 Super interested to see the TPC-C when you have it, Jeff. 
 
 > On Nov 4, 2025, at 3:25 PM, Jeff Jirsa wrote:
 > 
 > 
 > 
 > On 2025/11/04 22:32:08 Josh McKenzie wrote:
 >> 
 >> So I guess what I'm noodling on here is a superset of what Patrick is 
 >> w/a slight modification, where we double down on CQL as being the "low 
 >> level high performance" API for C*, and have SQL and other APIs built 
 >> on top of that.
 >> 
 > 
 > Again from 
 > https://lists.apache.org/thread/hdwf0g7pnnko7m84yxn87lybnlcdvn50
 > 
 >> Or is it building a native SQL implementation stateless on top of a 
 >> backin

Re: [DISCUSS] SQL support in Cassandra

2025-11-05 Thread Josh McKenzie
My intuition is that we would still compare quite favorably to almost all other 
API ecosystems even w/a routing layer gap, our core strength being a clean, 
horizontally scaling, partitioned data store. If we introduced a 1-3ms overhead 
(shooting for an artificially high # here) from using API -> CQL in an API 
gateway hop for instance (instead of writing to internode + verbs), but could 
still scale horizontally and have the durability characteristics we have now, 
the benefit from having that functionality loosely coupled might justify the 
cost. We could probably have a layer like that be much, *much* faster than 
1-3ms w/a more modern runtime env (even just newer JDK and clean-room impl).

And for the record, we could offer API -> CQL -> Storage Engine from an 
endpoint on C* nodes; no absolute need for another layer hop. Just my first 
thought was "What kind of performance expectations do people used to JSON, SQL, 
REST, GraphQL have?".

Alternatively, we could formalize "internode + verb" as an API, with a 
documented expectation that those "fat clients" in the ecosystem consume that 
lower-level internode API instead (which is the follow-on question that popped 
up for me from your question, Joey). So instead of [API -> CQL -> Storage 
Engine] it'd be [API -> Storage Engine], and CQL would be one implementation 
among many. We'd take a hit on flexibility in how we architect and work at that 
internode + verb layer if it became a public API, w/all the evolution and 
deprecation pains that come with that, but I don't *think* we've been super 
mutable in that space over the years.
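
To make the two shapes concrete, here is a toy sketch of [API -> CQL -> Storage Engine] next to [API -> Storage Engine]. Every class and method name below is hypothetical and invented for illustration; nothing here reflects Cassandra's actual internode protocol, verb system, or CQL grammar.

```python
from typing import Dict, Optional


class StorageEngine:
    """Hypothetical stand-in for the storage layer behind internode + verbs."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._rows[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._rows.get(key)


class CqlFrontend:
    """Toy 'CQL': understands only 'PUT key value' and 'GET key'."""

    def __init__(self, engine: StorageEngine) -> None:
        self.engine = engine

    def execute(self, statement: str) -> Optional[str]:
        op, *args = statement.split()
        if op == "PUT":
            self.engine.put(args[0], args[1])
            return None
        if op == "GET":
            return self.engine.get(args[0])
        raise ValueError(f"unsupported statement: {statement}")


class JsonApiOverCql:
    """[API -> CQL -> Storage Engine]: the API only ever speaks CQL."""

    def __init__(self, cql: CqlFrontend) -> None:
        self.cql = cql

    def handle(self, request: dict) -> Optional[str]:
        if request["op"] == "put":
            return self.cql.execute(f"PUT {request['key']} {request['value']}")
        return self.cql.execute(f"GET {request['key']}")


class JsonApiDirect:
    """[API -> Storage Engine]: the API couples to the lower layer directly."""

    def __init__(self, engine: StorageEngine) -> None:
        self.engine = engine

    def handle(self, request: dict) -> Optional[str]:
        if request["op"] == "put":
            self.engine.put(request["key"], request["value"])
            return None
        return self.engine.get(request["key"])


engine = StorageEngine()
via_cql = JsonApiOverCql(CqlFrontend(engine))
direct = JsonApiDirect(engine)

via_cql.handle({"op": "put", "key": "k1", "value": "v1"})
direct.handle({"op": "put", "key": "k2", "value": "v2"})

# Both front ends see the same data because they share one storage engine.
results = (via_cql.handle({"op": "get", "key": "k2"}),
           direct.handle({"op": "get", "key": "k1"}))
print(results)  # ('v2', 'v1')
```

In the sketch, the direct variant skips a parsing step but depends on the StorageEngine method signatures, which is the coupling cost described in the paragraph above.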

Super interesting work Jeff. I'm sure I'm one person among many who are now 
*super curious* about what you've been working on.

On Wed, Nov 5, 2025, at 12:15 PM, Jeff Jirsa wrote:
> 
> CQL just to demonstrate it’s possible
> 
> Fat node style would indeed be faster but im mostly proving that its 
> functional
> 
>> On Nov 5, 2025, at 8:55 AM, Joseph Lynch  wrote:
>> 
>> I very much like Jeff, Josh et al.'s proposals around the pluggable 
>> stateless API layer. Also I agree with Chris I would prefer a simpler API 
>> not a more complex one for our applications to couple to e.g. the Java 
>> stdlib. This also sets up a really nice path where the community members can 
>> build the layers that make sense first out-of-tree, and as a project we can 
>> choose the successful ones to bring in-tree. Whichever API those layers 
>> couple to would be a new semi-public interface though which has to be 
>> weighed.
>> 
>> Jeff I am curious, in that prototype you are hacking are you interacting 
>> directly with the internode protocol and verb system or going through CQL? I 
>> imagine there could be some strengths to going straight to the internode?
>> 
>> -Joey
>> 
>> On Tue, Nov 4, 2025 at 3:49 PM Josh McKenzie  wrote:
 Again from
>>> Right. I'm just zooming out a bit more and applying that same logical 
>>> pattern broadly to other API language domains, not just SQL. But yes - your 
>>> point definitely stands.
>>> 
>>> On Tue, Nov 4, 2025, at 6:42 PM, Patrick McFadin wrote:
 I’m grooving on what “Cloud Native Jeff” is saying here and I would like 
 to see where this could go. If we use a well established library like 
 Calcite, then there is no API to maintain. We might find parts of 
 Cassandra along the way we could alter to make it easier to integrate, but 
 so far that’s just a premature optimization.
 
 Super interested to see the TPC-C when you have it, Jeff. 
 
 > On Nov 4, 2025, at 3:25 PM, Jeff Jirsa  wrote:
 > 
 > 
 > 
 > On 2025/11/04 22:32:08 Josh McKenzie wrote:
 >> 
 >> So I guess what I'm noodling on here is a superset of what Patrick is 
 >> w/a slight modification, where we double down on CQL as being the "low 
 >> level high performance" API for C*, and have SQL and other APIs built 
 >> on top of that.
 >> 
 > 
 > Again from 
 > https://lists.apache.org/thread/hdwf0g7pnnko7m84yxn87lybnlcdvn50
 > 
 >> Or is it building a native SQL implementation stateless on top of a 
 >> backing ordered (ByteOrderedPartitioner), transactional (accord), 
 >> key-value cassandra cluster ? It’s an extra hop, but trying to adjust 
 >> the existing grammar / DDL to fit into a language it always mimicked 
 >> but never implemented faithfully feels like a bumpy road, where there 
 >> are many successful existence proofs for building it stateless a layer 
 >> above.
 > 
 > TiKV / TiDB, FoundationDB, etc, etc, etc.
 > 
 > If you have a transactional, performant, ordered KV store, you can build 
 > almost any high level database on top of it. You can expose even lower 
 > layer primitives (like placement) to optimize for it.
 
 
>>> 


Re: [DISCUSS] SQL support in Cassandra

2025-11-05 Thread Jeff Jirsa
CQL just to demonstrate it’s possible

Fat node style would indeed be faster, but I'm mostly proving that it's
functional

On Nov 5, 2025, at 8:55 AM, Joseph Lynch  wrote:

I very much like Jeff, Josh et al.'s proposals around the pluggable
stateless API layer. Also I agree with Chris I would prefer a simpler API
not a more complex one for our applications to couple to e.g. the Java
stdlib. This also sets up a really nice path where the community members
can build the layers that make sense first out-of-tree, and as a project we
can choose the successful ones to bring in-tree. Whichever API those layers
couple to would be a new semi-public interface though which has to be
weighed.

Jeff I am curious, in that prototype you are hacking are you interacting
directly with the internode protocol and verb system or going through CQL?
I imagine there could be some strengths to going straight to the internode?

-Joey

On Tue, Nov 4, 2025 at 3:49 PM Josh McKenzie  wrote:

Again from

Right. I'm just zooming out a bit more and applying that same logical
pattern broadly to other API language domains, not just SQL. But yes - your
point definitely stands.

On Tue, Nov 4, 2025, at 6:42 PM, Patrick McFadin wrote:

I’m grooving on what “Cloud Native Jeff” is saying here and I would like
to see where this could go. If we use a well established library like
Calcite, then there is no API to maintain. We might find parts of Cassandra
along the way we could alter to make it easier to integrate, but so far
that’s just a premature optimization.

Super interested to see the TPC-C when you have it, Jeff.

> On Nov 4, 2025, at 3:25 PM, Jeff Jirsa  wrote:
>
> On 2025/11/04 22:32:08 Josh McKenzie wrote:
>>
>> So I guess what I'm noodling on here is a superset of what Patrick is
>> w/a slight modification, where we double down on CQL as being the "low
>> level high performance" API for C*, and have SQL and other APIs built on
>> top of that.
>>
>
> Again from
> https://lists.apache.org/thread/hdwf0g7pnnko7m84yxn87lybnlcdvn50
>
>> Or is it building a native SQL implementation stateless on top of a
>> backing ordered (ByteOrderedPartitioner), transactional (accord),
>> key-value cassandra cluster ? It’s an extra hop, but trying to adjust
>> the existing grammar / DDL to fit into a language it always mimicked
>> but never implemented faithfully feels like a bumpy road, where there
>> are many successful existence proofs for building it stateless a layer
>> above.
>
> TiKV / TiDB, FoundationDB, etc, etc, etc.
>
> If you have a transactional, performant, ordered KV store, you can build
> almost any high level database on top of it. You can expose even lower
> layer primitives (like placement) to optimize for it.
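
The layering idea quoted above (a higher-level database built statelessly on an ordered key-value store) can be sketched in a few lines. This is an assumption-laden toy: OrderedKV, row_key, insert and select_range are hypothetical names, the store is in-memory and single-process, and transactions are left out entirely. It only shows how a range predicate on a primary key turns into an ordered key scan.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple


class OrderedKV:
    """Hypothetical ordered key-value store (no transactions in this sketch)."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []          # kept sorted
        self._vals: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._vals.get(key)

    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for k in self._keys[lo:hi]:
            yield k, self._vals[k]


def row_key(table: str, pk: int) -> bytes:
    # Fixed-width big-endian integers sort in numeric order as raw bytes.
    return table.encode() + b"/" + pk.to_bytes(8, "big")


def insert(kv: OrderedKV, table: str, pk: int, name: str) -> None:
    kv.put(row_key(table, pk), name.encode())


def select_range(kv: OrderedKV, table: str, lo_pk: int, hi_pk: int):
    """Roughly: SELECT pk, name FROM <table> WHERE pk >= lo_pk AND pk < hi_pk."""
    prefix_len = len(table.encode()) + 1
    return [(int.from_bytes(k[prefix_len:], "big"), v.decode())
            for k, v in kv.scan(row_key(table, lo_pk), row_key(table, hi_pk))]


kv = OrderedKV()
insert(kv, "person", 3, "Mary")
insert(kv, "person", 1, "Alice")
insert(kv, "person", 2, "Bob")
insert(kv, "pet", 1, "Rex")

result = select_range(kv, "person", 1, 3)
print(result)  # [(1, 'Alice'), (2, 'Bob')]
```

Because the table name is a key prefix and keys are ordered, rows from other tables never appear in the scan, and the SQL layer itself holds no state.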


Re: [DISCUSS] SQL support in Cassandra

2025-11-05 Thread Joseph Lynch
I very much like Jeff, Josh et al.'s proposals around the pluggable
stateless API layer. Also I agree with Chris I would prefer a simpler API
not a more complex one for our applications to couple to e.g. the Java
stdlib. This also sets up a really nice path where the community members
can build the layers that make sense first out-of-tree, and as a project we
can choose the successful ones to bring in-tree. Whichever API those layers
couple to would be a new semi-public interface though which has to be
weighed.

Jeff I am curious, in that prototype you are hacking are you interacting
directly with the internode protocol and verb system or going through CQL?
I imagine there could be some strengths to going straight to the internode?

-Joey

On Tue, Nov 4, 2025 at 3:49 PM Josh McKenzie  wrote:

> Again from
>
> Right. I'm just zooming out a bit more and applying that same logical
> pattern broadly to other API language domains, not just SQL. But yes - your
> point definitely stands.
>
> On Tue, Nov 4, 2025, at 6:42 PM, Patrick McFadin wrote:
>
> I’m grooving on what “Cloud Native Jeff” is saying here and I would like
> to see where this could go. If we use a well established library like
> Calcite, then there is no API to maintain. We might find parts of Cassandra
> along the way we could alter to make it easier to integrate, but so far
> that’s just a premature optimization.
>
> Super interested to see the TPC-C when you have it, Jeff.
>
> > On Nov 4, 2025, at 3:25 PM, Jeff Jirsa  wrote:
> >
> >
> >
> > On 2025/11/04 22:32:08 Josh McKenzie wrote:
> >>
> >> So I guess what I'm noodling on here is a superset of what Patrick is
> >> w/a slight modification, where we double down on CQL as being the "low
> >> level high performance" API for C*, and have SQL and other APIs built on
> >> top of that.
> >>
> >
> > Again from https://lists.apache.org/thread/hdwf0g7pnnko7m84yxn87lybnlcdvn50
> >
> >> Or is it building a native SQL implementation stateless on top of a
> >> backing ordered (ByteOrderedPartitioner), transactional (accord),
> >> key-value cassandra cluster ? It’s an extra hop, but trying to adjust the
> >> existing grammar / DDL to fit into a language it always mimicked but never
> >> implemented faithfully feels like a bumpy road, where there are many
> >> successful existence proofs for building it stateless a layer above.
> >
> > TiKV / TiDB, FoundationDB, etc, etc, etc.
> >
> > If you have a transactional, performant, ordered KV store, you can build
> > almost any high level database on top of it. You can expose even lower
> > layer primitives (like placement) to optimize for it.
>
>
>
>


Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Josh McKenzie
> Again from
Right. I'm just zooming out a bit more and applying that same logical pattern 
broadly to other API language domains, not just SQL. But yes - your point 
definitely stands.

On Tue, Nov 4, 2025, at 6:42 PM, Patrick McFadin wrote:
> I’m grooving on what “Cloud Native Jeff” is saying here and I would like to 
> see where this could go. If we use a well established library like Calcite, 
> then there is no API to maintain. We might find parts of Cassandra along the 
> way we could alter to make it easier to integrate, but so far that’s just a 
> premature optimization.
> 
> Super interested to see the TPC-C when you have it, Jeff. 
> 
> > On Nov 4, 2025, at 3:25 PM, Jeff Jirsa  wrote:
> > 
> > 
> > 
> > On 2025/11/04 22:32:08 Josh McKenzie wrote:
> >> 
> >> So I guess what I'm noodling on here is a superset of what Patrick is w/a 
> >> slight modification, where we double down on CQL as being the "low level 
> >> high performance" API for C*, and have SQL and other APIs built on top of 
> >> that.
> >> 
> > 
> > Again from https://lists.apache.org/thread/hdwf0g7pnnko7m84yxn87lybnlcdvn50
> > 
> >> Or is it building a native SQL implementation stateless on top of a 
> >> backing ordered (ByteOrderedPartitioner), transactional (accord), 
> >> key-value cassandra cluster ? It’s an extra hop, but trying to adjust the 
> >> existing grammar / DDL to fit into a language it always mimicked but never 
> >> implemented faithfully feels like a bumpy road, where there are many 
> >> successful existence proofs for building it stateless a layer above.
> > 
> > TiKV / TiDB, FoundationDB, etc, etc, etc.
> > 
> > If you have a transactional, performant, ordered KV store, you can build 
> > almost any high level database on top of it. You can expose even lower 
> > layer primitives (like placement) to optimize for it.
> 
> 


Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Patrick McFadin
I’m grooving on what “Cloud Native Jeff” is saying here and I would like to see 
where this could go. If we use a well established library like Calcite, then 
there is no API to maintain. We might find parts of Cassandra along the way we 
could alter to make it easier to integrate, but so far that’s just a premature 
optimization.

Super interested to see the TPC-C when you have it, Jeff. 

> On Nov 4, 2025, at 3:25 PM, Jeff Jirsa  wrote:
> 
> 
> 
> On 2025/11/04 22:32:08 Josh McKenzie wrote:
>> 
>> So I guess what I'm noodling on here is a superset of what Patrick is w/a 
>> slight modification, where we double down on CQL as being the "low level 
>> high performance" API for C*, and have SQL and other APIs built on top of 
>> that.
>> 
> 
> Again from https://lists.apache.org/thread/hdwf0g7pnnko7m84yxn87lybnlcdvn50
> 
>> Or is it building a native SQL implementation stateless on top of a backing 
>> ordered (ByteOrderedPartitioner), transactional (accord), key-value 
>> cassandra cluster ? It’s an extra hop, but trying to adjust the existing 
>> grammar / DDL to fit into a language it always mimicked but never 
>> implemented faithfully feels like a bumpy road, where there are many 
>> successful existence proofs for building it stateless a layer above.
> 
> TiKV / TiDB, FoundationDB, etc, etc, etc.
> 
> If you have a transactional, performant, ordered KV store, you can build 
> almost any high level database on top of it. You can expose even lower layer 
> primitives (like placement) to optimize for it.



Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Jeff Jirsa



On 2025/11/04 22:32:08 Josh McKenzie wrote:
> 
> So I guess what I'm noodling on here is a superset of what Patrick is w/a 
> slight modification, where we double down on CQL as being the "low level high 
> performance" API for C*, and have SQL and other APIs built on top of that.
> 

Again from https://lists.apache.org/thread/hdwf0g7pnnko7m84yxn87lybnlcdvn50

>  Or is it building a native SQL implementation stateless on top of a backing 
> ordered (ByteOrderedPartitioner), transactional (accord), key-value cassandra 
> cluster ? It’s an extra hop, but trying to adjust the existing grammar / DDL 
> to fit into a language it always mimicked but never implemented faithfully 
> feels like a bumpy road, where there are many successful existence proofs for 
> building it stateless a layer above.

TiKV / TiDB, FoundationDB, etc, etc, etc.

If you have a transactional, performant, ordered KV store, you can build almost 
any high level database on top of it. You can expose even lower layer 
primitives (like placement) to optimize for it.
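
To make the "ordered KV store underneath" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `OrderedKV` is a toy in-memory stand-in (no transactions, no distribution), and `row_key`, `insert_row` and `select_range` are hypothetical helper names, not part of Cassandra, Accord, TiKV or FoundationDB. It only shows how table rows can be encoded as ordered keys so that a primary-key range query becomes a key-range scan.

```python
import bisect


class OrderedKV:
    """Toy in-memory stand-in for an ordered key-value store (hypothetical)."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


def row_key(table, primary_key):
    # Tuples sort by table name first, then by primary key, so all rows of
    # one table are contiguous in the key space.
    return (table, primary_key)


def insert_row(kv, table, primary_key, row):
    kv.put(row_key(table, primary_key), row)


def select_range(kv, table, pk_from, pk_to):
    """Roughly: SELECT * FROM table WHERE pk >= pk_from AND pk < pk_to."""
    scan = kv.scan(row_key(table, pk_from), row_key(table, pk_to))
    return [row for _, row in scan]


kv = OrderedKV()
insert_row(kv, "users", 3, {"name": "Carol"})
insert_row(kv, "users", 1, {"name": "Alice"})
insert_row(kv, "users", 2, {"name": "Bob"})
insert_row(kv, "orders", 2, {"total": 10})

rows = select_range(kv, "users", 1, 3)
```

A real layer would also need the transactional guarantees mentioned above; the sketch only covers the key-ordering part.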


Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Josh McKenzie
> conforming to one of the "standard" data APIs could make more people more 
> comfortable with staking their future on Cassandra
Aligning with a standard data API could make people more comfortable staking 
their future, and deprecating a 2nd primary query language would *definitely* 
make people less comfortable staking their future on Cassandra too eh? :)

The tension we face here is very real and to your point:
> worry about the long term cost of either maintaining parity across 2+ query 
> languages -- adding multi-language support for each new feature -- or 
> allowing the languages to diverge feature-wise. One leads to a lot of work, 
> the other leads to a lot of confusion
If we try to 1st-class-citizen 2 different APIs we're going to always be 
balancing these 2 poles I think. And to Joey and some other points in the 
thread, we've long struggled to stabilize features that aren't natively good 
bedfellows with a leaderless architecture or our K/K/V wide-row storage engine, 
to the point where we haven't had the resources as a community to get some of 
them production ready to the level we need in years. Given that, I'm wary of us 
trying to take on supporting 2 poles like this long-term.

A future where we 1) had a core rock solid CQL as primitive (language 
reflecting storage engine) and 2) had a translation layer that presented 
different APIs to different users for different use-cases could maybe give us 
the best of both worlds by creating a separation in our engineering efforts 
through architecture. Using Conway's Law for good, as it were.

As that Storage Engine's capability grew (using "Storage Engine" kind of 
broadly here to encapsulate distributed query coordination too) with something 
like Accord, CQL itself would evolve and then the things we built on top of 
that could themselves then have richer API surface areas and better conformance 
with the totality of their respective language features. I'd think of this 
proposed model as "CQL is the deeper API layer to talk directly to C* and other 
language ecosystem support can be built on top of CQL".

For any new language features in CQL going forward, defaulting towards 
SQL-compliance seems the logical choice rather than diverging further, all else 
being equal.

Having a GraphQL API that supported the aspects of GraphQL that played nicely 
with CQL, or JSON, or ANSI SQL, each with their own specific restrictions or 
subset of implementation *seems* like it could give us the best of both worlds. 
I'm inclined to think that a layered architecture like that would be the best 
approach for both our users and ourselves maintaining the project, as well as 
growing the ecosystem. It would also be much easier to on-ramp as a contributor 
to work on a SQL implementation on top of CQL or a document API on top of it 
vs. needing to ramp on a less modularized, more coupled base single API like 
people have to do today.

So I guess what I'm noodling on here is a superset of what Patrick is w/a 
slight modification, where we double down on CQL as being the "low level high 
performance" API for C*, and have SQL and other APIs built on top of that.
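
A rough Python sketch of that layering, under stated assumptions: `RecordingSession` is a hypothetical stand-in for a CQL driver session that only records what it is asked to run, and `KeyValueApi` / `DocumentApi` are made-up front ends, not existing projects. The table and column names (`kv`, `docs`, `id`, `payload`) are illustrative, and the use of a bind marker with `INSERT ... JSON` is assumed to be accepted by the driver in use. The point is only that each higher-level API stays stateless and reduces to CQL statements.

```python
import json


class RecordingSession:
    """Hypothetical stand-in for a CQL driver session; it only records calls."""

    def __init__(self):
        self.statements = []

    def execute(self, cql, params=()):
        self.statements.append((cql, tuple(params)))


class KeyValueApi:
    """One illustrative front end: put-by-key, expressed as CQL."""

    def __init__(self, session, table):
        self.session = session
        self.table = table  # assumed to be a trusted, pre-validated identifier

    def put(self, key, value):
        self.session.execute(
            f"INSERT INTO {self.table} (id, payload) VALUES (?, ?)",
            (key, value),
        )


class DocumentApi:
    """Another illustrative front end: whole documents via INSERT ... JSON."""

    def __init__(self, session, table):
        self.session = session
        self.table = table  # assumed to be a trusted, pre-validated identifier

    def save(self, document):
        # Serialize deterministically so the generated statement is stable.
        body = json.dumps(document, sort_keys=True)
        self.session.execute(f"INSERT INTO {self.table} JSON ?", (body,))


session = RecordingSession()
KeyValueApi(session, "kv").put("user:1", "alice")
DocumentApi(session, "docs").save({"title": "hello", "id": 7})
```

Both front ends hold no state beyond the session handle, which is what would let them live out-of-tree and evolve separately from CQL itself.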

On Tue, Nov 4, 2025, at 3:39 PM, Joel Shepherd wrote:
> On 11/4/2025 11:42 AM, Patrick McFadin wrote:
> >
> > Just to be clear, in my initial proposal I said that CQL can never go 
> > away. It’s a life sentence. Knowing the upgrade cycle that many users 
> > are on, it will be 50 years before we could even try.
> 
> Fifty years will go by like that. :-)
> 
> I'd mostly worry about the long term cost of either maintaining parity 
> across 2+ query languages -- adding multi-language support for each new 
> feature -- or allowing the languages to diverge feature-wise. One leads 
> to a lot of work, the other leads to a lot of confusion.
> 
> Thanks -- Joel.
> 
> 
> 


Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Joel Shepherd

On 11/4/2025 11:42 AM, Patrick McFadin wrote:

> Just to be clear, in my initial proposal I said that CQL can never go 
> away. It’s a life sentence. Knowing the upgrade cycle that many users 
> are on, it will be 50 years before we could even try.


Fifty years will go by like that. :-)

I'd mostly worry about the long term cost of either maintaining parity 
across 2+ query languages -- adding multi-language support for each new 
feature -- or allowing the languages to diverge feature-wise. One leads 
to a lot of work, the other leads to a lot of confusion.


Thanks -- Joel.




Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Joel Shepherd

On 11/3/2025 10:38 PM, Mick wrote:

>> On 3 Nov 2025, at 20:32, Joel Shepherd wrote:
>>
>> At the same time, my personal opinion is that if SQL compatibility is pursued, 
>> then the end game should be to deprecate CQL. That will probably take years, 
>> but at the limit I don't see a lot of benefit to supporting both.
>
> We want SQL, but _why_ (in all its nuances) do we want SQL ?  A lot is obvious, 
> but it is a very broad question.
>
> The adoption and standardisation benefits are obvious, but CQL has strengths 
> relative to SQL in Cassandra’s context.


IMO this is the crux of the debate. If Patrick's hypothesis (from his 
CoC talk IIRC) that there is a consolidation underway in the 
database/storage world, including API consolidation, is correct, then 
conforming to one of the "standard" data APIs could make more people 
more comfortable with staking their future on Cassandra.


But if that requires building unsatisfying features in Cassandra (joins 
with meh performance, "weird" transaction semantics for people coming 
from an RDBMS background, etc.), or makes it harder to use existing 
Cassandra functionality, then there is real risk of diluting Cassandra's 
strengths and harming its reputation.



> One is Cassandra’s wide-partition model with flexible clustering columns, which 
> supports very large, ordered partitions (e.g. time-series and efficient range 
> scans), rather than a strictly normalised, join-centric model. These patterns 
> don’t always map cleanly to SQL semantics, and CQL’s query-driven, 
> table-per-query modelling helps move users toward designs that scale 
> predictably.
>
> I can see CQL continuing as Cassandra’s high-throughput, query-driven DSL, 
> while we pursue SQL compatibility.  I appreciate Dinesh’s ‘lanes’ framing, e.g. 
> eventually default to a SQL interface (with Accord) for the broadest UX, while 
> CQL remains a high-throughput path.


This is where I could use some education. What are a couple examples of 
things that make CQL better suited for high throughput than SQL? Some of 
the key differences that I can see are:


* CQL is consensus-aware; SQL isn't.
* CQL is partition key and clustering key aware; SQL has a single primary 
  key concept.
* CQL discourages cross- or multi-partition operations; SQL doesn't.
* CQL doesn't support relational joins, referential integrity, etc. 
  (cross-partition and cross-table operations); SQL does.


Are there others that'd be good to think about?

Syntactically, the partition key vs primary key difference and 
consensus-awareness seem like the hardest to deal with in SQL: I'm not 
sure how to do it and stay conformant (not introduce Cassandra-specific 
syntax). The other two, I think, could be addressed either by not 
offering support (SQL w/o join syntax, ref integrity syntax, etc.) or by 
offering constrained support (e.g. you can join but at least one side of 
the join must be constrained to a single partition).
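
As a sketch of what "constrained support" could look like, the Python below rejects a join unless at least one side has an equality constraint on every partition key column. `TableRef` and `check_join` are hypothetical names invented for this illustration, and the table and column names are made up; nothing here is existing Cassandra behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class TableRef:
    """A table as it appears in a query, with its equality predicates."""
    name: str
    partition_key: tuple  # names of the partition key columns
    equality_filters: dict = field(default_factory=dict)  # column -> literal

    def pinned_to_one_partition(self):
        # True only if every partition key column has an equality constraint.
        return all(col in self.equality_filters for col in self.partition_key)


def check_join(left, right):
    """Allow a join only if one side is restricted to a single partition."""
    if left.pinned_to_one_partition() or right.pinned_to_one_partition():
        return True
    raise ValueError(
        f"join of {left.name} and {right.name} rejected: "
        "at least one side must be constrained to a single partition"
    )


orders = TableRef("orders", ("customer_id",), {"customer_id": 42})
items = TableRef("items", ("order_id",))
allowed = check_join(orders, items)

try:
    check_join(TableRef("orders", ("customer_id",)), items)
    rejected = False
except ValueError:
    rejected = True
```

The same shape of check would also cover the "no support" option: a front end could simply raise for every join instead of inspecting the predicates.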


Would love to learn if there are other throughput-related nuances to CQL 
that wouldn't translate easily to SQL.




In the spirit of "respect what came before", I'm asking the next 
question not to throw shade on CQL or its creators, but to explore 
doubling-down on CQL.


If CQL didn't try as hard to look like SQL, could it be a better API for 
Cassandra? For example, if the syntax required you to specify partition 
key constraints explicitly, just like you have to be explicit with 
"ALLOW FILTERING" today, could CQL become a more optimal language for 
Cassandra?


If someone is going to the trouble of building a Cassandra front-end to 
support a different query language to make Cassandra more appealing (I 
think that's ultimately the goal), would it be better to deprioritize 
SQL conformance and instead design a language specifically for 
wide-partition, high throughput, distributed, eventually consistent 
databases?



> That doesn't make me opposed to the endeavour of SQL compatibility, it pushes 
> me on the why question a bit more for alignment clarity to our strengths.

Definitely agree that both approaches are worth consideration.

Thanks -- Joel.


Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Patrick McFadin
Yes and I’m not asking for wholesale conversion. The few times we need to add 
new syntax, we default to established SQL syntax. An example where this was 
already done is CEP-52.

> On Nov 4, 2025, at 11:56 AM, Jeff Jirsa  wrote:
> 
> Conversion doesn't seem practical or desirable: it'd probably result in  CQL 
> and a muddled version of SQL which wouldn't be beneficial for anyone.



Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Josh McKenzie
> It could also be that nobody's willing to actually do it for real, in which 
> case it's all talk and there's no decision to make. 
I know of at least 2 groups running thousands of C* clusters w/projects in 
front of them exposing different APIs than CQL so there's definitely people 
willing to do it for real. Now, those being subprojects and subject to the dev 
cycles and community ergonomics of an Apache Foundation project are a different 
story. :)

I think a recurring SIG to explore that (or even just a 
one-off-advertised-here-higher-bandwidth community discussion about it we bring 
back to the list) would be super interesting and I'd be willing to donate an 
hour of my life to that.

These patterns of things people do redundantly in orbit around our core project 
always capture my imagination. /shrug

On Tue, Nov 4, 2025, at 2:56 PM, Jeff Jirsa wrote:
> 
> 
> On 2025/11/04 19:42:31 Patrick McFadin wrote:
> > Just to be clear, in my initial proposal I said that CQL can never go away. 
> > It’s a life sentence. Knowing the upgrade cycle that many users are on, it 
> > will be 50 years before we could even try. 
> > 
> > I feel we are at a fork here in the discussion. 
> > 
> > Fork 1: Discuss and somehow ratify that we adhere to SQL syntax for new CQL 
> > features 
> > 
> > Fork 2: Formation of a SIG or new DISCUSS thread on how to add SQL as a 
> > formal path. Really good ideas have already been thrown around, and that 
> > should continue. Josh wrapped it up nicely with a “Stateless layer that 
> > could serve many purposes” 
> 
> 
> Joel said this a while back: 
> 
> Conversion doesn't seem practical or desirable: it'd probably result in  CQL 
> and a muddled version of SQL which wouldn't be beneficial for anyone.
> 
> I agree with that, and I'd vote accordingly if someone wanted to ratify your 
> fork 1.
> 
> Your fork 2 may or may not need to exist. You don't strictly NEED a SIG, 
> because you don't need changes in the database to implement it. It could be a 
> subproject. It could also be an external project. It could also be that 
> nobody's willing to actually do it for real, in which case it's all talk and 
> there's no decision to make.  I think if someone wants to write a CEP that 
> exposes lower layer primitives to someone building layers, write the CEP and 
> negotiate it, great. If someone wants to write an official project-inclusive 
> CEP for a real SQL layer, write it and negotiate it. If someone wants to do 
> JOINs for CQL, fine, go for it (is that CEP already around? It might be). 
> 
> Right now all we have is a proposal to constrain CQL to be SQL compatible, 
> and at least a handful of people have argued about why that's probably a bad 
> idea. 
> 
> 
> 
> 


Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Jeff Jirsa



On 2025/11/04 19:42:31 Patrick McFadin wrote:
> Just to be clear, in my initial proposal I said that CQL can never go away. 
> It’s a life sentence. Knowing the upgrade cycle that many users are on, it 
> will be 50 years before we could even try. 
> 
> I feel we are at a fork here in the discussion. 
> 
> Fork 1: Discuss and somehow ratify that we adhere to SQL syntax for new CQL 
> features 
> 
> Fork 2: Formation of a SIG or new DISCUSS thread on how to add SQL as a 
> formal path. Really good ideas have already been thrown around, and that 
> should continue. Josh wrapped it up nicely with a “Stateless layer that could 
> serve many purposes” 


Joel said this a while back: 

Conversion doesn't seem practical or desirable: it'd probably result in  CQL 
and a muddled version of SQL which wouldn't be beneficial for anyone.

I agree with that, and I'd vote accordingly if someone wanted to ratify your 
fork 1.

Your fork 2 may or may not need to exist. You don't strictly NEED a SIG, 
because you don't need changes in the database to implement it. It could be a 
subproject. It could also be an external project. It could also be that 
nobody's willing to actually do it for real, in which case it's all talk and 
there's no decision to make.  I think if someone wants to write a CEP that 
exposes lower layer primitives to someone building layers, write the CEP and 
negotiate it, great. If someone wants to write an official project-inclusive 
CEP for a real SQL layer, write it and negotiate it. If someone wants to do 
JOINs for CQL, fine, go for it (is that CEP already around? It might be). 

Right now all we have is a proposal to constrain CQL to be SQL compatible, and 
at least a handful of people have argued about why that's probably a bad idea. 





Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Dinesh Joshi
When a user hears PostgreSQL compatibility, the implicit assumption they
have is a full bug-for-bug compatibility with Postgres. I don't think
that's what you mean here, is it?

On Tue, Nov 4, 2025 at 11:38 AM Jeff Jirsa  wrote:

> I started building a Postgres layer to convince myself it’s possible. It’s
> got joins, interactive transactions, mvcc, pg wire protocol, query planner,
> etc. It’s far enough along I can run tpc-c.
>
> The only cassandra change that was needed was a fix to accord for BOP
> variable length tokens serialized in the journal. The rest just works if
> you know how Postgres and Cassandra work.
>
> I’m running tpc-c to see how far from acceptable latency it is for a week
> of toy work but I’m about 95% sure that anyone who knows how databases work
> can implant a Postgres layer on cassandra for real as soon as accord
> launches.
>
> I don’t think the project needs to build this into cassandra. There are a
> lot of reasons not to do that.
>
>
>
>
> On Nov 4, 2025, at 11:18 AM, Josh McKenzie  wrote:
>
> 
> Good point Joey; I was rather focused on the ergonomics of implicit
> constraint that come with CQL vs. SQL and the gap we'd have to bridge to
> make a SQL-centric world have the same design language as CQL today.
>
> We can't afford to drop CQL at this point unless we had an overwhelmingly
> bullet-proof CQL->SQL translation layer that didn't introduce new edge
> cases nor performance degradation compared to CQL directly today. Users
> would have to have the ability for existing CQL applications to Just Work
> when migrated onto some new paradigm where the existing CQL native protocol
> endpoints were deprecated. At that point we'd just be weighing the cost of
> maintaining a translation layer between API semantics vs. a translation
> layer between the native protocol and the storage engine we already have
> today; lot of work to just be where we are today IMO.
>
> We've learned the hard way that when you remove functionality from the
> database it hurts a lot of users in a lot of ways and we all discussed and
> broadly had a consensus to try not to remove anything going forward on the
> dev ML in the past year as I recall. Removing our core query language would
> be... quite the opposite of what we discussed and agreed to.
>
> Now - SQL layer on top of the storage engine? If people want to work on
> that I think it'd be great for our ecosystem. To Chris' point, I think
> there's probably appetite from users' perspectives to have different APIs
> to interact with data in the storage engine, be it gRPC, GraphQL, JSON, CQL
> over REST, CQL, SQL, etc. Us having a layer that allowed us to reasonably
> build in that functionality would be a net win.
>
> On Tue, Nov 4, 2025, at 12:36 PM, Chris Lohfink wrote:
>
> Just throwing my 2 cents in. I'm probably in the unpopular camp of wanting
> to move the other direction towards a grpc endpoint that is even more
> restrictive than cql. This is coming from a standpoint of needing to clean
> up after mistakes (application/modeling etc, not cassandra) than the
> standpoint of trying to sell people on using the database. I would
> prefer to see all the features and endpoints we provide work well without
> breaking than make cool demos and feature bullet points. That said I know
> in order for a database to be successful we need the cool feature sets as
> well.  CQL works for now and deprecating that would be an absolute
> nightmare for people *already* using it (ie thrift migration was not fun
> for anyone). I say create a new entrypoint or layer, mark it experimental
> and allow operators to disable it but leave the existing CQL interface
> alone.
>
> Chris
>
> On Tue, Nov 4, 2025 at 10:53 AM Isaac Reath  wrote:
>
> I share Joey's opinions on this. Many features that resemble SQL (e.g.,
> indexes, materialized views) come with caveats that stem from
> their implementation details rather than the query language itself. If we
> expose these same features through SQL as they are today, I think we'd risk
> setting users up for disappointment, since they will come in with implicit
> expectations about how a given SQL feature should work based on their
> previous experience and more often than not we won't meet that expectation.
> At least with CQL we set the expectation that this is a different database,
> where familiar concepts might behave differently than you would expect.
>
> That said, in terms of a long term direction, I think having SQL support
> is a good guiding light and implementing it as a stateless component as
> Jeff suggests would help make this easier to realize.
>
> On Tue, Nov 4, 2025 at 10:23 AM Joseph Lynch 
> wrote:
>
> Removing CQL is, in my opinion, completely off the table. When we
> deprecated Thrift and gave CQL as the new query language, we imposed
> significant pain on our existing functional Thrift applications to migrate
> to it - I feel we should not hurt our users like that again.
>
> I worry that we already struggle 

Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Patrick McFadin
Maybe a subproject?

> On Nov 4, 2025, at 11:35 AM, Jeff Jirsa  wrote:
> 
> I don’t think the project needs to build this into cassandra. There are a lot 
> of reasons not to do that. 



Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Patrick McFadin
Just to be clear, in my initial proposal I said that CQL can never go away. 
It’s a life sentence. Knowing the upgrade cycle that many users are on, it will 
be 50 years before we could even try. 

I feel we are at a fork here in the discussion. 

Fork 1: Discuss and somehow ratify that we adhere to SQL syntax for new CQL 
features 

Fork 2: Formation of a SIG or new DISCUSS thread on how to add SQL as a formal 
path. Really good ideas have already been thrown around, and that should 
continue. Josh wrapped it up nicely with a “Stateless layer that could serve 
many purposes” 

That’s my proposal. WDYT?

Patrick

> On Nov 4, 2025, at 11:18 AM, Josh McKenzie  wrote:
> 
> Good point Joey; I was rather focused on the ergonomics of implicit 
> constraint that come with CQL vs. SQL and the gap we'd have to bridge to make 
> a SQL-centric world have the same design language as CQL today.
> 
> We can't afford to drop CQL at this point unless we had an overwhelmingly 
> bullet-proof CQL->SQL translation layer that didn't introduce new edge cases 
> nor performance degradation compared to CQL directly today. Users would have 
> to have the ability for existing CQL applications to Just Work when migrated 
> onto some new paradigm where the existing CQL native protocol endpoints were 
> deprecated. At that point we'd just be weighing the cost of maintaining a 
> translation layer between API semantics vs. a translation layer between the 
> native protocol and the storage engine we already have today; lot of work to 
> just be where we are today IMO.
> 
> We've learned the hard way that when you remove functionality from the 
> database it hurts a lot of users in a lot of ways and we all discussed and 
> broadly had a consensus to try not to remove anything going forward on the 
> dev ML in the past year as I recall. Removing our core query language would 
> be... quite the opposite of what we discussed and agreed to.
> 
> Now - SQL layer on top of the storage engine? If people want to work on that 
> I think it'd be great for our ecosystem. To Chris' point, I think there's 
> probably appetite from users' perspectives to have different APIs to interact 
> with data in the storage engine, be it gRPC, GraphQL, JSON, CQL over REST, 
> CQL, SQL, etc. Us having a layer that allowed us to reasonably build in that 
> functionality would be a net win.
> 
> On Tue, Nov 4, 2025, at 12:36 PM, Chris Lohfink wrote:
>> Just throwing my 2 cents in. I'm probably in the unpopular camp of wanting 
>> to move the other direction towards a grpc endpoint that is even more 
>> restrictive than cql. This is coming from a standpoint of needing to clean 
>> up after mistakes (application/modeling etc, not cassandra) than the 
>> standpoint of trying to sell people on using the database. I would prefer to 
>> see all the features and endpoints we provide work well without breaking 
>> than make cool demos and feature bullet points. That said I know in order 
>> for a database to be successful we need the cool feature sets as well.  CQL 
>> works for now and deprecating that would be an absolute nightmare for people 
>> already using it (ie thrift migration was not fun for anyone). I say create 
>> a new entrypoint or layer, mark it experimental and allow operators to 
>> disable it but leave the existing CQL interface alone.
>> 
>> Chris
>> 
>> On Tue, Nov 4, 2025 at 10:53 AM Isaac Reath wrote:
>> I share Joey's opinions on this. Many features that resemble SQL (e.g., 
>> indexes, materialized views) come with caveats that stem from their 
>> implementation details rather than the query language itself. If we expose 
>> these same features through SQL as they are today, I think we'd risk setting 
>> users up for disappointment, since they will come in with implicit 
>> expectations about how a given SQL feature should work based on their 
>> previous experience and more often than not we won't meet that expectation. 
>> At least with CQL we set the expectation that this is a different database, 
>> where familiar concepts might behave differently than you would expect. 
>> 
>> That said, in terms of a long term direction, I think having SQL support is 
>> a good guiding light and implementing it as a stateless component as Jeff 
>> suggests would help make this easier to realize. 
>> 
>> On Tue, Nov 4, 2025 at 10:23 AM Joseph Lynch wrote:
>> Removing CQL is, in my opinion, completely off the table. When we deprecated 
>> Thrift and gave CQL as the new query language, we imposed significant pain 
>> on our existing functional Thrift applications to migrate to it - I feel we 
>> should not hurt our users like that again.
>> 
>> I worry that we already struggle to implement the current surface area of 
>> CQL correctly and in a way that scales safely. For example, CQL allows us to 
>> create arbitrarily large partitions, but large partitions and large columns 
>> continue to

Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Jeff Jirsa
I started building a Postgres layer to convince myself it’s possible. It’s got
joins, interactive transactions, MVCC, the pg wire protocol, a query planner,
etc. It’s far enough along that I can run TPC-C. The only Cassandra change that
was needed was a fix to Accord for BOP variable-length tokens serialized in the
journal. The rest just works if you know how Postgres and Cassandra work.

I’m running TPC-C to see how far from acceptable latency it is for a week of
toy work, but I’m about 95% sure that anyone who knows how databases work can
implement a Postgres layer on Cassandra for real as soon as Accord launches.

I don’t think the project needs to build this into Cassandra. There are a lot
of reasons not to do that.

On Nov 4, 2025, at 11:18 AM, Josh McKenzie wrote:
> Good point Joey; I was rather focused on the ergonomics of implicit
> constraints that come with CQL vs. SQL and the gap we'd have to bridge to
> make a SQL-centric world have the same design language as CQL today.
>
> We can't afford to drop CQL at this point unless we had an overwhelmingly
> bullet-proof CQL->SQL translation layer that didn't introduce new edge cases
> nor performance degradation compared to CQL directly today. Users would have
> to have the ability for existing CQL applications to Just Work when migrated
> onto some new paradigm where the existing CQL native protocol endpoints were
> deprecated. At that point we'd just be weighing the cost of maintaining a
> translation layer between API semantics vs. a translation layer between the
> native protocol and the storage engine we already have today; a lot of work
> to just be where we are today IMO.
>
> We've learned the hard way that when you remove functionality from the
> database it hurts a lot of users in a lot of ways and we all discussed and
> broadly had a consensus to try not to remove anything going forward on the
> dev ML in the past year as I recall. Removing our core query language would
> be... quite the opposite of what we discussed and agreed to.
>
> Now - SQL layer on top of the storage engine? If people want to work on that
> I think it'd be great for our ecosystem. To Chris' point, I think there's
> probably appetite from users' perspectives to have different APIs to interact
> with data in the storage engine, be it gRPC, GraphQL, JSON, CQL over REST,
> CQL, SQL, etc. Us having a layer that allowed us to reasonably build in that
> functionality would be a net win.
>
> On Tue, Nov 4, 2025, at 12:36 PM, Chris Lohfink wrote:
>> Just throwing my 2 cents in. I'm probably in the unpopular camp of wanting
>> to move the other direction towards a grpc endpoint that is even more
>> restrictive than cql. This is coming from a standpoint of needing to clean
>> up after mistakes (application/modeling etc, not cassandra) rather than the
>> standpoint of trying to sell people on using the database. I would prefer
>> to see all the features and endpoints we provide work well without breaking
>> than make cool demos and feature bullet points. That said I know in order
>> for a database to be successful we need the cool feature sets as well. CQL
>> works for now and deprecating that would be an absolute nightmare for
>> people already using it (i.e. thrift migration was not fun for anyone). I
>> say create a new entrypoint or layer, mark it experimental and allow
>> operators to disable it but leave the existing CQL interface alone.
>>
>> Chris
>>
>> On Tue, Nov 4, 2025 at 10:53 AM Isaac Reath wrote:
>>> I share Joey's opinions on this. Many features that resemble SQL (e.g.,
>>> indexes, materialized views) come with caveats that stem from their
>>> implementation details rather than the query language itself. If we expose
>>> these same features through SQL as they are today, I think we'd risk
>>> setting users up for disappointment, since they will come in with implicit
>>> expectations about how a given SQL feature should work based on their
>>> previous experience and more often than not we won't meet that
>>> expectation. At least with CQL we set the expectation that this is a
>>> different database, where familiar concepts might behave differently than
>>> you would expect.
>>>
>>> That said, in terms of a long term direction, I think having SQL support
>>> is a good guiding light and implementing it as a stateless component as
>>> Jeff suggests would help make this easier to realize.
>>>
>>> On Tue, Nov 4, 2025 at 10:23 AM Joseph Lynch wrote:
>>>> Removing CQL is, in my opinion, completely off the table. When we
>>>> deprecated Thrift and gave CQL as the new query language, we imposed
>>>> significant pain on our existing functional Thrift applications to
>>>> migrate to it - I feel we should not hurt our users like that again.
>>>>
>>>> I worry that we already struggle to implement the current surface area of
>>>> CQL correctly and in a way that scales safely. For example, CQL allows us
>>>> to create arbitrarily large partitions, but large partitions and large
>>>> columns continue to be something our storage engine can't currently
>>>> handle well. CQL allows us to create secondary indices for improved
>>>> filter support but few can (or at least we struggle) to safely

Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Josh McKenzie
Good point Joey; I was rather focused on the ergonomics of implicit constraints 
that come with CQL vs. SQL and the gap we'd have to bridge to make a 
SQL-centric world have the same design language as CQL today.

We can't afford to drop CQL at this point unless we had an overwhelmingly 
bullet-proof CQL->SQL translation layer that didn't introduce new edge cases 
nor performance degradation compared to CQL directly today. Users would have to 
have the ability for existing CQL applications to Just Work when migrated onto 
some new paradigm where the existing CQL native protocol endpoints were 
deprecated. At that point we'd just be weighing the cost of maintaining a 
translation layer between API semantics vs. a translation layer between the 
native protocol and the storage engine we already have today; a lot of work to 
just be where we are today IMO.

We've learned the hard way that when you remove functionality from the database 
it hurts a lot of users in a lot of ways and we all discussed and broadly had a 
consensus to try not to remove anything going forward on the dev ML in the past 
year as I recall. Removing our core query language would be... quite the 
opposite of what we discussed and agreed to.

Now - SQL layer on top of the storage engine? If people want to work on that I 
think it'd be great for our ecosystem. To Chris' point, I think there's 
probably appetite from users' perspectives to have different APIs to interact 
with data in the storage engine, be it gRPC, GraphQL, JSON, CQL over REST, CQL, 
SQL, etc. Us having a layer that allowed us to reasonably build in that 
functionality would be a net win.

On Tue, Nov 4, 2025, at 12:36 PM, Chris Lohfink wrote:
> Just throwing my 2 cents in. I'm probably in the unpopular camp of wanting to 
> move the other direction towards a grpc endpoint that is even more 
> restrictive than cql. This is coming from a standpoint of needing to clean up 
> after mistakes (application/modeling etc, not cassandra) rather than the 
> standpoint of trying to sell people on using the database. I would prefer to see all the 
> features and endpoints we provide work well without breaking than make cool 
> demos and feature bullet points. That said I know in order for a database to 
> be successful we need the cool feature sets as well.  CQL works for now and 
> deprecating that would be an absolute nightmare for people *already* using it 
> (i.e. thrift migration was not fun for anyone). I say create a new entrypoint 
> or layer, mark it experimental and allow operators to disable it but leave 
> the existing CQL interface alone.
> 
> Chris
> 
> On Tue, Nov 4, 2025 at 10:53 AM Isaac Reath  wrote:
>> I share Joey's opinions on this. Many features that resemble SQL (e.g., 
>> indexes, materialized views) come with caveats that stem from their 
>> implementation details rather than the query language itself. If we expose 
>> these same features through SQL as they are today, I think we'd risk setting 
>> users up for disappointment, since they will come in with implicit 
>> expectations about how a given SQL feature should work based on their 
>> previous experience and more often than not we won't meet that expectation. 
>> At least with CQL we set the expectation that this is a different database, 
>> where familiar concepts might behave differently than you would expect. 
>> 
>> That said, in terms of a long term direction, I think having SQL support is 
>> a good guiding light and implementing it as a stateless component as Jeff 
>> suggests would help make this easier to realize. 
>> 
>> On Tue, Nov 4, 2025 at 10:23 AM Joseph Lynch  wrote:
>>> Removing CQL is, in my opinion, completely off the table. When we 
>>> deprecated Thrift and gave CQL as the new query language, we imposed 
>>> significant pain on our existing functional Thrift applications to migrate 
>>> to it - I feel we should not hurt our users like that again.
>>> 
>>> I worry that we already struggle to implement the current surface area of 
>>> CQL correctly and in a way that scales safely. For example, CQL allows us 
>>> to create arbitrarily large partitions, but large partitions and large 
>>> columns continue to be something our storage engine can't currently handle 
>>> well. CQL allows us to create secondary indices for improved filter support 
>>> but few can (or at least we struggle) to safely use them in production. We 
>>> still struggle with how page timeouts, hedges and retries work in an 
>>> idempotent and reliable way in our current protocol - although CQL at least 
>>> gives us a path to implementing those.
>>> 
>>> I wonder if we should focus on being excellent at the basic write and read 
>>> operations we already support before adding more complexity at the API 
>>> layer. I am excited by the recent proposals around unbounded partitions, 
>>> byte ordered partitioner with safe data movement, ability to execute 
>>> analytics queries efficiently via a separate columnar representation etc

Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Chris Lohfink
Just throwing my 2 cents in. I'm probably in the unpopular camp of wanting
to move the other direction towards a grpc endpoint that is even more
restrictive than cql. This is coming from a standpoint of needing to clean
up after mistakes (application/modeling etc, not cassandra) rather than the
standpoint of trying to sell people on using the database. I would
prefer to see all the features and endpoints we provide work well without
breaking than make cool demos and feature bullet points. That said I know
in order for a database to be successful we need the cool feature sets as
well.  CQL works for now and deprecating that would be an absolute
nightmare for people *already* using it (i.e. thrift migration was not fun
for anyone). I say create a new entrypoint or layer, mark it experimental
and allow operators to disable it but leave the existing CQL interface
alone.

Chris

On Tue, Nov 4, 2025 at 10:53 AM Isaac Reath  wrote:

> I share Joey's opinions on this. Many features that resemble SQL (e.g.,
> indexes, materialized views) come with caveats that stem from
> their implementation details rather than the query language itself. If we
> expose these same features through SQL as they are today, I think we'd risk
> setting users up for disappointment, since they will come in with implicit
> expectations about how a given SQL feature should work based on their
> previous experience and more often than not we won't meet that expectation.
> At least with CQL we set the expectation that this is a different database,
> where familiar concepts might behave differently than you would expect.
>
> That said, in terms of a long term direction, I think having SQL support
> is a good guiding light and implementing it as a stateless component as
> Jeff suggests would help make this easier to realize.
>
> On Tue, Nov 4, 2025 at 10:23 AM Joseph Lynch wrote:
>
>> Removing CQL is, in my opinion, completely off the table. When we
>> deprecated Thrift and gave CQL as the new query language, we imposed
>> significant pain on our existing functional Thrift applications to migrate
>> to it - I feel we should not hurt our users like that again.
>>
>> I worry that we already struggle to implement the current surface area of
>> CQL correctly and in a way that scales safely. For example, CQL allows us
>> to create arbitrarily large partitions, but large partitions and large
>> columns continue to be something our storage engine can't currently handle
>> well. CQL allows us to create secondary indices for improved filter support
>> but few can (or at least we struggle) to safely use them in production. We
>> still struggle with how page timeouts, hedges and retries work in an
>> idempotent and reliable way in our current protocol - although CQL at least
>> gives us a path to implementing those.
>>
>> I wonder if we should focus on being excellent at the basic write and
>> read operations we already support before adding more complexity at the API
>> layer. I am excited by the recent proposals around unbounded partitions,
>> byte ordered partitioner with safe data movement, ability to execute
>> analytics queries efficiently via a separate columnar representation etc
>> ... and *all* of those and more would likely be *required* to tackle SQL
>> in any meaningful way.
>>
>> The surface area of SQL is much much wider, requiring functional
>> implementation of all of that plus joins, interactive transactions and
>> more. The SQL protocol itself is also quite poor for reliable communication
>> and rarely has performant async clients with size based pagination, per
>> page timeouts, per page hedging, incremental progress over a streaming
>> async interface, pagination resumption, etc ...  A lot of this difficulty
>> stems from the protocol often being tied to TCP connections and the
>> inherently unbounded complexity of the read interface.
>>
>> I guess I'm saying, I think we should prioritize succeeding at the API
>> scope we already have before adding more. Deferring to standard SQL syntax
>> or naming when we can just seems like a good idea (why reinvent concepts),
>> but I don't think the friction with CQL is because it's not SQL, I think
>> it's because users can't tell what works and what doesn't work.
>>
>> -Joey
>>
>> On Tue, Nov 4, 2025 at 8:42 AM Josh McKenzie wrote:
>>
>>> +1 to Mick and Aleksey. I think the key for me was this:
>>>
>>> One is Cassandra’s wide-partition model with flexible clustering
>>> columns, which supports very large, ordered partitions (e.g. time-series
>>> and efficient range scans), rather than a strictly normalised, join-centric
>>> model. These patterns don’t always map cleanly to SQL semantics, and CQL’s
>>> query-driven, table-per-query modelling helps move users toward designs
>>> that scale predictably.
>>>
>>>
>>> We'd need really robust EXPLAIN / EXPLAIN ANALYZE support (see here) for
>>> users to be able to make sense of how their S

Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Isaac Reath
I share Joey's opinions on this. Many features that resemble SQL (e.g.,
indexes, materialized views) come with caveats that stem from
their implementation details rather than the query language itself. If we
expose these same features through SQL as they are today, I think we'd risk
setting users up for disappointment, since they will come in with implicit
expectations about how a given SQL feature should work based on their
previous experience and more often than not we won't meet that expectation.
At least with CQL we set the expectation that this is a different database,
where familiar concepts might behave differently than you would expect.

That said, in terms of a long term direction, I think having SQL support is
a good guiding light and implementing it as a stateless component as Jeff
suggests would help make this easier to realize.

On Tue, Nov 4, 2025 at 10:23 AM Joseph Lynch  wrote:

> Removing CQL is, in my opinion, completely off the table. When we
> deprecated Thrift and gave CQL as the new query language, we imposed
> significant pain on our existing functional Thrift applications to migrate
> to it - I feel we should not hurt our users like that again.
>
> I worry that we already struggle to implement the current surface area of
> CQL correctly and in a way that scales safely. For example, CQL allows us
> to create arbitrarily large partitions, but large partitions and large
> columns continue to be something our storage engine can't currently handle
> well. CQL allows us to create secondary indices for improved filter support
> but few can (or at least we struggle) to safely use them in production. We
> still struggle with how page timeouts, hedges and retries work in an
> idempotent and reliable way in our current protocol - although CQL at least
> gives us a path to implementing those.
>
> I wonder if we should focus on being excellent at the basic write and read
> operations we already support before adding more complexity at the API
> layer. I am excited by the recent proposals around unbounded partitions,
> byte ordered partitioner with safe data movement, ability to execute
> analytics queries efficiently via a separate columnar representation etc
> ... and *all* of those and more would likely be *required* to tackle SQL
> in any meaningful way.
>
> The surface area of SQL is much much wider, requiring functional
> implementation of all of that plus joins, interactive transactions and
> more. The SQL protocol itself is also quite poor for reliable communication
> and rarely has performant async clients with size based pagination, per
> page timeouts, per page hedging, incremental progress over a streaming
> async interface, pagination resumption, etc ...  A lot of this difficulty
> stems from the protocol often being tied to TCP connections and the
> inherently unbounded complexity of the read interface.
>
> I guess I'm saying, I think we should prioritize succeeding at the API
> scope we already have before adding more. Deferring to standard SQL syntax
> or naming when we can just seems like a good idea (why reinvent concepts),
> but I don't think the friction with CQL is because it's not SQL, I think
> it's because users can't tell what works and what doesn't work.
>
> -Joey
>
> On Tue, Nov 4, 2025 at 8:42 AM Josh McKenzie  wrote:
>
>> +1 to Mick and Aleksey. I think the key for me was this:
>>
>> One is Cassandra’s wide-partition model with flexible clustering columns,
>> which supports very large, ordered partitions (e.g. time-series and
>> efficient range scans), rather than a strictly normalised, join-centric
>> model. These patterns don’t always map cleanly to SQL semantics, and CQL’s
>> query-driven, table-per-query modelling helps move users toward designs
>> that scale predictably.
>>
>>
>> We'd need really robust EXPLAIN / EXPLAIN ANALYZE support (see here) for
>> users to be able to make sense of how their SQL queries translate into underlying
>> disk access patterns. Having a wide-open field of full SQL compliance they
>> then need to understand how to constrain to get horizontal scale out of it
>> would be *much more challenging* than the already somewhat "new"
>> cognitive muscle our users have to build to realize that horizontal scaling
>> of data access doesn't come free.
>>
>> I think that would give us a future state of "Use SQL when you need /
>> want a lot of expressivity, use CQL when you need to be constrained to
>> language primitives that keep your data access scalable". The part that
>> gets me wary here is how we've run into pain in the past trying to be both
>> a database that allows more query expressivity (ALLOW FILTERING, legacy 2i
>> come to mind) and a database that also wants horizontal scale.
>>
>> I'd love us to be able to have our cake and eat it too but I don't know
>> if that's possible. So at the very least I'd advocate for SQL + CQL going
>> forward, or SQL + a constrained "CQL-like

Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Benjamin Lerer
I would be curious to see a gap analysis between CQL and SQL that includes
the differences in behavior. I suspect that it will bring a few surprises
and provide a more solid foundation for this discussion.

On Tue, Nov 4, 2025 at 5:24 PM, Štefan Miklošovič wrote:

> I just want to ask this question ... feel free to shoot it down, just
> curious about the feedback / pros / cons.
>
> When we talk about "joins", yeah, it is not supported as we are used
> to in the SQL world. But joins _are_ possible, via Spark (Cassandra
> connector) / via Spark itself.
>
> When we have Cassandra Analytics now, why couldn't we integrate it
> with Cassandra (as something pluggable)? Basically, a user would
> execute
>
> USE shop;
>
> SELECT customers.name, orders.item FROM customers JOIN orders ON
> customers.id = orders.customer_id;
>
> Then we take this "CQL" query, construct logic for Spark behind that,
> put that to Analytics / Spark or whatever under the hood and present
> the result back to a caller?
>
> For now, we need to develop a custom Spark application, then deploy
> it, then interpret the results and so on. I just do not see why we
> could not optionally integrate Spark into Cassandra in such a way,
> really something pluggable, which would enable these kinds of queries. I
> just do not want to write a custom Spark app just to join two
> tables. Just delegate this kind of query to Spark, wait for the
> result, and display it to me?
>
> On Tue, Nov 4, 2025 at 5:09 PM Aaron  wrote:
> >
> > Overall I like this idea. It will help us lower the learning curve for
> > Cassandra, making it feel like a more viable option for folks who might
> > not otherwise have considered it. Keeping CQL and SQL as parallel options
> > is the approach that I would prefer, as well.
> >
> > Might not be a bad idea to classify SQL commands as OLTP vs. OLAP, and
> > have v1 be just OLTP, with commands that are more often used in an OLAP
> > paradigm to follow in v2. Doesn't have to be that, but it might be worth
> > our time to see if there are logical ways that we can break up the
> > workload of a SQL implementation into more manageable pieces.
> >
> >> I don't think the friction with CQL is because it's not SQL, I think
> >> it's because users can't tell what works and what doesn't work.
> >
> >
> > I don't think this is the main motivation here. The motivation for doing
> > this is (should be) meeting a standard embraced by most other databases
> > because it will ultimately help our users. We should want a developer (who
> > has never touched Cassandra before) to be able to sit down and be
> > productive with their existing skillset.
> >
> > We should also want to take some of the pain out of moving an existing
> > application. It may not end up being as simple as re-pointing an
> > application from Postgres to Cassandra, but reducing the friction involved
> > should be a consideration.
> > Thanks,
> >
> > Aaron
> >
> >
> > On Tue, Nov 4, 2025 at 9:23 AM Joseph Lynch wrote:
> >>
> >> Removing CQL is, in my opinion, completely off the table. When we
> >> deprecated Thrift and gave CQL as the new query language, we imposed
> >> significant pain on our existing functional Thrift applications to
> >> migrate to it - I feel we should not hurt our users like that again.
> >>
> >> I worry that we already struggle to implement the current surface area
> >> of CQL correctly and in a way that scales safely. For example, CQL allows
> >> us to create arbitrarily large partitions, but large partitions and large
> >> columns continue to be something our storage engine can't currently
> >> handle well. CQL allows us to create secondary indices for improved
> >> filter support but few can (or at least we struggle) to safely use them
> >> in production. We still struggle with how page timeouts, hedges and
> >> retries work in an idempotent and reliable way in our current protocol -
> >> although CQL at least gives us a path to implementing those.
> >>
> >> I wonder if we should focus on being excellent at the basic write and
> >> read operations we already support before adding more complexity at the
> >> API layer. I am excited by the recent proposals around unbounded
> >> partitions, byte ordered partitioner with safe data movement, ability to
> >> execute analytics queries efficiently via a separate columnar
> >> representation etc ... and all of those and more would likely be required
> >> to tackle SQL in any meaningful way.
> >>
> >> The surface area of SQL is much much wider, requiring functional
> >> implementation of all of that plus joins, interactive transactions and
> >> more. The SQL protocol itself is also quite poor for reliable
> >> communication and rarely has performant async clients with size based
> >> pagination, per page timeouts, per page hedging, incremental progress
> >> over a streaming async interface, pagination resumption, etc ...  A lot
> >> of this difficulty stems from the protocol often being tied to TCP
> >> connections and the inherently unbounded complexity of the read
> >> interface.
> >>
> >> I guess I'm 

Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Štefan Miklošovič
I just want to ask this question ... feel free to shoot it down, just
curious about the feedback / pros / cons.

When we talk about "joins", yeah, it is not supported as we are used
to in the SQL world. But joins _are_ possible, via Spark (Cassandra
connector) / via Spark itself.

When we have Cassandra Analytics now, why couldn't we integrate it
with Cassandra (as something pluggable)? Basically, a user would
execute

USE shop;

SELECT customers.name, orders.item FROM customers JOIN orders ON
customers.id = orders.customer_id;

Then we take this "CQL" query, construct logic for Spark behind that,
put that to Analytics / Spark or whatever under the hood and present
the result back to a caller?

For now, we need to develop a custom Spark application, then deploy
it, then interpret the results and so on. I just do not see why we
could not optionally integrate Spark into Cassandra in such a way,
really something pluggable, which would enable these kinds of queries. I
just do not want to write a custom Spark app just to join two
tables. Just delegate this kind of query to Spark, wait for the
result, and display it to me?
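
To make the idea concrete, here is a minimal sketch of what the delegated
join has to compute, written as a plain in-memory hash join in Python. This
is not Spark or Cassandra Analytics code: the `hash_join` helper and the
sample rows are hypothetical, and a real integration would hand this work to
Spark rather than run it in a single process.

```python
from collections import defaultdict


def hash_join(left_rows, right_rows, left_key, right_key):
    """Inner-join two lists of dict rows where left[left_key] == right[right_key]."""
    # Build phase: index the left side by its join key.
    index = defaultdict(list)
    for row in left_rows:
        index[row[left_key]].append(row)
    # Probe phase: look up each right-side row's key in the index.
    joined = []
    for row in right_rows:
        for match in index.get(row[right_key], []):
            joined.append((match, row))
    return joined


# Hypothetical sample data standing in for shop.customers and shop.orders.
customers = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
orders = [
    {"customer_id": 1, "item": "book"},
    {"customer_id": 1, "item": "pen"},
    {"customer_id": 3, "item": "lamp"},
]

# Same projection as the query above: customers.name, orders.item.
result = [
    (customer["name"], order["item"])
    for customer, order in hash_join(customers, orders, "id", "customer_id")
]
print(result)
```

Only customers with at least one matching order appear in the result, which
is the inner-join behaviour the SELECT above asks for.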

On Tue, Nov 4, 2025 at 5:09 PM Aaron  wrote:
>
> Overall I like this idea. It will help us lower the learning curve for 
> Cassandra, making it feel like a more viable option for folks who might not 
> otherwise have considered it. Keeping CQL and SQL as parallel options is the 
> approach that I would prefer, as well.
>
> Might not be a bad idea to classify SQL commands as OLTP vs. OLAP, and have 
> v1 be just OLTP, with commands that are more often used in an OLAP paradigm 
> to follow in v2. Doesn't have to be that, but it might be worth our time to 
> see if there are logical ways that we can break up the workload of a SQL 
> implementation into more manageable pieces.
>
>>  I don't think the friction with CQL is because it's not SQL, I think it's 
>> because users can't tell what works and what doesn't work.
>
>
> I don't think this is the main motivation here. The motivation for doing this 
> is (should be) meeting a standard embraced by most other databases because it 
> will ultimately help our users. We should want a developer (who has never 
> touched Cassandra before) to be able to sit down and be productive with their 
> existing skillset.
>
> We should also want to take some of the pain out of moving an existing 
> application. It may not end up being as simple as re-pointing an application 
> from Postgres to Cassandra, but reducing the friction involved should be a 
> consideration.
>
> Thanks,
>
> Aaron
>
>
> On Tue, Nov 4, 2025 at 9:23 AM Joseph Lynch  wrote:
>>
>> Removing CQL is, in my opinion, completely off the table. When we deprecated 
>> Thrift and gave CQL as the new query language, we imposed significant pain 
>> on our existing functional Thrift applications to migrate to it - I feel we 
>> should not hurt our users like that again.
>>
>> I worry that we already struggle to implement the current surface area of 
>> CQL correctly and in a way that scales safely. For example, CQL allows us to 
>> create arbitrarily large partitions, but large partitions and large columns 
>> continue to be something our storage engine can't currently handle well. CQL 
>> allows us to create secondary indices for improved filter support but few 
>> can (or at least we struggle) to safely use them in production. We still 
>> struggle with how page timeouts, hedges and retries work in an idempotent 
>> and reliable way in our current protocol - although CQL at least gives us a 
>> path to implementing those.
>>
>> I wonder if we should focus on being excellent at the basic write and read 
>> operations we already support before adding more complexity at the API 
>> layer. I am excited by the recent proposals around unbounded partitions, 
>> byte ordered partitioner with safe data movement, ability to execute 
>> analytics queries efficiently via a separate columnar representation etc ... 
>> and all of those and more would likely be required to tackle SQL in any 
>> meaningful way.
>>
>> The surface area of SQL is much much wider, requiring functional 
>> implementation of all of that plus joins, interactive transactions and more. 
>> The SQL protocol itself is also quite poor for reliable communication and 
>> rarely has performant async clients with size based pagination, per page 
>> timeouts, per page hedging, incremental progress over a streaming async 
>> interface, pagination resumption, etc ...  A lot of this difficulty stems 
>> from the protocol often being tied to TCP connections and the inherently 
>> unbounded complexity of the read interface.
>>
>> I guess I'm saying, I think we should prioritize succeeding at the API scope 
>> we already have before adding more. Deferring to standard SQL syntax or 
>> naming when we can just seems like a good idea (why reinvent concepts), but 
>> I don't think the friction with CQL is because it's not SQL, I think it's 
>> because us

Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Štefan Miklošovič
Great summary of where we are with types, Patrick.

It is nice to see it like that.

I think the situation on the type front could be made much better.
Supporting e.g. macaddr / macaddr8 seems like a programming exercise.

Also, since we have constraints as well, some constructs like enums
might be done quite easily too, for example:

myname text check in('black', 'white')

I think it is possible to do IN in PostgreSQL like this

colors TEXT CHECK (colors IN ('red', 'green', 'blue'))

I think that is pretty close to our constraints if IN is supported
there too. The current constraints implementation we have already
supports adding arguments into constraint function so this can be
implemented right away without anything else.

We do not need to have enum types (1). Or maybe we could, really up to
us how we model it, but if there is already a constraint framework
making it possible I do not think that introducing enum types just for
the sake of PostgreSQL compatibility is really necessary.

(1) https://www.postgresql.org/docs/current/datatype-enum.html
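
As a rough illustration of the enum-via-constraint idea, the sketch below
runs the CHECK ... IN column definition from above against an in-memory
SQLite database using Python's standard sqlite3 module. SQLite is only a
stand-in here (it accepts the same CHECK syntax); the `paint` table name is
made up, and nothing in this sketch says how Cassandra's constraint
framework would implement it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column definition taken from the PostgreSQL-style example above.
conn.execute(
    "CREATE TABLE paint ("
    "id INTEGER PRIMARY KEY, "
    "colors TEXT CHECK (colors IN ('red', 'green', 'blue')))"
)

# A value from the allowed set is accepted.
conn.execute("INSERT INTO paint (id, colors) VALUES (1, 'red')")

# A value outside the set violates the CHECK constraint and is rejected.
try:
    conn.execute("INSERT INTO paint (id, colors) VALUES (2, 'purple')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT id, colors FROM paint").fetchall()
print(rejected, rows)
```

The column stays a plain text column; only the constraint narrows what can
be written, which is the behaviour an enum type would otherwise provide.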

For types like "real" in Postgres and float in Cassandra, they seem to be
the same thing, so maybe we could create an alias of "real"
in Cassandra as well?
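
A small sketch of why the two look interchangeable, assuming (as their
documentation describes) that both Postgres "real" and Cassandra float are
4-byte IEEE 754 single-precision values. The `to_float32` helper is
hypothetical and just round-trips a Python float through that encoding with
the standard struct module.

```python
import struct


def to_float32(value):
    """Round-trip a Python float through a 4-byte IEEE 754 single-precision encoding."""
    return struct.unpack(">f", struct.pack(">f", value))[0]


encoded = struct.pack(">f", 0.1)
stored = to_float32(0.1)

# Four bytes on the wire; 0.5 survives exactly, 0.1 is only approximated.
print(len(encoded), to_float32(0.5) == 0.5, stored == 0.1)
```

Any alias would inherit the same precision limits, so a value written as
"real" and read back as float should compare equal bit for bit.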

JSON / JSONB / XML - this is an interesting idea, there was already
some discussion about this when we were introducing JSON constraints.
I am not against the introduction of JSON / XML type, they would be
validated on insert. Another discussion thread to have :)

Overall, when it comes to types at least, I think the situation
could be much better, and Cassandra could certainly be more compatible.

On Fri, Oct 31, 2025 at 9:25 PM Patrick McFadin  wrote:
>
> Over the last decade, CQL has served Cassandra users well by offering a 
> familiar SQL-like interface for a distributed data model. However, as the 
> broader database ecosystem converges on PostgreSQL-style SQL as the de facto 
> standard for developers, it’s time to consider how Cassandra evolves to meet 
> developers where they are without losing what makes it unique.
>
> The great thing about SQL standards is that there are plenty to choose from. 
> While the formal SQL:2023 specification (ISO/IEC 9075) exists, the industry 
> has coalesced around the PostgreSQL dialect. Products such as AWS Aurora, 
> AlloyDB, CockroachDB, YugabyteDB, and DuckDB, and many others offering 
> “PostgreSQL-compatible” modes, have validated this direction. Developers are 
> voting with their implementations. PostgreSQL SQL represents the lowest 
> cognitive-load interface for application data, as repeatedly confirmed by 
> developer surveys like Stack Overflow 2025[1].
>
> What I’m proposing is that we begin to normalize the frontend to expand 
> access to our extraordinary backend. The key principle here is ADD, not 
> DELETE. CQL continues to work and be supported while we expand Cassandra’s 
> capabilities through SQL compatibility, providing a familiar syntax and 
> potentially supporting a larger ecosystem (JDBC, etc.).
>
> Phase 1 (Before Cassandra 6) - Stop Digging
> Freeze CQL at version 3 and align all new syntax or features (DML/DDL) to the 
> PostgreSQL SQL dialect wherever possible. This approach was already 
> demonstrated with CEP-52 and should become our norm.
>
> Phase 2 (Years) - Create Parallel Paths
> This is where we take our time and do things carefully, most likely over a 
> series of years.  Don’t touch the CQL path. Add an opt-in, feature flag path 
> for SQL-only that conforms to the PostgreSQL SQL dialect. Begin our journey 
> to feature compatibility here. At Community over Code this year, Alex Petrov 
> and I sat in Aaron Ploetz’s kitchen (thanks for dinner, Aaron!) and 
> brainstormed how this could work. The two critical aspects to manage are 
> types and functionality. We may never be able to support everything, but 
> given what this project has accomplished over the years, I wouldn’t bet on 
> it. Being clear about the differences early on can serve as a roadmap for 
> future contributors who want to be involved.
>
> In discussion with Joel Shepherd on this topic, he sagely suggested some 
> sub-steps inside this phase:
>
> 1 - Prioritize SQL that is compatible to get the incremental wins and early 
> feedback from the user community.
> 2 - Tackle the non-compatible and triage for the long-term changes that would 
> need to happen.
>
> I took the time to do some rough mapping of syntax, features, and types:
>
> Function and Feature Compatibility tables: 
> https://docs.google.com/document/d/1K2-GKVM4Z_u1Hb1GtdrRyC9AdDN3RLwJ7LX_i_PqkOE/edit?usp=sharing
>
> Typing differences: 
> https://docs.google.com/spreadsheets/d/11tWkyCQ8WAFGnd5Va6iyltkp1wbKdAubxH9o_ZyJEtk/edit?usp=sharing
>
> Phase 3 (Indefinite timeframe) - Become Default SQL
> Once the SQL path achieves sufficient coverage and confidence, we can make it 
> the default frontend, with CQL continuing to be supported indefinitely. The 
> intent is not replacement but evolution toward broader accessibility.
>

Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Aaron
Overall I like this idea. It will help us lower the learning curve for
Cassandra, making it feel like a more viable option for folks who might not
otherwise have considered it. Keeping CQL and SQL as parallel options is
the approach that I would prefer, as well.

Might not be a bad idea to classify SQL commands as OLTP vs. OLAP, and have
v1 be just OLTP, with commands that are more often used in an OLAP paradigm
to follow in v2. Doesn't have to be *that*, but it might be worth our time
to see if there are logical ways that we can break-up the workload of a SQL
implementation into more manageable pieces.

> I don't think the friction with CQL is because it's not SQL, I think it's
> because users can't tell what works and what doesn't work.


I don't think this is the main motivation here. The motivation for doing
this is (should be) meeting a standard embraced by most other databases
because it will ultimately help our users. We should want a developer (who
has never touched Cassandra before) to be able to sit down and be
productive with their existing skillset.

We should also want to take some of the pain out of moving an existing
application. It may not end up being as simple as re-pointing an
application from Postgres to Cassandra, but reducing the friction involved
should be a consideration.

Thanks,

Aaron


On Tue, Nov 4, 2025 at 9:23 AM Joseph Lynch  wrote:

> Removing CQL is, in my opinion, completely off the table. When we
> deprecated Thrift and gave CQL as the new query language, we imposed
> significant pain on our existing functional Thrift applications to migrate
> to it - I feel we should not hurt our users like that again.
>
> I worry that we already struggle to implement the current surface area of
> CQL correctly and in a way that scales safely. For example, CQL allows us
> to create arbitrarily large partitions, but large partitions and large
> columns continue to be something our storage engine can't currently handle
> well. CQL allows us to create secondary indices for improved filter support
> but few can (or at least we struggle) to safely use them in production. We
> still struggle with how page timeouts, hedges and retries work in an
> idempotent and reliable way in our current protocol - although CQL at least
> gives us a path to implementing those.
>
> I wonder if we should focus on being excellent at the basic write and read
> operations we already support before adding more complexity at the API
> layer. I am excited by the recent proposals around unbounded partitions,
> byte ordered partitioner with safe data movement, ability to execute
> analytics queries efficiently via a separate columnar representation etc
> ... and *all* of those and more would likely be *required* to tackle SQL
> in any meaningful way.
>
> The surface area of SQL is much much wider, requiring functional
> implementation of all of that plus joins, interactive transactions and
> more. The SQL protocol itself is also quite poor for reliable communication
> and rarely has performant async clients with size based pagination, per
> page timeouts, per page hedging, incremental progress over a streaming
> async interface, pagination resumption, etc ...  A lot of this difficulty
> stems from the protocol often being tied to TCP connections and the
> inherently unbounded complexity of the read interface.
>
> I guess I'm saying, I think we should prioritize succeeding at the API
> scope we already have before adding more. Deferring to standard SQL syntax
> or naming when we can just seems like a good idea (why reinvent concepts),
> but I don't think the friction with CQL is because it's not SQL, I think
> it's because users can't tell what works and what doesn't work.
>
> -Joey
>
> On Tue, Nov 4, 2025 at 8:42 AM Josh McKenzie  wrote:
>
>> +1 to Mick and Aleksey. I think the key for me was this:
>>
>> One is Cassandra’s wide-partition model with flexible clustering columns,
>> which supports very large, ordered partitions (e.g. time-series and
>> efficient range scans), rather than a strictly normalised, join-centric
>> model. These patterns don’t always map cleanly to SQL semantics, and CQL’s
>> query-driven, table-per-query modelling helps move users toward designs
>> that scale predictably.
>>
>>
>> We'd need really robust EXPLAIN / EXPLAIN ANALYZE support (see here)
>> for users to
>> be able to make sense of how their SQL queries translate into underlying
>> disk access patterns. Having a wide-open field of full SQL compliance they
>> then need to understand how to constrain to get horizontal scale out of it
>> would be *much more challenging* than the already somewhat "new"
>> cognitive muscle our users have to build to realize that horizontal scaling
>> of data access doesn't come free.
>>
>> I think that would give us a future state of "Use SQL when you need /
>> want a lot of expressivity, use CQL when you need to be constrained to
>> language primi

Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Joseph Lynch
Removing CQL is, in my opinion, completely off the table. When we
deprecated Thrift and gave CQL as the new query language, we imposed
significant pain on our existing functional Thrift applications to migrate
to it - I feel we should not hurt our users like that again.

I worry that we already struggle to implement the current surface area of
CQL correctly and in a way that scales safely. For example, CQL allows us
to create arbitrarily large partitions, but large partitions and large
columns continue to be something our storage engine can't currently handle
well. CQL allows us to create secondary indices for improved filter support
but few can (or at least we struggle to) safely use them in production. We
still struggle with how page timeouts, hedges and retries work in an
idempotent and reliable way in our current protocol - although CQL at least
gives us a path to implementing those.

I wonder if we should focus on being excellent at the basic write and read
operations we already support before adding more complexity at the API
layer. I am excited by the recent proposals around unbounded partitions,
byte ordered partitioner with safe data movement, ability to execute
analytics queries efficiently via a separate columnar representation etc
... and *all* of those and more would likely be *required* to tackle SQL in
any meaningful way.

The surface area of SQL is much much wider, requiring functional
implementation of all of that plus joins, interactive transactions and
more. The SQL protocol itself is also quite poor for reliable communication
and rarely has performant async clients with size based pagination, per
page timeouts, per page hedging, incremental progress over a streaming
async interface, pagination resumption, etc ...  A lot of this difficulty
stems from the protocol often being tied to TCP connections and the
inherently unbounded complexity of the read interface.
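
To make the pagination point concrete, here is a minimal sketch in Python of
size-based paging with a resumable paging state. It is an illustration only:
`fetch_page`, the JSON token format and the in-memory `rows` list are
hypothetical names invented for this example, not Cassandra's native protocol
or any driver API.

```python
import json

def fetch_page(rows, max_page_bytes, paging_state=None):
    """Return (page, next_state); next_state is None when the scan is done.

    The paging state is an opaque token, so a client can resume the scan
    later, or on a different connection, without server-side cursor state.
    """
    start = 0 if paging_state is None else json.loads(paging_state)["offset"]
    page, used = [], 0
    for i in range(start, len(rows)):
        size = len(rows[i].encode("utf-8"))
        # Always return at least one row so an oversized row cannot stall
        # the scan; otherwise stop once the byte budget would be exceeded.
        if page and used + size > max_page_bytes:
            return page, json.dumps({"offset": i})
        page.append(rows[i])
        used += size
    return page, None

rows = ["aaaa", "bbbb", "cc", "dddddddd"]
state = None
pages = []
while True:
    page, state = fetch_page(rows, max_page_bytes=6, paging_state=state)
    pages.append(page)
    if state is None:
        break
```

Because each call is independent, a per-page timeout, retry or hedge can be
applied to a single `fetch_page` call without restarting the whole read.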

I guess I'm saying, I think we should prioritize succeeding at the API
scope we already have before adding more. Deferring to standard SQL syntax
or naming when we can just seems like a good idea (why reinvent concepts),
but I don't think the friction with CQL is because it's not SQL, I think
it's because users can't tell what works and what doesn't work.

-Joey

On Tue, Nov 4, 2025 at 8:42 AM Josh McKenzie  wrote:

> +1 to Mick and Aleksey. I think the key for me was this:
>
> One is Cassandra’s wide-partition model with flexible clustering columns,
> which supports very large, ordered partitions (e.g. time-series and
> efficient range scans), rather than a strictly normalised, join-centric
> model. These patterns don’t always map cleanly to SQL semantics, and CQL’s
> query-driven, table-per-query modelling helps move users toward designs
> that scale predictably.
>
>
> We'd need really robust EXPLAIN / EXPLAIN ANALYZE support (see here)
> for users to
> be able to make sense of how their SQL queries translate into underlying
> disk access patterns. Having a wide-open field of full SQL compliance they
> then need to understand how to constrain to get horizontal scale out of it
> would be *much more challenging* than the already somewhat "new"
> cognitive muscle our users have to build to realize that horizontal scaling
> of data access doesn't come free.
>
> I think that would give us a future state of "Use SQL when you need / want
> a lot of expressivity, use CQL when you need to be constrained to language
> primitives that keep your data access scalable". The part that gets me wary
> here is how we've run into pain in the past trying to be both a database
> that allows more query expressivity (ALLOW FILTERING, legacy 2i come to
> mind) and a database that also wants horizontal scale.
>
> I'd love us to be able to have our cake and eat it too but I don't know if
> that's possible. So at the very least I'd advocate for SQL + CQL going
> forward, or SQL + a constrained "CQL-like" mode that gives the same
> constraints CQL does today on modeling that guide people towards that very
> partitionable path.
>
> On Tue, Nov 4, 2025, at 8:12 AM, Aleksey Yeshchenko wrote:
>
> I don’t mind us implementing some Postgres syntax support in some
> capacity, but I do not like the idea of limiting what Cassandra is allowed
> to do, or expose via CQL, to what is expressible by Postgres’s SQL.
>
> Many moons ago, before we started work on native protocol and CQL, I could
> perhaps see a bigger benefit to going the Postgres route - for the client protocol
> and the language. We could piggyback on existing client infrastructure and
> SQL familiarity. But at this stage, when we have already made the effort to
> develop decent drivers, and CQL is fleshed out, and C* is quite mature
> overall, how much would we gain from this transition?
>
> I’m broadly with Mick here. And I support using Postgres’ SQL as
> inspiration for implementing new CQL features wherever it makes sense -
> it’s something we’ve be

Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread mapyourown
I remember we had an in-depth discussion with Patrick after his talk in
Minneapolis, where I raised my concern about pursuing full PostgreSQL
compatibility, particularly considering the mindset shift it requires and
the fact that Cassandra doesn’t support joins.

While I understand the team’s direction toward adopting a PostgreSQL-style
syntax and compatibility, I believe it’s equally important to continue
maintaining strong CQL support. Many companies and developers are deeply
invested in CQL, and as other contributors mentioned, it takes time for
them to adapt to major changes.

From past experience, when users feel key functionality is being taken
away, they often hesitate to upgrade to newer versions or even consider
forking the project to maintain their own version.

Just wanted to share my thoughts on this.


On Tue, Nov 4, 2025 at 8:57 AM Jeff Jirsa  wrote:

> I’m sorta confused. You can do single table design in sql if you don’t
> have a join centric workload. You still get to tell the database how to
> order your data on disk.
>
> BOP gives you efficient range scans  without having partition size
> problems that trap users when they cross into mega partition traps. I don’t
> think you have to say clustering data together is Cassandra’s key benefit,
> virtually every database is doing that, we just happen to do it with
> chunks of the user’s set of data instead of all of it.
>
> Similarly suggesting the LSM / SStables somehow benefit write heavy cql
> but not sql is sorta weird since the explosion of rocksdb backed sql makes
> it clear you can use LSM + sstables for that too
>
>
> On Nov 4, 2025, at 5:43 AM, Josh McKenzie  wrote:
>
> 
>
> +1 to Mick and Aleksey. I think the key for me was this:
>
> One is Cassandra’s wide-partition model with flexible clustering columns,
> which supports very large, ordered partitions (e.g. time-series and
> efficient range scans), rather than a strictly normalised, join-centric
> model. These patterns don’t always map cleanly to SQL semantics, and CQL’s
> query-driven, table-per-query modelling helps move users toward designs
> that scale predictably.
>
>
> We'd need really robust EXPLAIN / EXPLAIN ANALYZE support (see here)
> for users to
> be able to make sense of how their SQL queries translate into underlying
> disk access patterns. Having a wide-open field of full SQL compliance they
> then need to understand how to constrain to get horizontal scale out of it
> would be *much more challenging* than the already somewhat "new"
> cognitive muscle our users have to build to realize that horizontal scaling
> of data access doesn't come free.
>
> I think that would give us a future state of "Use SQL when you need / want
> a lot of expressivity, use CQL when you need to be constrained to language
> primitives that keep your data access scalable". The part that gets me wary
> here is how we've run into pain in the past trying to be both a database
> that allows more query expressivity (ALLOW FILTERING, legacy 2i come to
> mind) and a database that also wants horizontal scale.
>
> I'd love us to be able to have our cake and eat it too but I don't know if
> that's possible. So at the very least I'd advocate for SQL + CQL going
> forward, or SQL + a constrained "CQL-like" mode that gives the same
> constraints CQL does today on modeling that guide people towards that very
> partitionable path.
>
> On Tue, Nov 4, 2025, at 8:12 AM, Aleksey Yeshchenko wrote:
>
> I don’t mind us implementing some Postgres syntax support in some
> capacity, but I do not like the idea of limiting what Cassandra is allowed
> to do, or expose via CQL, to what is expressible by Postgres’s SQL.
>
> Many moons ago, before we started work on native protocol and CQL, I could
> perhaps see a bigger benefit to going the Postgres route - for the client protocol
> and the language. We could piggyback on existing client infrastructure and
> SQL familiarity. But at this stage, when we have already made the effort to
> develop decent drivers, and CQL is fleshed out, and C* is quite mature
> overall, how much would we gain from this transition?
>
> I’m broadly with Mick here. And I support using Postgres’ SQL as
> inspiration for implementing new CQL features wherever it makes sense -
> it’s something we’ve been doing for a decade already. But I don’t believe
> that deprecating CQL is the way to go at this point.
>
> > On 4 Nov 2025, at 06:38, Mick  wrote:
> >
> >
> >
> >> On 3 Nov 2025, at 20:32, Joel Shepherd  wrote:
> >>
> >> At the same time, my personal opinion is that if SQL compatibility is
> pursued, then the end game should be to deprecate CQL. That will probably
> take years, but at the limit I don't see a lot of benefit to supporting
> both.
> >
> >
> >
> > We want SQL, but _why_ (in all its nuances) do we want SQL ?  A lot is
> obvious, but it is a very broad question.
> >
> > The adoption and standardisation benefits are obvious, but CQL has

Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Jeff Jirsa
I’m sorta confused. You can do single table design in sql if you don’t have a
join centric workload. You still get to tell the database how to order your
data on disk.

BOP gives you efficient range scans without having partition size problems
that trap users when they cross into mega partition traps. I don’t think you
have to say clustering data together is Cassandra’s key benefit, virtually
every database is doing that, we just happen to do it with chunks of the
user’s set of data instead of all of it.

Similarly suggesting the LSM / SStables somehow benefit write heavy cql but
not sql is sorta weird since the explosion of rocksdb backed sql makes it
clear you can use LSM + sstables for that too

On Nov 4, 2025, at 5:43 AM, Josh McKenzie  wrote:

+1 to Mick and Aleksey. I think the key for me was this:

One is Cassandra’s wide-partition model with flexible clustering columns,
which supports very large, ordered partitions (e.g. time-series and efficient
range scans), rather than a strictly normalised, join-centric model. These
patterns don’t always map cleanly to SQL semantics, and CQL’s query-driven,
table-per-query modelling helps move users toward designs that scale
predictably.

We'd need really robust EXPLAIN / EXPLAIN ANALYZE support (see here) for users
to be able to make sense of how their SQL queries translate into underlying
disk access patterns. Having a wide-open field of full SQL compliance they
then need to understand how to constrain to get horizontal scale out of it
would be much more challenging than the already somewhat "new" cognitive
muscle our users have to build to realize that horizontal scaling of data
access doesn't come free.

I think that would give us a future state of "Use SQL when you need / want a
lot of expressivity, use CQL when you need to be constrained to language
primitives that keep your data access scalable". The part that gets me wary
here is how we've run into pain in the past trying to be both a database that
allows more query expressivity (ALLOW FILTERING, legacy 2i come to mind) and a
database that also wants horizontal scale.

I'd love us to be able to have our cake and eat it too but I don't know if
that's possible. So at the very least I'd advocate for SQL + CQL going
forward, or SQL + a constrained "CQL-like" mode that gives the same
constraints CQL does today on modeling that guide people towards that very
partitionable path.

On Tue, Nov 4, 2025, at 8:12 AM, Aleksey Yeshchenko wrote:

I don’t mind us implementing some Postgres syntax support in some capacity,
but I do not like the idea of limiting what Cassandra is allowed to do, or
expose via CQL, to what is expressible by Postgres’s SQL.

Many moons ago, before we started work on native protocol and CQL, I could
perhaps see a bigger benefit to going the Postgres route - for the client
protocol and the language. We could piggyback on existing client
infrastructure and SQL familiarity. But at this stage, when we have already
made the effort to develop decent drivers, and CQL is fleshed out, and C* is
quite mature overall, how much would we gain from this transition?

I’m broadly with Mick here. And I support using Postgres’ SQL as inspiration
for implementing new CQL features wherever it makes sense - it’s something
we’ve been doing for a decade already. But I don’t believe that deprecating
CQL is the way to go at this point.

> On 4 Nov 2025, at 06:38, Mick  wrote:
>
>
>
>> On 3 Nov 2025, at 20:32, Joel Shepherd  wrote:
>>
>> At the same time, my personal opinion is that if SQL compatibility is
>> pursued, then the end game should be to deprecate CQL. That will probably
>> take years, but at the limit I don't see a lot of benefit to supporting both.
>
>
>
> We want SQL, but _why_ (in all its nuances) do we want SQL ?  A lot is
> obvious, but it is a very broad question.
>
> The adoption and standardisation benefits are obvious, but CQL has strengths
> relative to SQL in Cassandra’s context.
>
> One is Cassandra’s wide-partition model with flexible clustering columns,
> which supports very large, ordered partitions (e.g. time-series and efficient
> range scans), rather than a strictly normalised, join-centric model. These
> patterns don’t always map cleanly to SQL semantics, and CQL’s query-driven,
> table-per-query modelling helps move users toward designs that scale
> predictably.
>
> I can see CQL continuing as Cassandra’s high-throughput, query-driven DSL,
> while we pursue SQL compatibility.  I appreciate Dinesh’s ‘lanes’ framing,
> e.g. eventually default to a SQL interface (with Accord) for the broadest UX,
> while CQL remains a high-throughput path.
>
> Should we also be discussing storage-engine implications ?  Cassandra’s
> LSMT/SSTable design optimises write paths; while a SQL presents a logical
> view without constraining physical layout; so data on disk stays optimised
> for dominant access patterns.  I can also see the need to discuss transport
> vs query languages differences.
>
> Are we after both SQL's DML and DDL abilit

Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Josh McKenzie
+1 to Mick and Aleksey. I think the key for me was this:
> One is Cassandra’s wide-partition model with flexible clustering columns, 
> which supports very large, ordered partitions (e.g. time-series and efficient 
> range scans), rather than a strictly normalised, join-centric model. These 
> patterns don’t always map cleanly to SQL semantics, and CQL’s query-driven, 
> table-per-query modelling helps move users toward designs that scale 
> predictably.

We'd need really robust EXPLAIN / EXPLAIN ANALYZE support (see here) for users
to be
able to make sense of how their SQL queries translate into underlying disk 
access patterns. Having a wide-open field of full SQL compliance they then need 
to understand how to constrain to get horizontal scale out of it would be *much 
more challenging* than the already somewhat "new" cognitive muscle our users 
have to build to realize that horizontal scaling of data access doesn't come 
free.
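
As a rough illustration of what that kind of plan visibility buys a user,
the sketch below uses SQLite (via Python's standard `sqlite3` module) as a
stand-in. It is not Cassandra, and the `readings` table, its columns and the
`plan` helper are hypothetical names made up for the example. The point is
only that the same table can serve one query by seeking on its key order and
another by walking every row, and that EXPLAIN output is how a user tells the
two apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID stores rows ordered by the primary key, loosely analogous to
# a partition key followed by a clustering column.
conn.execute(
    """
    CREATE TABLE readings (
        sensor_id    TEXT NOT NULL,
        reading_time TEXT NOT NULL,
        value        REAL,
        PRIMARY KEY (sensor_id, reading_time)
    ) WITHOUT ROWID
    """
)

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Constrained by the leading key columns: the engine can seek into the data.
keyed = plan(
    "SELECT value FROM readings "
    "WHERE sensor_id = 'sensor-1' AND reading_time >= '2025-11-04'"
)
# Constrained only by a non-key column: the engine has to read everything.
unkeyed = plan("SELECT sensor_id FROM readings WHERE value > 20.0")

print(keyed)
print(unkeyed)
```

The first plan is reported as a SEARCH using the primary key and the second as
a SCAN of the table; the exact wording varies by SQLite version. In a
distributed store the second shape is the one that stops scaling horizontally.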

I think that would give us a future state of "Use SQL when you need / want a 
lot of expressivity, use CQL when you need to be constrained to language 
primitives that keep your data access scalable". The part that gets me wary 
here is how we've run into pain in the past trying to be both a database that 
allows more query expressivity (ALLOW FILTERING, legacy 2i come to mind) and a 
database that also wants horizontal scale.

I'd love us to be able to have our cake and eat it too but I don't know if 
that's possible. So at the very least I'd advocate for SQL + CQL going forward, 
or SQL + a constrained "CQL-like" mode that gives the same constraints CQL does 
today on modeling that guide people towards that very partitionable path.

On Tue, Nov 4, 2025, at 8:12 AM, Aleksey Yeshchenko wrote:
> I don’t mind us implementing some Postgres syntax support in some capacity, 
> but I do not like the idea of limiting what Cassandra is allowed to do, or 
> expose via CQL, to what is expressible by Postgres’s SQL.
> 
> Many moons ago, before we started work on native protocol and CQL, I could 
> perhaps see a bigger benefit to going the Postgres route - for the client protocol 
> and the language. We could piggyback on existing client infrastructure and 
> SQL familiarity. But at this stage, when we have already made the effort to 
> develop decent drivers, and CQL is fleshed out, and C* is quite mature 
> overall, how much would we gain from this transition?
> 
> I’m broadly with Mick here. And I support using Postgres’ SQL as inspiration 
> for implementing new CQL features wherever it makes sense - it’s something 
> we’ve been doing for a decade already. But I don’t believe that deprecating 
> CQL is the way to go at this point.
> 
> > On 4 Nov 2025, at 06:38, Mick  wrote:
> > 
> > 
> > 
> >> On 3 Nov 2025, at 20:32, Joel Shepherd  wrote:
> >> 
> >> At the same time, my personal opinion is that if SQL compatibility is 
> >> pursued, then the end game should be to deprecate CQL. That will probably 
> >> take years, but at the limit I don't see a lot of benefit to supporting 
> >> both.
> > 
> > 
> > 
> > We want SQL, but _why_ (in all its nuances) do we want SQL ?  A lot is 
> > obvious, but it is a very broad question.
> > 
> > The adoption and standardisation benefits are obvious, but CQL has 
> > strengths relative to SQL in Cassandra’s context.  
> > 
> > One is Cassandra’s wide-partition model with flexible clustering columns, 
> > which supports very large, ordered partitions (e.g. time-series and 
> > efficient range scans), rather than a strictly normalised, join-centric 
> > model. These patterns don’t always map cleanly to SQL semantics, and CQL’s 
> > query-driven, table-per-query modelling helps move users toward designs 
> > that scale predictably.
> > 
> > I can see CQL continuing as Cassandra’s high-throughput, query-driven DSL, 
> > while we pursue SQL compatibility.  I appreciate Dinesh’s ‘lanes’ framing, 
> > e.g. eventually default to a SQL interface (with Accord) for the broadest 
> > UX, while CQL remains a high-throughput path.
> > 
> > Should we also be discussing storage-engine implications ?  Cassandra’s 
> > LSMT/SSTable design optimises write paths; while a SQL presents a logical 
> > view without constraining physical layout; so data on disk stays optimised 
> > for dominant access patterns.  I can also see the need to discuss transport 
> > vs query languages differences.
> > 
> > Are we after both SQL's DML and DDL abilities ?  Beyond accessibility and 
> > exploration, SQL often comes with mature tooling for schema change 
> > management. Cassandra supports online schema changes (e.g., ALTER TABLE), 
> > but cross-table/primary-key changes remain constrained. A SQL interface 
> > alone won’t ‘solve’ this: it’s about migration tooling and engine 
> > capabilities; changing data models at-scale faces separate challenges.
> > 
> > Especially outside of early-stage apps and ad-hoc exploration I find SQL 
> 

Re: [DISCUSS] SQL support in Cassandra

2025-11-04 Thread Aleksey Yeshchenko
I don’t mind us implementing some Postgres syntax support in some capacity, but 
I do not like the idea of limiting what Cassandra is allowed to do, or expose 
via CQL, to what is expressible by Postgres’s SQL.

Many moons ago, before we started work on native protocol and CQL, I could 
perhaps see a bigger benefit to going the Postgres route - for the client protocol and 
the language. We could piggyback on existing client infrastructure and SQL 
familiarity. But at this stage, when we have already made the effort to develop 
decent drivers, and CQL is fleshed out, and C* is quite mature overall, how 
much would we gain from this transition?

I’m broadly with Mick here. And I support using Postgres’ SQL as inspiration 
for implementing new CQL features wherever it makes sense - it’s something 
we’ve been doing for a decade already. But I don’t believe that deprecating CQL 
is the way to go at this point.

> On 4 Nov 2025, at 06:38, Mick  wrote:
> 
> 
> 
>> On 3 Nov 2025, at 20:32, Joel Shepherd  wrote:
>> 
>> At the same time, my personal opinion is that if SQL compatibility is 
>> pursued, then the end game should be to deprecate CQL. That will probably 
>> take years, but at the limit I don't see a lot of benefit to supporting both.
> 
> 
> 
> We want SQL, but _why_ (in all its nuances) do we want SQL ?  A lot is 
> obvious, but it is a very broad question.
> 
> The adoption and standardisation benefits are obvious, but CQL has strengths 
> relative to SQL in Cassandra’s context.  
> 
> One is Cassandra’s wide-partition model with flexible clustering columns, 
> which supports very large, ordered partitions (e.g. time-series and efficient 
> range scans), rather than a strictly normalised, join-centric model. These 
> patterns don’t always map cleanly to SQL semantics, and CQL’s query-driven, 
> table-per-query modelling helps move users toward designs that scale 
> predictably.
> 
> I can see CQL continuing as Cassandra’s high-throughput, query-driven DSL, 
> while we pursue SQL compatibility.  I appreciate Dinesh’s ‘lanes’ framing, 
> e.g. eventually default to a SQL interface (with Accord) for the broadest UX, 
> while CQL remains a high-throughput path.
> 
> Should we also be discussing storage-engine implications ?  Cassandra’s 
> LSMT/SSTable design optimises write paths; while a SQL presents a logical 
> view without constraining physical layout; so data on disk stays optimised 
> for dominant access patterns.  I can also see the need to discuss transport 
> vs query languages differences.
> 
> Are we after both SQL's DML and DDL abilities ?  Beyond accessibility and 
> exploration, SQL often comes with mature tooling for schema change 
> management. Cassandra supports online schema changes (e.g., ALTER TABLE), but 
> cross-table/primary-key changes remain constrained. A SQL interface alone 
> won’t ‘solve’ this: it’s about migration tooling and engine capabilities; 
> changing data models at-scale faces separate challenges.
> 
> Especially outside of early-stage apps and ad-hoc exploration I find SQL less 
> interesting and its ergonomics less aligned with Cassandra’s runtime 
> performance model.  That doesn't make me opposed to the endeavour of SQL 
> compatibility, it pushes me on the why question a bit more for alignment 
> clarity to our strengths.



Re: [DISCUSS] SQL support in Cassandra

2025-11-03 Thread Mick



> On 3 Nov 2025, at 20:32, Joel Shepherd  wrote:
> 
> At the same time, my personal opinion is that if SQL compatibility is 
> pursued, then the end game should be to deprecate CQL. That will probably 
> take years, but at the limit I don't see a lot of benefit to supporting both.



We want SQL, but _why_ (in all its nuances) do we want SQL ?  A lot is obvious, 
but it is a very broad question.

The adoption and standardisation benefits are obvious, but CQL has strengths 
relative to SQL in Cassandra’s context.  

One is Cassandra’s wide-partition model with flexible clustering columns, which 
supports very large, ordered partitions (e.g. time-series and efficient range 
scans), rather than a strictly normalised, join-centric model. These patterns 
don’t always map cleanly to SQL semantics, and CQL’s query-driven, 
table-per-query modelling helps move users toward designs that scale 
predictably.
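
A minimal sketch of that model, in Python rather than CQL so it can be run
anywhere: each partition key maps to rows kept sorted by a clustering key, so
a time-range read within one partition is two binary searches and a slice.
The `table`, `insert` and `range_scan` names and the sensor data are
hypothetical and only illustrate the access pattern, not Cassandra's storage
engine.

```python
from bisect import bisect_left

# Hypothetical in-memory stand-in for one table: partition key -> rows kept
# sorted by clustering key (here, an ISO-8601 timestamp string).
table = {}

def insert(partition_key, clustering_key, value):
    rows = table.setdefault(partition_key, [])
    idx = bisect_left(rows, (clustering_key,))
    if idx < len(rows) and rows[idx][0] == clustering_key:
        rows[idx] = (clustering_key, value)  # upsert: the last write wins
    else:
        rows.insert(idx, (clustering_key, value))

def range_scan(partition_key, start, end):
    """Return rows with start <= clustering_key < end from one partition."""
    rows = table.get(partition_key, [])
    lo = bisect_left(rows, (start,))
    hi = bisect_left(rows, (end,))
    return rows[lo:hi]

insert("sensor-1", "2025-11-04T08:00", 20.5)
insert("sensor-1", "2025-11-04T09:00", 21.0)
insert("sensor-1", "2025-11-03T23:00", 19.0)
insert("sensor-2", "2025-11-04T08:30", 30.0)

day = range_scan("sensor-1", "2025-11-04T00:00", "2025-11-05T00:00")
```

The query only ever touches one partition and reads a contiguous run of rows,
which is the property table-per-query modelling is meant to preserve.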

I can see CQL continuing as Cassandra’s high-throughput, query-driven DSL, 
while we pursue SQL compatibility.  I appreciate Dinesh’s ‘lanes’ framing, e.g. 
eventually default to a SQL interface (with Accord) for the broadest UX, while 
CQL remains a high-throughput path.

Should we also be discussing storage-engine implications ?  Cassandra’s 
LSMT/SSTable design optimises write paths; while a SQL presents a logical view 
without constraining physical layout; so data on disk stays optimised for 
dominant access patterns.  I can also see the need to discuss transport vs 
query languages differences.

Are we after both SQL's DML and DDL abilities ?  Beyond accessibility and 
exploration, SQL often comes with mature tooling for schema change management. 
Cassandra supports online schema changes (e.g., ALTER TABLE), but 
cross-table/primary-key changes remain constrained. A SQL interface alone won’t 
‘solve’ this: it’s about migration tooling and engine capabilities; changing 
data models at-scale faces separate challenges.

Especially outside of early-stage apps and ad-hoc exploration I find SQL less 
interesting and its ergonomics less aligned with Cassandra’s runtime 
performance model.  That doesn't make me opposed to the endeavour of SQL 
compatibility; it pushes me a bit more on the why question, so that we are 
clear on how this aligns with our strengths.

Re: [DISCUSS] SQL support in Cassandra

2025-11-03 Thread Jaydeep Chovatia
+1 I support this initiative.
Also, I agree with the points raised by Joel.

   - We should deprecate CQL in the long term in favor of SQL.
   - The Cassandra engine is not optimized for all SQL features, such as
   joins, so we should disable those features.
   - I view this initiative as a way to slowly migrate CQL -> SQL rather
   than a standalone SQL, so I am looking forward to more detail.


Jaydeep

On Mon, Nov 3, 2025 at 11:32 AM Joel Shepherd  wrote:

> On 11/1/2025 9:32 AM, Dinesh Joshi wrote:
>
>
> On Fri, Oct 31, 2025 at 5:00 PM Patrick McFadin 
> wrote:
>
>>
>> Jeff and Dinesh jumped into Phase 2, which is really the fun and
>> interesting part. To be clear, I am not proposing we make any changes pre
>> Cassandra 6 in this case. And this will be a CEP or two or three.
>>
>
> I was not intentionally trying to jump to Phase 2. I was trying to suss out
> the shape of what you were saying. I would like to think in terms of the
> user requirements to fully understand your proposal. IMO, SQL is a dialect
> that Cassandra can adopt and requires the right building blocks at the
> storage layer to work well. CQL should continue living alongside SQL and
> honestly we should not try to convert between those two unless there is a
> clear, well articulated reason for doing it. To be clear, I am not saying
> there is one. I am only keeping the door open for a constructive discussion
> around it if you or anybody else has one.
>
> Conversion doesn't seem practical or desirable: it'd probably result in
> CQL and a muddled version of SQL which wouldn't be beneficial for anyone.
>
> At the same time, my personal opinion is that if SQL compatibility is
> pursued, then the end game should be to deprecate CQL. That will probably
> take years, but at the limit I don't see a lot of benefit to supporting
> both.
>
> Adopting SQL as kind of the "lingua franca" of declarative data access
> seems like a great way to increase adoption by giving new users an easier
> learning curve and maybe eventually making 3rd party integration (ORMs,
> drivers, etc.) easier. Let Cassandra's differentiating features shine.
>
> The risk I see with pursuing SQL compatibility is striking the balance
> between preserving (or even strengthening) Cassandra's differentiating
> features -- huge scale, fast writes, tunable consistency, rich feature set
> -- without adding a bunch of gotchas to its SQL support. For example, if
> the user needs to understand Cassandra's partitions and primary keys for
> optimal performance, then supporting arbitrary joins via SQL might lead
> less experienced users down a bad path. Is that an acceptable risk (as it
> seems to be for most RDBMSs), or something we'd want to put guardrails
> around, or just prevent outright (i.e., support the syntax, but constrain
> the semantics to prevent cluster-wide table scans, etc.)?
>
> Does SQL support mean full syntax and semantic compatibility, or syntax
> compatibility with Cassandra-specific constraints on the semantics (so that
> Cassandra's key benefits aren't compromised)?
>
> IMO, the user should pick a lane - CQL or SQL - when they create their
> Keyspace. This simplifies the implementation and descopes any potential
> conversion or compatibility related engineering effort.
>
> +1.
>
> Thanks -- Joel.
>
>
>


Re: [DISCUSS] SQL support in Cassandra

2025-11-03 Thread Joel Shepherd

On 11/1/2025 9:32 AM, Dinesh Joshi wrote:


On Fri, Oct 31, 2025 at 5:00 PM Patrick McFadin  
wrote:



Jeff and Dinesh jumped into Phase 2, which is really the fun and
interesting part. To be clear, I am not proposing we make any
changes pre Cassandra 6 in this case. And this will be a CEP or
two or three.


I was not intentionally trying to jump to Phase 2. I was trying to suss 
out the shape of what you were saying. I would like to think in terms 
of the user requirements to fully understand your proposal. IMO, SQL 
is a dialect that Cassandra can adopt and requires the right building 
blocks at the storage layer to work well. CQL should continue living 
alongside SQL and honestly we should not try to convert between those 
two unless there is a clear, well articulated reason for doing it. To 
be clear, I am not saying there is one. I am only keeping the door 
open for a constructive discussion around it if you or anybody else 
has one.


Conversion doesn't seem practical or desirable: it'd probably result in 
CQL and a muddled version of SQL which wouldn't be beneficial for anyone.


At the same time, my personal opinion is that if SQL compatibility is 
pursued, then the end game should be to deprecate CQL. That will 
probably take years, but at the limit I don't see a lot of benefit to 
supporting both.


Adopting SQL as kind of the "lingua franca" of declarative data access 
seems like a great way to increase adoption by giving new users an 
easier learning curve and maybe eventually making 3rd party integration 
(ORMs, drivers, etc.) easier. Let Cassandra's differentiating features 
shine.


The risk I see with pursuing SQL compatibility is striking the balance 
between preserving (or even strengthening) Cassandra's differentiating 
features -- huge scale, fast writes, tunable consistency, rich feature 
set -- without adding a bunch of gotchas to its SQL support. For 
example, if the user needs to understand Cassandra's partitions and 
primary keys for optimal performance, then supporting arbitrary joins 
via SQL might lead less experienced users down a bad path. Is that an 
acceptable risk (as it seems to be for most RDBMSs), or something we'd 
want to put guardrails around, or just prevent outright (i.e., support 
the syntax, but constrain the semantics to prevent cluster-wide table 
scans, etc.)?


Does SQL support mean full syntax and semantic compatibility, or syntax 
compatibility with Cassandra-specific constraints on the semantics (so 
that Cassandra's key benefits aren't compromised)?


IMO, the user should pick a lane - CQL or SQL - when they create their 
Keyspace. This simplifies the implementation and descopes any 
potential conversion or compatibility related engineering effort.


+1.

Thanks -- Joel.



Re: [DISCUSS] SQL support in Cassandra

2025-11-01 Thread Dinesh Joshi
On Fri, Oct 31, 2025 at 5:00 PM Patrick McFadin  wrote:

>
> Jeff and Dinesh jumped into Phase 2, which is really the fun and
> interesting part. To be clear, I am not proposing we make any changes pre
> Cassandra 6 in this case. And this will be a CEP or two or three.
>

I was not intentionally trying to jump to Phase 2. I was trying to suss out
the shape of what you were saying. I would like to think in terms of the
user requirements to fully understand your proposal. IMO, SQL is a dialect
that Cassandra can adopt and requires the right building blocks at the
storage layer to work well. CQL should continue living alongside SQL and
honestly we should not try to convert between those two unless there is a
clear, well articulated reason for doing it. To be clear, I am not saying
there is one. I am only keeping the door open for a constructive discussion
around it if you or anybody else has one.

IMO, the user should pick a lane - CQL or SQL - when they create their
Keyspace. This simplifies the implementation and descopes any potential
conversion or compatibility related engineering effort.

Thanks,

Dinesh


Re: [DISCUSS] SQL support in Cassandra

2025-11-01 Thread Jeff Jirsa
You can

But you can also build that same layer stateless right now and not worry about 
trying to contain the cql’isms which expose the clustering concepts 

> On Nov 1, 2025, at 8:04 AM, Patrick McFadin wrote:
> 
> This opens up an entire line of discussion about the bigger goal of 
> Cassandra becoming a fully cloud native DB but I’m here for it. 
> 
> I’m not going to disagree with Jeff’s point. There is prior art that shows 
> this is a way forward. What the DataStax team did splitting up parts of 
> Cassandra to deploy independently has made multi-tenancy a proven direction 
> that works in production. Specifically what Jeff is proposing with a 
> stateless service above a KV storage backend is exactly what TiDB does but 
> with MySQL support. 
> 
> Going back to my talk at CoC, this is what I firmly believe. Our moat is the 
> storage engine and how we continue to evolve it for more use cases but stick 
> to the fundamentals of durability, distribution and scale. Accord, TCM and 
> what’s now being proposed in CEP-57 are all critical elements to supporting 
> more diverse workloads and holding the line on our core values. 
> 
> With fundamental re-architecture like this, then why couldn’t we just use 
> existing projects like Apache Calcite(https://calcite.apache.org/) and 
> Substrait(https://substrait.io/)? 
> 
> Patrick
> 
>> On Oct 31, 2025, at 7:16 PM, C. Scott Andreas wrote:
>> 
>> Jeff’s thoughts are mine exactly, and how I would imagine building this.
>> 
>> – Scott
>> 
>> —
>> Mobile
>> 
>>> On Oct 31, 2025, at 9:53 PM, Jon Haddad wrote:
>>> 
>>> I agree that new features should leverage SQL syntax.  I can't think of a 
>>> reason not to. 
>>> 
>>> On Fri, Oct 31, 2025 at 5:00 PM Patrick McFadin wrote:
>>>> I knew this would be a lot of information to try to convey. I swear it 
>>>> sounded amazing in my head :D
>>>> 
>>>> Let’s break up the Phase 1 and everything after it conversations. 
>>>> 
>>>> The Phase 1 part was in response to some recent discussions on current 
>>>> CEPs. 
>>>> CEP-55(https://lists.apache.org/thread/4swcf1n4qm7ps6g4brv2wnrql8n72p61) 
>>>> and 
>>>> CEP-52(https://lists.apache.org/thread/8rcp808jb4y2jy2sttkhx0fv71qxnddf) 
>>>> In those threads, the syntax for the changes would have been unique to 
>>>> CQL. My suggestion was to just use the syntax for similar features in 
>>>> pgSQL. Jyothsna is finishing up CEP-52 and the syntax is pgSQL and so 
>>>> anyone using that new-to-Cassandra feature will find familiar syntax.
>>>> 
>>>> In those threads, it was correctly pointed out that we don’t have any 
>>>> agreed upon guidelines. My proposal with Phase 1 is just that. Any new 
>>>> feature that is proposed, we defer to the pgsql format whenever possible. 
>>>> And no, I'm not proposing we back port anything! (Not going there) 
>>>> 
>>>> I don’t think that is a CEP, however it does need some formality. Maybe a 
>>>> VOTE thread and it’s just policy? 
>>>> 
>>>> Jeff and Dinesh jumped into Phase 2, which is really the fun and 
>>>> interesting part. To be clear, I am not proposing we make any changes pre 
>>>> Cassandra 6 in this case. And this will be a CEP or two or three. 
>>>> 
>>>> To directly answer the questions and my first shot at imagining an 
>>>> implementation in Phase 2, I think this is a matter of making 
>>>> QueryProcessor[1] pluggable. I can’t take credit for this idea. It’s been 
>>>> floated a few times, but in this case it might be the best place to 
>>>> start. And to answer Jeff, this isn’t transforming CQL. I do think this 
>>>> is a new implementation. 
>>>> 
>>>> Then possibly you could run both CQL and SQL at the same time. It’s just 
>>>> a matter of what gets sent down to the storage engine. And then there is 
>>>> CEP-39 [2]
>>>> 
>>>> Jeff’s point about BOP is really interesting. And let’s not forget about 
>>>> CEP-57 that was just proposed. The point being we have a lot of future 
>>>> changes that, if everything is aligned, could come together in some 
>>>> interesting ways. We have to agree on the directionality as a baseline. 
>>>> It’s a strategy, not a plan. 
>>>> 
>>>> 1 - 
>>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/QueryProcessor.java
>>>> 2 - 
>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
>>>> 3 - 
>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-57%3A+Flat+keys+and+trie+interfaces
>>>> 
>>>>> On Oct 31, 2025, at 3:22 PM, Ekaterina Dimitrova wrote:
>>>>> 
>>>>> Hey Patrick,
>>>>> 
>>>>> Thanks for starting this discussion. 
>>>>> I am also curious to read the response to Dinesh’s questions.
>>>>> 
>>>>> Plus I have one to add myself (potentially more after I spend more time 
>>>>> on this) - It is not clear to me what Phase 1 is. Do you suggest 
>>>>> blocking 6.0 alpha to review all new not yet released syntax to try to 
>>>>> align it with SQL? You plan to open a CEP and work on that? Or I 
>>>>> misunderstood what you suggest?
>>>>> 
>>>>> Best regards,
>>>>> Ekaterina
>>>>> 
>>>>> On Fri, 31 Oct 2025 at 18:15, Dinesh Joshi wrote:
>>>>>> Thank you Patrick for starting this thread. Your talk was interesting. 
>>>>>> I want to better understand the nature of compatibility aspect of what 
>>>>>> you're proposing. Specifically, how do you envision the following 
>>>>>> scenarios to be supported in this new world –
>>>>>> 
>>>>>> 1. Could an operator enable CQL and SQL simultaneously?
>>>>>> 2. Does the user need to pick CQL or SQL at the time of Keyspace 
>>>>>> creation or c

Re: [DISCUSS] SQL support in Cassandra

2025-11-01 Thread Patrick McFadin
This opens up an entire line of discussion about the bigger goal of Cassandra 
becoming a fully cloud native DB but I’m here for it. 

I’m not going to disagree with Jeff’s point. There is prior art that shows this 
is a way forward. What the DataStax team did splitting up parts of Cassandra to 
deploy independently has made multi-tenancy a proven direction that works in 
production. Specifically what Jeff is proposing with a stateless service above 
a KV storage backend is exactly what TiDB does but with MySQL support. 

Going back to my talk at CoC, this is what I firmly believe. Our moat is the 
storage engine and how we continue to evolve it for more use cases but stick 
to the fundamentals of durability, distribution and scale. Accord, TCM and 
what’s now being proposed in CEP-57 are all critical elements to supporting 
more diverse workloads and holding the line on our core values. 

With fundamental re-architecture like this, then why couldn’t we just use 
existing projects like Apache Calcite(https://calcite.apache.org/) and 
Substrait(https://substrait.io/)? 

Patrick

> On Oct 31, 2025, at 7:16 PM, C. Scott Andreas  wrote:
> 
> Jeff’s thoughts are mine exactly, and how I would imagine building this.
> 
> – Scott
> 
> —
> Mobile
> 
>> On Oct 31, 2025, at 9:53 PM, Jon Haddad  wrote:
>> 
>> 
>> I agree that new features should leverage SQL syntax.  I can't think of a 
>> reason not to. 
>> 
>> On Fri, Oct 31, 2025 at 5:00 PM Patrick McFadin wrote:
>>> I knew this would be a lot of information to try to convey. I swear it 
>>> sounded amazing in my head :D
>>> 
>>> Let’s break up the Phase 1 and everything after it conversations. 
>>> 
>>> The Phase 1 part was in response to some recent discussions on current 
>>> CEPs. 
>>> CEP-55(https://lists.apache.org/thread/4swcf1n4qm7ps6g4brv2wnrql8n72p61) 
>>> and 
>>> CEP-52(https://lists.apache.org/thread/8rcp808jb4y2jy2sttkhx0fv71qxnddf) In 
>>> those threads, the syntax for the changes would have been unique to CQL. My 
>>> suggestion was to just use the syntax for similar features in pgSQL. 
>>> Jyothsna is finishing up CEP-52 and the syntax is pgSQL and so anyone using 
>>> that new-to-Cassandra feature will find familiar syntax.
>>> 
>>> In those threads, it was correctly pointed out that we don’t have any 
>>> agreed upon guidelines. My proposal with Phase 1 is just that. Any new 
>>> feature that is proposed, we defer to the pgsql format whenever possible. 
>>> And no, I'm not proposing we back port anything! (Not going there) 
>>> 
>>> I don’t think that is a CEP, however it does need some formality. Maybe a 
>>> VOTE thread and it’s just policy? 
>>> 
>>> 
>>> Jeff and Dinesh jumped into Phase 2, which is really the fun and 
>>> interesting part. To be clear, I am not proposing we make any changes pre 
>>> Cassandra 6 in this case. And this will be a CEP or two or three. 
>>> 
>>> To directly answer the questions and my first shot at imagining an 
>>> implementation in Phase 2, I think this is a matter of making 
>>> QueryProcessor[1] pluggable. I can’t take credit for this idea. It’s been 
>>> floated a few times, but in this case it might be the best place to start. 
>>> And to answer Jeff, this isn’t transforming CQL. I do think this is a new 
>>> implementation. 
>>> 
>>> Then possibly you could run both CQL and SQL at the same time. It’s just a 
>>> matter of what gets sent down to the storage engine. And then there is 
>>> CEP-39 [2]
>>> 
>>> Jeff’s point about BOP is really interesting. And let’s not forget about 
>>> CEP-57 that was just proposed. The point being we have a lot of future 
>>> changes that, if everything is aligned, could come together in some 
>>> interesting ways. We have to agree on the directionality as a baseline. 
>>> It’s a strategy, not a plan. 
>>> 
>>> 1 - 
>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/QueryProcessor.java
>>> 2 - 
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
>>> 3 - 
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-57%3A+Flat+keys+and+trie+interfaces
>>> 
>>>> On Oct 31, 2025, at 3:22 PM, Ekaterina Dimitrova wrote:
>>>> 
>>>> Hey Patrick,
>>>> 
>>>> Thanks for starting this discussion. 
>>>> I am also curious to read the response to Dinesh’s questions.
>>>> 
>>>> Plus I have one to add myself (potentially more after I spend more time on 
>>>> this) - It is not clear to me what Phase 1 is. Do you suggest blocking 6.0 
>>>> alpha to review all new not yet released syntax to try to align it with 
>>>> SQL? You plan to open a CEP and work on that? Or I misunderstood what you 
>>>> suggest?
>>>> 
>>>> Best regards,
>>>> Ekaterina
>>>> 
>>>> On Fri, 31 Oct 2025 at 18:15, Dinesh Joshi wrote:
>>>>> Thank you Patrick for starting this thread. Your talk was interesting. I 
>>>>> want to better understand the natu

Re: [DISCUSS] SQL support in Cassandra

2025-10-31 Thread C. Scott Andreas
Jeff’s thoughts are mine exactly, and how I would imagine building this.

– Scott

—
Mobile

> On Oct 31, 2025, at 9:53 PM, Jon Haddad wrote:
> 
> I agree that new features should leverage SQL syntax.  I can't think of a 
> reason not to. 
> 
> On Fri, Oct 31, 2025 at 5:00 PM Patrick McFadin wrote:
>> I knew this would be a lot of information to try to convey. I swear it 
>> sounded amazing in my head :D
>> 
>> Let’s break up the Phase 1 and everything after it conversations. 
>> 
>> The Phase 1 part was in response to some recent discussions on current 
>> CEPs. 
>> CEP-55(https://lists.apache.org/thread/4swcf1n4qm7ps6g4brv2wnrql8n72p61) 
>> and 
>> CEP-52(https://lists.apache.org/thread/8rcp808jb4y2jy2sttkhx0fv71qxnddf) 
>> In those threads, the syntax for the changes would have been unique to CQL. 
>> My suggestion was to just use the syntax for similar features in pgSQL. 
>> Jyothsna is finishing up CEP-52 and the syntax is pgSQL and so anyone using 
>> that new-to-Cassandra feature will find familiar syntax.
>> 
>> In those threads, it was correctly pointed out that we don’t have any 
>> agreed upon guidelines. My proposal with Phase 1 is just that. Any new 
>> feature that is proposed, we defer to the pgsql format whenever possible. 
>> And no, I'm not proposing we back port anything! (Not going there) 
>> 
>> I don’t think that is a CEP, however it does need some formality. Maybe a 
>> VOTE thread and it’s just policy? 
>> 
>> Jeff and Dinesh jumped into Phase 2, which is really the fun and 
>> interesting part. To be clear, I am not proposing we make any changes pre 
>> Cassandra 6 in this case. And this will be a CEP or two or three. 
>> 
>> To directly answer the questions and my first shot at imagining an 
>> implementation in Phase 2, I think this is a matter of making 
>> QueryProcessor[1] pluggable. I can’t take credit for this idea. It’s been 
>> floated a few times, but in this case it might be the best place to start. 
>> And to answer Jeff, this isn’t transforming CQL. I do think this is a new 
>> implementation. 
>> 
>> Then possibly you could run both CQL and SQL at the same time. It’s just a 
>> matter of what gets sent down to the storage engine. And then there is 
>> CEP-39 [2]
>> 
>> Jeff’s point about BOP is really interesting. And let’s not forget about 
>> CEP-57 that was just proposed. The point being we have a lot of future 
>> changes that, if everything is aligned, could come together in some 
>> interesting ways. We have to agree on the directionality as a baseline. 
>> It’s a strategy, not a plan. 
>> 
>> 1 - 
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/QueryProcessor.java
>> 2 - 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
>> 3 - 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-57%3A+Flat+keys+and+trie+interfaces
>> 
>>> On Oct 31, 2025, at 3:22 PM, Ekaterina Dimitrova wrote:
>>> 
>>> Hey Patrick,
>>> 
>>> Thanks for starting this discussion. 
>>> I am also curious to read the response to Dinesh’s questions.
>>> 
>>> Plus I have one to add myself (potentially more after I spend more time on 
>>> this) - It is not clear to me what Phase 1 is. Do you suggest blocking 6.0 
>>> alpha to review all new not yet released syntax to try to align it with 
>>> SQL? You plan to open a CEP and work on that? Or I misunderstood what you 
>>> suggest?
>>> 
>>> Best regards,
>>> Ekaterina
>>> 
>>> On Fri, 31 Oct 2025 at 18:15, Dinesh Joshi wrote:
>>>> Thank you Patrick for starting this thread. Your talk was interesting. I 
>>>> want to better understand the nature of compatibility aspect of what 
>>>> you're proposing. Specifically, how do you envision the following 
>>>> scenarios to be supported in this new world –
>>>> 
>>>> 1. Could an operator enable CQL and SQL simultaneously?
>>>> 2. Does the user need to pick CQL or SQL at the time of Keyspace creation 
>>>> or can they switch between CQL and SQL on the fly?
>>>> 3. Would the user be able to read and write to the same Keyspace using 
>>>> both CQL and SQL?
>>>> 4. Do you envision the user being able to write using CQL and read using 
>>>> SQL?
>>>> 
>>>> Thanks,
>>>> 
>>>> Dinesh
>>>> 
>>>> On Fri, Oct 31, 2025 at 1:26 PM Patrick McFadin wrote:
>>>>> Over the last decade, CQL has served Cassandra users well by offering a 
>>>>> familiar SQL-like interface for a distributed data model. However, as 
>>>>> the broader database ecosystem converges on PostgreSQL-style SQL as the 
>>>>> de facto standard for developers, it’s time to consider how Cassandra 
>>>>> evolves to meet developers where they are without losing what makes it 
>>>>> unique.
>>>>> 
>>>>> The great thing about SQL standards is that there are plenty to choose 
>>>>> from. While the formal SQL:2023 specification (ISO/IEC 9075) exists, the 
>>>>> industry has coalesced around the PostgreSQL dialect. Products such as 
>>>>> AWS Aurora, AlloyDB, CockroachDB, YugabyteDB, and DuckDB, and many 
>>>>> others offering “PostgreSQL-compatible” modes, have validated this 
>>>>> direction. Developers are voting with their implementations. PostgreSQL 
>>>>> SQL represents the lowest cognitive-load interface for application data, 
>>>>> as repeatedly confirmed by developer surveys like Stack Overflow 
>>>>> 2025[1]. 
>>>>> 
>>>>> What I’m proposing is that we begin to normalize the frontend to expand 
>>>>> access to our extraordinary backend. 

Re: [DISCUSS] SQL support in Cassandra

2025-10-31 Thread Jon Haddad
I agree that new features should leverage SQL syntax.  I can't think of a
reason not to.

On Fri, Oct 31, 2025 at 5:00 PM Patrick McFadin  wrote:

> I knew this would be a lot of information to try to convey. I swear it
> sounded amazing in my head :D
>
> Let’s break up the Phase 1 and everything after it conversations.
>
> The Phase 1 part was in response to some recent discussions on current
> CEPs. CEP-55(
> https://lists.apache.org/thread/4swcf1n4qm7ps6g4brv2wnrql8n72p61) and
> CEP-52(https://lists.apache.org/thread/8rcp808jb4y2jy2sttkhx0fv71qxnddf)
> In those threads, the syntax for the changes would have been unique to CQL.
> My suggestion was to just use the syntax for similar features in pgSQL.
> Jyothsna is finishing up CEP-52 and the syntax is pgSQL and so anyone using
> that new-to-Cassandra feature will find familiar syntax.
>
> In those threads, it was correctly pointed out that we don’t have any
> agreed upon guidelines. My proposal with Phase 1 is just that. Any new
> feature that is proposed, we defer to the pgsql format whenever possible.
> And no, I'm not proposing we back port anything! (Not going there)
>
> I don’t think that is a CEP, however it does need some formality. Maybe a
> VOTE thread and it’s just policy?
>
>
> Jeff and Dinesh jumped into Phase 2, which is really the fun and
> interesting part. To be clear, I am not proposing we make any changes pre
> Cassandra 6 in this case. And this will be a CEP or two or three.
>
> To directly answer the questions and my first shot at imagining an
> implementation in Phase 2, I think this is a matter of making
> QueryProcessor[1] pluggable. I can’t take credit for this idea. It’s been
> floated a few times, but in this case it might be the best place to start.
> And to answer Jeff, this isn’t transforming CQL. I do think this is a new
> implementation.
>
> Then possibly you could run both CQL and SQL at the same time. It’s just a
> matter of what gets sent down to the storage engine. And then there is
> CEP-39 [2]
>
> Jeff’s point about BOP is really interesting. And let’s not forget about
> CEP-57 that was just proposed. The point being we have a lot of future
> changes that, if everything is aligned, could come together in some
> interesting ways. We have to agree on the directionality as a baseline.
> It’s a strategy, not a plan.
>
> 1 -
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/QueryProcessor.java
> 2 -
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
> 3 -
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-57%3A+Flat+keys+and+trie+interfaces
>
> On Oct 31, 2025, at 3:22 PM, Ekaterina Dimitrova 
> wrote:
>
> Hey Patrick,
>
> Thanks for starting this discussion.
> I am also curious to read the response to Dinesh’s questions.
>
> Plus I have one to add myself (potentially more after I spend more time on
> this) - It is not clear to me what Phase 1 is. Do you suggest blocking 6.0
> alpha to review all new not yet released syntax to try to align it with
> SQL? You plan to open a CEP and work on that? Or I misunderstood what you
> suggest?
>
> Best regards,
> Ekaterina
>
> On Fri, 31 Oct 2025 at 18:15, Dinesh Joshi  wrote:
>
>> Thank you Patrick for starting this thread. Your talk was interesting. I
>> want to better understand the nature of compatibility aspect of what you're
>> proposing. Specifically, how do you envision the following scenarios to be
>> supported in this new world –
>>
>> 1. Could an operator enable CQL and SQL simultaneously?
>> 2. Does the user need to pick CQL or SQL at the time of Keyspace creation
>> or can they switch between CQL and SQL on the fly?
>> 3. Would the user be able to read and write to the same Keyspace using
>> both CQL and SQL?
>> 4. Do you envision the user being able to write using CQL and read using
>> SQL?
>>
>> Thanks,
>>
>> Dinesh
>>
>> On Fri, Oct 31, 2025 at 1:26 PM Patrick McFadin 
>> wrote:
>>
>>> Over the last decade, CQL has served Cassandra users well by offering a
>>> familiar SQL-like interface for a distributed data model. However, as the
>>> broader database ecosystem converges on PostgreSQL-style SQL as the de
>>> facto standard for developers, it’s time to consider how Cassandra evolves
>>> to meet developers where they are without losing what makes it unique.
>>>
>>> The great thing about SQL standards is that there are plenty to choose
>>> from. While the formal SQL:2023 specification (ISO/IEC 9075) exists, the
>>> industry has coalesced around the PostgreSQL dialect. Products such as AWS
>>> Aurora, AlloyDB, CockroachDB, YugabyteDB, and DuckDB, and many others
>>> offering “PostgreSQL-compatible” modes, have validated this direction.
>>> Developers are voting with their implementations. PostgreSQL SQL represents
>>> the lowest cognitive-load interface for application data, as repeatedly
>>> confirmed by developer surveys like Stack Overflow 2025[1].
>>>
>>> What I’m proposing is that we begin

Re: [DISCUSS] SQL support in Cassandra

2025-10-31 Thread Patrick McFadin
I knew this would be a lot of information to try to convey. I swear it sounded 
amazing in my head :D

Let’s break up the Phase 1 and everything after it conversations. 

The Phase 1 part was in response to some recent discussions on current CEPs. 
CEP-55(https://lists.apache.org/thread/4swcf1n4qm7ps6g4brv2wnrql8n72p61) and 
CEP-52(https://lists.apache.org/thread/8rcp808jb4y2jy2sttkhx0fv71qxnddf) In 
those threads, the syntax for the changes would have been unique to CQL. My 
suggestion was to just use the syntax for similar features in pgSQL. Jyothsna 
is finishing up CEP-52 and the syntax is pgSQL and so anyone using that 
new-to-Cassandra feature will find familiar syntax.

In those threads, it was correctly pointed out that we don’t have any agreed 
upon guidelines. My proposal with Phase 1 is just that. Any new feature that is 
proposed, we defer to the pgsql format whenever possible. And no, I'm not 
proposing we back port anything! (Not going there) 

I don’t think that is a CEP, however it does need some formality. Maybe a VOTE 
thread and it’s just policy? 


Jeff and Dinesh jumped into Phase 2, which is really the fun and interesting 
part. To be clear, I am not proposing we make any changes pre Cassandra 6 in 
this case. And this will be a CEP or two or three. 

To directly answer the questions and my first shot at imagining an implementation 
in Phase 2, I think this is a matter of making QueryProcessor[1] pluggable. I 
can’t take credit for this idea. It’s been floated a few times, but in this 
case it might be the best place to start. And to answer Jeff, this isn’t 
transforming CQL. I do think this is a new implementation. 

Then possibly you could run both CQL and SQL at the same time. It’s just a 
matter of what gets sent down to the storage engine. And then there is CEP-39 
[2]
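
As a rough illustration of what "pluggable" could mean here, the following Python sketch routes a query to a per-dialect processor. It is not the API of Cassandra's `QueryProcessor`; `QueryDispatcher`, `cql_processor`, `sql_processor` and the shape of the returned request are all hypothetical names made up for this example. The only point is that both front ends can produce the same kind of storage-level request.

```python
from typing import Callable, Dict

# Hypothetical: a processor turns query text into a storage-level request.
StorageRequest = dict
Processor = Callable[[str], StorageRequest]


class QueryDispatcher:
    """Hypothetical registry of query processors, keyed by dialect name."""

    def __init__(self) -> None:
        self._processors: Dict[str, Processor] = {}

    def register(self, dialect: str, processor: Processor) -> None:
        self._processors[dialect.lower()] = processor

    def process(self, dialect: str, query: str) -> StorageRequest:
        try:
            processor = self._processors[dialect.lower()]
        except KeyError:
            raise ValueError(f"no processor registered for {dialect!r}") from None
        return processor(query)


def cql_processor(query: str) -> StorageRequest:
    # Placeholder: a real implementation would parse and plan the statement.
    return {"dialect": "cql", "statement": query.strip()}


def sql_processor(query: str) -> StorageRequest:
    # Placeholder: a separate implementation, not a transformation of CQL.
    return {"dialect": "sql", "statement": query.strip()}


dispatcher = QueryDispatcher()
dispatcher.register("cql", cql_processor)
dispatcher.register("sql", sql_processor)

request = dispatcher.process("SQL", "  SELECT name FROM users WHERE id = 1  ")
print(request)
```

Under this assumption, running CQL and SQL side by side is a matter of registering two processors; what each one sends down to the storage engine is where the real design work sits.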

Jeff’s point about BOP is really interesting. And let’s not forget about CEP-57 
that was just proposed. The point being we have a lot of future changes that, 
if everything is aligned, could come together in some interesting ways. We have 
to agree on the directionality as a baseline. It’s a strategy, not a plan. 

1 - 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/QueryProcessor.java
2 - 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
3 - 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-57%3A+Flat+keys+and+trie+interfaces

> On Oct 31, 2025, at 3:22 PM, Ekaterina Dimitrova  
> wrote:
> 
> Hey Patrick,
> 
> Thanks for starting this discussion. 
> I am also curious to read the response to Dinesh’s questions.
> 
> Plus I have one to add myself (potentially more after I spend more time on 
> this) - It is not clear to me what Phase 1 is. Do you suggest blocking 6.0 
> alpha to review all new not yet released syntax to try to align it with SQL? 
> You plan to open a CEP and work on that? Or I misunderstood what you suggest?
> 
> Best regards,
> Ekaterina
> 
> On Fri, 31 Oct 2025 at 18:15, Dinesh Joshi wrote:
>> Thank you Patrick for starting this thread. Your talk was interesting. I 
>> want to better understand the nature of compatibility aspect of what you're 
>> proposing. Specifically, how do you envision the following scenarios to be 
>> supported in this new world –
>> 
>> 1. Could an operator enable CQL and SQL simultaneously?
>> 2. Does the user need to pick CQL or SQL at the time of Keyspace creation or 
>> can they switch between CQL and SQL on the fly?
>> 3. Would the user be able to read and write to the same Keyspace using both 
>> CQL and SQL?
>> 4. Do you envision the user being able to write using CQL and read using SQL?
>> 
>> Thanks,
>> 
>> Dinesh
>> 
>> On Fri, Oct 31, 2025 at 1:26 PM Patrick McFadin wrote:
>>> Over the last decade, CQL has served Cassandra users well by offering a 
>>> familiar SQL-like interface for a distributed data model. However, as the 
>>> broader database ecosystem converges on PostgreSQL-style SQL as the de 
>>> facto standard for developers, it’s time to consider how Cassandra evolves 
>>> to meet developers where they are without losing what makes it unique.
>>> 
>>> The great thing about SQL standards is that there are plenty to choose 
>>> from. While the formal SQL:2023 specification (ISO/IEC 9075) exists, the 
>>> industry has coalesced around the PostgreSQL dialect. Products such as AWS 
>>> Aurora, AlloyDB, CockroachDB, YugabyteDB, and DuckDB, and many others 
>>> offering “PostgreSQL-compatible” modes, have validated this direction. 
>>> Developers are voting with their implementations. PostgreSQL SQL represents 
>>> the lowest cognitive-load interface for application data, as repeatedly 
>>> confirmed by developer surveys like Stack Overflow 2025[1]. 
>>> 
>>> What I’m proposing is that we begin to normalize the frontend to expand 
>>> access to our extraordinary backend. The key principle here is ADD, not 
>>> DELETE. CQL

Re: [DISCUSS] SQL support in Cassandra

2025-10-31 Thread Jeff Jirsa
Is the best path there really trying to transform CQL into SQL? Or is it
building a native SQL implementation as a stateless layer on top of a backing
ordered (ByteOrderedPartitioner), transactional (Accord), key-value Cassandra
cluster? It’s an extra hop, but trying to adjust the existing grammar / DDL to
fit into a language it always mimicked but never implemented faithfully feels
like a bumpy road, whereas there are many successful existence proofs for
building it as a stateless layer above.
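
As a rough illustration of the layered approach described above, here is a
minimal Python sketch of a stateless SQL-style layer that translates a primary
key range query into a scan over an ordered key-value store. Everything in it
is an assumption made for illustration: OrderedKV is an in-memory stand-in for
the backing cluster, and row_key and SqlLayer are hypothetical names, not
Cassandra or Accord APIs. Transactions are left out entirely.

```python
import bisect


class OrderedKV:
    """Hypothetical in-memory stand-in for an ordered key-value store."""

    def __init__(self):
        self._keys = []   # keys kept in sorted (byte) order
        self._vals = {}   # key -> row

    def put(self, key: bytes, value: dict) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def scan(self, start: bytes, end: bytes):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for k in self._keys[lo:hi]:
            yield k, self._vals[k]


def row_key(table: str, pk: int) -> bytes:
    # Table name, a 0x00 separator, then an 8-byte big-endian primary key.
    # Big-endian keeps byte order equal to numeric order for non-negative ints.
    return table.encode() + b"\x00" + pk.to_bytes(8, "big")


class SqlLayer:
    """Stateless layer: it holds no data, only a reference to the store."""

    def __init__(self, store: OrderedKV):
        self.store = store

    def insert(self, table: str, pk: int, row: dict) -> None:
        self.store.put(row_key(table, pk), row)

    def select_range(self, table: str, pk_min: int, pk_max: int) -> list:
        # Roughly: SELECT * FROM table WHERE pk >= pk_min AND pk < pk_max
        #          ORDER BY pk
        start, end = row_key(table, pk_min), row_key(table, pk_max)
        return [row for _, row in self.store.scan(start, end)]


kv = OrderedKV()
sql = SqlLayer(kv)
sql.insert("users", 3, {"name": "c"})
sql.insert("users", 1, {"name": "a"})
sql.insert("users", 2, {"name": "b"})
sql.insert("orders", 1, {"total": 10})
rows = sql.select_range("users", 1, 3)
```

Because the layer keeps no state of its own, any number of such frontends
could sit in front of the same store; the cost is the extra hop mentioned
above.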





> On Oct 31, 2025, at 1:25 PM, Patrick McFadin wrote:
> 
> Over the last decade, CQL has served Cassandra users well by offering a 
> familiar SQL-like interface for a distributed data model. However, as the 
> broader database ecosystem converges on PostgreSQL-style SQL as the de facto 
> standard for developers, it’s time to consider how Cassandra evolves to meet 
> developers where they are without losing what makes it unique.
> 
> The great thing about SQL standards is that there are plenty to choose from. 
> While the formal SQL:2023 specification (ISO/IEC 9075) exists, the industry 
> has coalesced around the PostgreSQL dialect. Products such as AWS Aurora, 
> AlloyDB, CockroachDB, YugabyteDB, and DuckDB, and many others offering 
> “PostgreSQL-compatible” modes, have validated this direction. Developers are 
> voting with their implementations. PostgreSQL SQL represents the lowest 
> cognitive-load interface for application data, as repeatedly confirmed by 
> developer surveys like Stack Overflow 2025[1]. 
> 
> What I’m proposing is that we begin to normalize the frontend to expand 
> access to our extraordinary backend. The key principle here is ADD, not 
> DELETE. CQL continues to work and be supported while we expand Cassandra’s 
> capabilities through SQL compatibility, providing a familiar syntax and 
> potentially supporting a larger ecosystem (JDBC, etc.).
> 
> Phase 1 (Before Cassandra 6) - Stop Digging
> Freeze CQL at version 3 and align all new syntax or features (DML/DDL) to the 
> PostgreSQL SQL dialect wherever possible. This approach was already 
> demonstrated with CEP-52 and should become our norm.
> 
> Phase 2 (Years) - Create Parallel Paths
> This is where we take our time and do things carefully, most likely over a 
> series of years.  Don’t touch the CQL path. Add an opt-in, feature flag path 
> for SQL-only that conforms to the PostgreSQL SQL dialect. Begin our journey 
> to feature compatibility here. At Community over Code this year, Alex Petrov 
> and I sat in Aaron Ploetz’s kitchen (thanks for dinner, Aaron!) and 
> brainstormed how this could work. The two critical aspects to manage are 
> types and functionality. We may never be able to support everything, but 
> given what this project has accomplished over the years, I wouldn’t bet on 
> it. Being clear about the differences early on can serve as a roadmap for 
> future contributors who want to be involved. 
> 
> In discussion with Joel Shepherd on this topic, he sagely suggested some 
> sub-steps inside this phase:
> 
> 1 - Prioritize SQL that is already compatible to get incremental wins and
> early feedback from the user community.
> 2 - Tackle the non-compatible SQL and triage the long-term changes that would
> need to happen.
>
> I took the time to do some rough mapping of syntax, features, and types:
> 
> Function and Feature Compatibility tables: 
> https://docs.google.com/document/d/1K2-GKVM4Z_u1Hb1GtdrRyC9AdDN3RLwJ7LX_i_PqkOE/edit?usp=sharing
> 
> Typing differences: 
> https://docs.google.com/spreadsheets/d/11tWkyCQ8WAFGnd5Va6iyltkp1wbKdAubxH9o_ZyJEtk/edit?usp=sharing
> 
> Phase 3 (Indefinite timeframe) - Become Default SQL
> Once the SQL path achieves sufficient coverage and confidence, we can make it 
> the default frontend, with CQL continuing to be supported indefinitely. The 
> intent is not replacement but evolution toward broader accessibility.
> 
> This proposal is an invitation for discussion. Feedback from contributors, 
> driver maintainers, and downstream users will guide the roadmap and 
> priorities. The result will be the creation of CEPs as needed. If we get this 
> right, Cassandra’s next decade will be defined by reach, compatibility, and 
> continued excellence in scalability.
> 
> If you saw my talk in Minneapolis[2], you know I've been thinking about what 
> we can accomplish in 10 years. The Phase 1 piece is near-term, but there is
> no timeframe for everything else. The best consensus I can hope for today is
> on directionality, and that starts with Phase 1.
> 
> Patrick
> 
> 1 - 
> https://survey.stackoverflow.co/2025/technology#most-popular-technologies-database-prof
> 2 - https://youtu.be/rIh968dSlkQ



Re: [DISCUSS] SQL support in Cassandra

2025-10-31 Thread Ekaterina Dimitrova
Hey Patrick,

Thanks for starting this discussion.
I am also curious to read the response to Dinesh’s questions.

Plus I have one to add myself (potentially more after I spend more time on
this): it is not clear to me what Phase 1 is. Do you suggest blocking 6.0
alpha to review all new, not-yet-released syntax and try to align it with
SQL? Do you plan to open a CEP and work on that? Or did I misunderstand your
suggestion?

Best regards,
Ekaterina

On Fri, 31 Oct 2025 at 18:15, Dinesh Joshi wrote:

> Thank you, Patrick, for starting this thread. Your talk was interesting. I
> want to better understand the compatibility aspect of what you're
> proposing. Specifically, how do you envision the following scenarios being
> supported in this new world:
>
> 1. Could an operator enable CQL and SQL simultaneously?
> 2. Does the user need to pick CQL or SQL at the time of Keyspace creation
> or can they switch between CQL and SQL on the fly?
> 3. Would the user be able to read and write to the same Keyspace using
> both CQL and SQL?
> 4. Do you envision the user being able to write using CQL and read using
> SQL?
>
> Thanks,
>
> Dinesh
>
> On Fri, Oct 31, 2025 at 1:26 PM Patrick McFadin wrote:
>
>> Over the last decade, CQL has served Cassandra users well by offering a
>> familiar SQL-like interface for a distributed data model. However, as the
>> broader database ecosystem converges on PostgreSQL-style SQL as the de
>> facto standard for developers, it’s time to consider how Cassandra evolves
>> to meet developers where they are without losing what makes it unique.
>>
>> The great thing about SQL standards is that there are plenty to choose
>> from. While the formal SQL:2023 specification (ISO/IEC 9075) exists, the
>> industry has coalesced around the PostgreSQL dialect. Products such as AWS
>> Aurora, AlloyDB, CockroachDB, YugabyteDB, and DuckDB, and many others
>> offering “PostgreSQL-compatible” modes, have validated this direction.
>> Developers are voting with their implementations. PostgreSQL SQL represents
>> the lowest cognitive-load interface for application data, as repeatedly
>> confirmed by developer surveys like Stack Overflow 2025[1].
>>
>> What I’m proposing is that we begin to normalize the frontend to expand
>> access to our extraordinary backend. The key principle here is ADD, not
>> DELETE. CQL continues to work and be supported while we expand Cassandra’s
>> capabilities through SQL compatibility, providing a familiar syntax and
>> potentially supporting a larger ecosystem (JDBC, etc.).
>>
>> Phase 1 (Before Cassandra 6) - Stop Digging
>> Freeze CQL at version 3 and align all new syntax or features (DML/DDL) to
>> the PostgreSQL SQL dialect wherever possible. This approach was already
>> demonstrated with CEP-52 and should become our norm.
>>
>> Phase 2 (Years) - Create Parallel Paths
>> This is where we take our time and do things carefully, most likely over
>> a series of years.  Don’t touch the CQL path. Add an opt-in, feature flag
>> path for SQL-only that conforms to the PostgreSQL SQL dialect. Begin our
>> journey to feature compatibility here. At Community over Code this year,
>> Alex Petrov and I sat in Aaron Ploetz’s kitchen (thanks for dinner, Aaron!)
>> and brainstormed how this could work. The two critical aspects to manage
>> are types and functionality. We may never be able to support everything,
>> but given what this project has accomplished over the years, I wouldn’t bet
>> on it. Being clear about the differences early on can serve as a roadmap
>> for future contributors who want to be involved.
>>
>> In discussion with Joel Shepherd on this topic, he sagely suggested some
>> sub-steps inside this phase:
>>
>> 1 - Prioritize SQL that is already compatible to get incremental wins and
>> early feedback from the user community.
>> 2 - Tackle the non-compatible SQL and triage the long-term changes that
>> would need to happen.
>>
>> I took the time to do some rough mapping of syntax, features, and types:
>>
>> Function and Feature Compatibility tables:
>> https://docs.google.com/document/d/1K2-GKVM4Z_u1Hb1GtdrRyC9AdDN3RLwJ7LX_i_PqkOE/edit?usp=sharing
>>
>> Typing differences:
>> https://docs.google.com/spreadsheets/d/11tWkyCQ8WAFGnd5Va6iyltkp1wbKdAubxH9o_ZyJEtk/edit?usp=sharing
>>
>> Phase 3 (Indefinite timeframe) - Become Default SQL
>> Once the SQL path achieves sufficient coverage and confidence, we can
>> make it the default frontend, with CQL continuing to be supported
>> indefinitely. The intent is not replacement but evolution toward broader
>> accessibility.
>>
>> This proposal is an invitation for discussion. Feedback from
>> contributors, driver maintainers, and downstream users will guide the
>> roadmap and priorities. The result will be the creation of CEPs as needed.
>> If we get this right, Cassandra’s next decade will be defined by reach,
>> compatibility, and continued excellence in scalability.
>>
>> If you saw my talk in Minneapolis[2], you know I've been thinking ab

Re: [DISCUSS] SQL support in Cassandra

2025-10-31 Thread Dinesh Joshi
Thank you, Patrick, for starting this thread. Your talk was interesting. I
want to better understand the compatibility aspect of what you're
proposing. Specifically, how do you envision the following scenarios being
supported in this new world:

1. Could an operator enable CQL and SQL simultaneously?
2. Does the user need to pick CQL or SQL at the time of Keyspace creation
or can they switch between CQL and SQL on the fly?
3. Would the user be able to read and write to the same Keyspace using both
CQL and SQL?
4. Do you envision the user being able to write using CQL and read using
SQL?

Thanks,

Dinesh

On Fri, Oct 31, 2025 at 1:26 PM Patrick McFadin wrote:

> Over the last decade, CQL has served Cassandra users well by offering a
> familiar SQL-like interface for a distributed data model. However, as the
> broader database ecosystem converges on PostgreSQL-style SQL as the de
> facto standard for developers, it’s time to consider how Cassandra evolves
> to meet developers where they are without losing what makes it unique.
>
> The great thing about SQL standards is that there are plenty to choose
> from. While the formal SQL:2023 specification (ISO/IEC 9075) exists, the
> industry has coalesced around the PostgreSQL dialect. Products such as AWS
> Aurora, AlloyDB, CockroachDB, YugabyteDB, and DuckDB, and many others
> offering “PostgreSQL-compatible” modes, have validated this direction.
> Developers are voting with their implementations. PostgreSQL SQL represents
> the lowest cognitive-load interface for application data, as repeatedly
> confirmed by developer surveys like Stack Overflow 2025[1].
>
> What I’m proposing is that we begin to normalize the frontend to expand
> access to our extraordinary backend. The key principle here is ADD, not
> DELETE. CQL continues to work and be supported while we expand Cassandra’s
> capabilities through SQL compatibility, providing a familiar syntax and
> potentially supporting a larger ecosystem (JDBC, etc.).
>
> Phase 1 (Before Cassandra 6) - Stop Digging
> Freeze CQL at version 3 and align all new syntax or features (DML/DDL) to
> the PostgreSQL SQL dialect wherever possible. This approach was already
> demonstrated with CEP-52 and should become our norm.
>
> Phase 2 (Years) - Create Parallel Paths
> This is where we take our time and do things carefully, most likely over a
> series of years.  Don’t touch the CQL path. Add an opt-in, feature flag
> path for SQL-only that conforms to the PostgreSQL SQL dialect. Begin our
> journey to feature compatibility here. At Community over Code this year,
> Alex Petrov and I sat in Aaron Ploetz’s kitchen (thanks for dinner, Aaron!)
> and brainstormed how this could work. The two critical aspects to manage
> are types and functionality. We may never be able to support everything,
> but given what this project has accomplished over the years, I wouldn’t bet
> on it. Being clear about the differences early on can serve as a roadmap
> for future contributors who want to be involved.
>
> In discussion with Joel Shepherd on this topic, he sagely suggested some
> sub-steps inside this phase:
>
> 1 - Prioritize SQL that is already compatible to get incremental wins and
> early feedback from the user community.
> 2 - Tackle the non-compatible SQL and triage the long-term changes that
> would need to happen.
>
> I took the time to do some rough mapping of syntax, features, and types:
>
> Function and Feature Compatibility tables:
> https://docs.google.com/document/d/1K2-GKVM4Z_u1Hb1GtdrRyC9AdDN3RLwJ7LX_i_PqkOE/edit?usp=sharing
>
> Typing differences:
> https://docs.google.com/spreadsheets/d/11tWkyCQ8WAFGnd5Va6iyltkp1wbKdAubxH9o_ZyJEtk/edit?usp=sharing
>
> Phase 3 (Indefinite timeframe) - Become Default SQL
> Once the SQL path achieves sufficient coverage and confidence, we can make
> it the default frontend, with CQL continuing to be supported indefinitely.
> The intent is not replacement but evolution toward broader accessibility.
>
> This proposal is an invitation for discussion. Feedback from contributors,
> driver maintainers, and downstream users will guide the roadmap and
> priorities. The result will be the creation of CEPs as needed. If we get
> this right, Cassandra’s next decade will be defined by reach,
> compatibility, and continued excellence in scalability.
>
> If you saw my talk in Minneapolis[2], you know I've been thinking about
> what we can accomplish in 10 years. The Phase 1 piece is near-term, but
> there is no timeframe for everything else. The best consensus I can hope for
> today is on directionality, and that starts with Phase 1.
>
> Patrick
>
> 1 -
> https://survey.stackoverflow.co/2025/technology#most-popular-technologies-database-prof
> 2 - https://youtu.be/rIh968dSlkQ
>