Re: [DISCUSS] CEP-39: Cost Based Optimizer

2024-01-02 Thread Benedict
The CEP expressly includes an item for coordinated cardinality estimation by producing whole-cluster summaries. I’m not sure whether you addressed this in your feedback; it’s not clear what you’re referring to with "distributed estimates". But avoiding this was expressly the driver of my suggestion
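A minimal sketch of what coordinated cardinality estimation from whole-cluster summaries could look like. It assumes each node keeps simple per-column value counts and that replicas hold identical data; `ColumnSummary`, `merge_summaries` and `equality_selectivity` are hypothetical names, not Cassandra or CEP APIs.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ColumnSummary:
    """Hypothetical statistics for one column: total rows and per-value counts."""
    row_count: float
    value_counts: Counter = field(default_factory=Counter)


def merge_summaries(node_summaries, replication_factor):
    """Combine node-local summaries into one whole-cluster summary.

    Each row is stored on `replication_factor` replicas, so summing the
    node-local counts counts every row that many times; dividing corrects
    for this, assuming the replicas hold the same data.
    """
    rows = 0.0
    totals = Counter()
    for summary in node_summaries:
        rows += summary.row_count
        totals.update(summary.value_counts)  # Counter.update adds counts
    scaled = Counter({v: c / replication_factor for v, c in totals.items()})
    return ColumnSummary(rows / replication_factor, scaled)


def equality_selectivity(summary, value):
    """Estimated fraction of rows matching `column = value`."""
    if summary.row_count == 0:
        return 0.0
    return summary.value_counts.get(value, 0) / summary.row_count
```

Building the merged summary requires gathering statistics from every node, which is the cross-node coordination Benedict's suggestion was meant to avoid.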

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2024-01-02 Thread Ariel Weisberg
Hi, I am burying the lede, but it's important to keep an eye on runtime-adaptive vs. planning-time optimization, as the costs and benefits vary greatly between the two and runtime-adaptive optimization can be a game changer. Basically, CBO optimizes for query efficiency and startup time at the expense of not

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-22 Thread Josh McKenzie
> I am also intrigued by this proposal when I think about multi-tenancy and resource governance: we have heard from several operators who run multiple internal teams on the same Cassandra cluster just to optimize costs. Having a way to at

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-22 Thread J. D. Jordan
ember 20, 2023 8:15 AM To: dev@cassandra.apache.org Cc: dev@cassandra.apache.org Subject: [EXTERNAL] Re: [DISCUSS] CEP-39: Cost Based Optimizer Thanks for

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-22 Thread Benedict
Close enough, though that’s not quite how I would characterise it. But none of your problems are inherent?
- Clients can re-prepare whenever they want
- Clusters can suggest to clients to re-prepare, should we desire this feature. Or we could permit the cluster to invalidate stale preparations since
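The idea of letting the cluster invalidate stale preparations can be sketched as a server-side plan cache. The snippet is cut off before Benedict says what would make a preparation stale, so the drift test below (re-prepare when the row estimate changes by more than 10x) is purely an assumption, as are the names `PreparedPlan`, `is_stale` and `PlanCache`.

```python
from dataclasses import dataclass


@dataclass
class PreparedPlan:
    """Hypothetical cached plan plus the row estimate it was costed with."""
    statement_id: str
    plan: str
    estimated_rows: float


def is_stale(prepared, current_estimate, max_ratio=10.0):
    """True once the estimate has drifted by more than `max_ratio` either way."""
    low = max(min(prepared.estimated_rows, current_estimate), 1.0)
    high = max(prepared.estimated_rows, current_estimate, 1.0)
    return high / low > max_ratio


class PlanCache:
    """Server-side cache that drops stale preparations, forcing a re-prepare."""

    def __init__(self, max_ratio=10.0):
        self._plans = {}
        self._max_ratio = max_ratio

    def put(self, prepared):
        self._plans[prepared.statement_id] = prepared

    def get(self, statement_id, current_estimate):
        prepared = self._plans.get(statement_id)
        if prepared is not None and is_stale(prepared, current_estimate, self._max_ratio):
            del self._plans[statement_id]  # invalidated: caller must re-prepare
            return None
        return prepared
```

A client receiving `None` here would re-prepare, which matches the "clients can re-prepare whenever they want" point without the cluster having to push anything.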

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-21 Thread Caleb Rackliffe
I think I hinted at this in my first response, but just to clarify, I would be interested to see this work broken up as much as possible into a.) the set of things we can do without coordinator involvement (statistical optimization for index and filtering queries) and b.) the set of things where

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-21 Thread Caleb Rackliffe
What would the relationship between our present query tracing apparatus and EXPLAIN ANALYZE look like?

On Thu, Dec 21, 2023 at 4:24 PM Caleb Rackliffe wrote:
> > We are also currently working on some SAI features that need cost based optimization.
>
> I don't even think we have to think about

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-21 Thread Caleb Rackliffe
> We are also currently working on some SAI features that need cost based optimization.

I don't even think we have to think about *new* SAI features to see where it will benefit from further *local* optimization, and I'm sympathetic to that happening in the context of a larger framework, as long

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-21 Thread Josh McKenzie
>> or problem and do more intelligent request throttling.
>>
>> In summary I support the proposal with the caveats raised above.
>>
>> Thanks,
>> German
>>
>> *From:* C. Scott Andreas
>> *Sent:* Wednesday, December 20,

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-21 Thread Benjamin Lerer
> or problem and do more intelligent request throttling.
>
> In summary I support the proposal with the caveats raised above.
>
> Thanks,
> German
>
> --
> *From:* C. Scott Andreas
> *Sent:* Wednesday, December 20, 2023 8:15 AM
> *To:*

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-21 Thread Benjamin Lerer
Hi Scott, Thanks for your feedback. If I am not mistaken, the main concern in your email is that without features that will heavily benefit from the optimizer, this work will not bring much. You are, therefore, under the impression that this work is one or two years early. From my perspective, we

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-20 Thread Benjamin Lerer
> Pick a random replica and ask it to prepare it, and use that. This is probably fine, unless there is significant skew (in which case, our plan is likely to be bad whatever, somewhere)

To be sure that I understand you correctly: what you are suggesting is to use local statistics to compute

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-20 Thread C. Scott Andreas
Thanks for this proposal and apologies for my delayed engagement during the Cassandra Summit last week. Benjamin, I appreciate your work on this and your engagement on this thread – I know it’s a lot of discussion to field. On ALLOW FILTERING: I share Chris Lohfink’s experience in operating clusters

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-20 Thread Benedict
I see three options:
- Pick a random replica and ask it to prepare it, and use that. This is probably fine, unless there is significant skew (in which case, our plan is likely to be bad whatever, somewhere)
- If there already exists a plan for the query, return that
- Pick a sample of replicas and either
  - Ask
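The first two options can be sketched as follows, assuming a plan is keyed by its query text and that `prepare_on` is a hypothetical remote call asking one replica to plan the query from its local statistics. The function name `get_or_prepare_plan` is also hypothetical, and the third option is truncated in this snippet, so it is not sketched.

```python
import random
from typing import Callable, Dict, List


def get_or_prepare_plan(query: str,
                        plan_cache: Dict[str, str],
                        replicas: List[str],
                        prepare_on: Callable[[str, str], str],
                        rng: random.Random) -> str:
    """Reuse an existing plan; otherwise ask one randomly chosen replica
    to plan the query from its local statistics."""
    existing = plan_cache.get(query)
    if existing is not None:           # a plan already exists: return that
        return existing
    replica = rng.choice(replicas)     # pick a random replica...
    plan = prepare_on(replica, query)  # ...and ask it to prepare the query
    plan_cache[query] = plan           # later callers reuse this plan
    return plan
```

Caching the first plan also gives consistent execution across coordinators that share the cache, at the cost of inheriting whatever skew the chosen replica saw.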

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-20 Thread Benjamin Lerer
> If we are to address that within the CEP itself then we should discuss it here, as I would like to fully understand the approach as well as how it relates to consistency of execution and the idea of triggering re-optimisation.

Sure, that was my plan.

> I’m not sold on the proposed set

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-20 Thread Benedict
If we are to address that within the CEP itself then we should discuss it here, as I would like to fully understand the approach as well as how it relates to consistency of execution and the idea of triggering re-optimisation. These ideas are all interrelated.

I’m not sold on the proposed set of

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-20 Thread Benjamin Lerer
After the second phase of the CEP, we will have two optimizer implementations. One will be similar to what we have today and the other one will be the CBO. As those implementations will be behind the new Optimizer API interfaces, they will both have support for EXPLAIN and they will both benefit
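A minimal sketch of two optimizer implementations behind one interface, with a switch to fall back to the deterministic one. `Optimizer`, `HeuristicOptimizer`, `CostBasedOptimizer`, `AccessPath` and `make_optimizer` are hypothetical names, and the fixed preference order only stands in for today's behavior; it is not how Cassandra currently selects an access path.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AccessPath:
    name: str              # e.g. "partition-lookup", "index-scan", "filtering-scan"
    estimated_cost: float


class Optimizer(ABC):
    """Hypothetical common interface; every implementation gets EXPLAIN."""

    @abstractmethod
    def choose(self, paths: List[AccessPath]) -> AccessPath: ...

    def explain(self, paths: List[AccessPath]) -> str:
        chosen = self.choose(paths)
        return f"{type(self).__name__}: {chosen.name} (cost={chosen.estimated_cost})"


class HeuristicOptimizer(Optimizer):
    """Deterministic, statistics-free choice using a fixed preference order."""
    PREFERENCE = ("partition-lookup", "index-scan", "filtering-scan")

    def choose(self, paths):
        by_name = {p.name: p for p in paths}
        for name in self.PREFERENCE:
            if name in by_name:
                return by_name[name]
        return paths[0]


class CostBasedOptimizer(Optimizer):
    """Picks the cheapest path according to the estimated costs."""

    def choose(self, paths):
        return min(paths, key=lambda p: p.estimated_cost)


def make_optimizer(cbo_enabled: bool) -> Optimizer:
    # A single switch lets operators fall back to the deterministic optimizer.
    return CostBasedOptimizer() if cbo_enabled else HeuristicOptimizer()
```

Because EXPLAIN lives on the shared interface, it works the same whichever implementation the switch selects.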

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-19 Thread David Capwell
> even if the only outcome of all this work were to tighten up inconsistencies in our grammar and provide more robust EXPLAIN and EXPLAIN ANALYZE functionality to our end users, I think that would be highly valuable

In my mental model a no-op optimizer just becomes what we have today (since

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-15 Thread Josh McKenzie
> Goals
> • Introduce a Cascades(2) query optimizer with rules easily extendable
> • Improve query performance for most common queries
> • Add support for EXPLAIN and EXPLAIN ANALYZE to help with query optimization and troubleshooting
> • Lay the groundwork for the addition of features

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-15 Thread Chris Lohfink
Thanks for taking the time to address the concerns. At least with initial versions, as long as there is a way to replace it with a no-op or disable it, I would be happy. This is pretty standard practice with features nowadays, but I wanted to highlight it as this might require some pretty tight coupling. Chris

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-15 Thread Benjamin Lerer
> I'm also torn on the CEP as presented. I think some of it is my negative emotional response to the examples - e.g. I've literally never seen a real use case where unfolding constants matters, and I'm trying to convince myself to read past that.

I totally agree with you, Jeff, if you

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-15 Thread Benjamin Lerer
Hey Chris, You raise some valid points. I believe that there are 3 points that you mentioned:
1) CQL restrictions are some form of safety net and should be kept
2) A lot of Cassandra features do not scale and/or are too easy to use in a wrong way that can make the whole system collapse. We should

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Jeff Jirsa
I'm also torn on the CEP as presented. I think some of it is my negative emotional response to the examples - e.g. I've literally never seen a real use case where unfolding constants matters, and I'm trying to convince myself to read past that. I also can't tell what exactly you mean when you say

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Benjamin Lerer
> So yes, this physical plan is the structure that you have in mind but the idea of sharing it is not part of the CEP.

Sorry, Benedict, what I meant by sharing was sharing across the nodes. It is an integral part of the optimizer API that the CEP talks about as it represents its output. I

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Benedict
Fwiw Chris, I agree with your concerns, but I think the introduction of a CBO - done right - is in principle a good thing in its own right. It’s independent of the issues you mention, even if it might enable features that exacerbate them. It should also help enable secondary indexes to work better,

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Chris Lohfink
I don't wanna be a blocker for this CEP or anything but did want to put my 2 cents in. This CEP is horrifying to me. I have seen thousands of clusters across multiple companies and helped them get working successfully. A vast majority of that involved blocking the use of MVs, GROUP BY, secondary

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Benedict
> I think it should be. This should form a major part of the API on which any CBO is built.

To expand on this a bit: one of the stated goals of the CEP is to support multiple CBOs, and this is a required component of any CBO. If this doesn’t form part of the shared machinery, we aren’t really enabling new CBOs,

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Benedict
> So yes, this physical plan is the structure that you have in mind but the idea of sharing it is not part of the CEP.

I think it should be. This should form a major part of the API on which any CBO is built.

> It seems that there is a difference between the goal of your proposal and the one of

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Benjamin Lerer
The binding of the parser output to the schema (what is today the Raw.prepare call) will create the logical plan, expressed as a tree of relational operators. Simplification and normalization will happen on that tree to produce a new equivalent logical plan. That logical plan will be used as input
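A toy version of the pipeline described above: a logical plan as a tree of relational operators, and one normalization pass that returns an equivalent tree. The operator classes and the two rewrite rules (merging adjacent filters, dropping duplicate predicates) are illustrative assumptions, not the CEP's actual rule set.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    predicates: Tuple[str, ...]   # conjunction of predicates, e.g. ("v > 3",)
    child: "Plan"


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: "Plan"


Plan = Union[Scan, Filter, Project]


def normalize(plan: Plan) -> Plan:
    """Bottom-up rewrite into an equivalent, simpler logical plan."""
    if isinstance(plan, Scan):
        return plan
    child = normalize(plan.child)
    if isinstance(plan, Project):
        return Project(plan.columns, child)
    # Filter(Filter(x)) becomes a single Filter; repeated predicates are
    # dropped because AND is idempotent, and an empty Filter disappears.
    predicates = plan.predicates
    if isinstance(child, Filter):
        predicates = child.predicates + predicates
        child = child.child
    unique = tuple(dict.fromkeys(predicates))
    return Filter(unique, child) if unique else child
```

For example, `Project(("k",), Filter(("v > 3",), Filter(("k = 1", "v > 3"), Scan("t"))))` normalizes to `Project(("k",), Filter(("k = 1", "v > 3"), Scan("t")))`, which a cost-based step could then consume.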

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Benedict
There surely needs to be a more succinct and abstract representation in order to perform transformations on the query plan? You don’t intend to manipulate the object graph directly as you apply any transformations when performing simplification or cost based analysis? This would also (I expect) be

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Benjamin Lerer
> I mean that an important part of this work - not specified in the CEP (AFAICT) - should probably be to define some standard execution model, that we can manipulate and serialise, for use across (and without) optimisers.

I am confused because for me an execution model defines how
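One reading of "a standard execution model that we can manipulate and serialise" is a plain plan tree with a round-trippable encoding, so the same plan can be shipped between nodes. In this sketch `PlanNode`, `serialize` and `deserialize` are hypothetical, and JSON is used only for readability; the thread does not specify a wire format.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PlanNode:
    op: str                                   # e.g. "IndexScan", "Filter"
    args: Dict[str, str] = field(default_factory=dict)
    children: List["PlanNode"] = field(default_factory=list)

    def to_dict(self):
        return {"op": self.op, "args": self.args,
                "children": [c.to_dict() for c in self.children]}

    @staticmethod
    def from_dict(d):
        return PlanNode(d["op"], dict(d["args"]),
                        [PlanNode.from_dict(c) for c in d["children"]])


def serialize(plan: PlanNode) -> bytes:
    # sort_keys gives a stable encoding, so equal plans produce equal bytes.
    return json.dumps(plan.to_dict(), sort_keys=True).encode("utf-8")


def deserialize(data: bytes) -> PlanNode:
    return PlanNode.from_dict(json.loads(data.decode("utf-8")))
```

Keeping the tree independent of any particular optimizer is what would let it be used "across (and without) optimisers".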

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Benjamin Lerer
> Can you share the reasons why Apache Calcite is not suitable for this case and why it was rejected?

My understanding is that Calcite was made for two main things: to help with optimizing SQL-like languages and to let people query different kinds of data sources together. We could think

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-13 Thread Benedict
> Is it for you a blocker for this CEP or do you just want to make sure that this part is discussed in deeper details before we implement it?

My concept of a CEP is that we explore enough of the design detail to have a shared understanding of how things should work before implementation begins. If

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-13 Thread Benjamin Lerer
One thing that I did not mention is the fact that this CEP is only a high-level proposal. There will be deeper discussions on the dev list around the different parts of this proposal when we reach those parts and have enough details to make those discussions more meaningful.

> The maintenance

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-13 Thread guo Maxwell
Calcite has its own SQL parser, which uses JavaCC rather than ANTLR, though I think Calcite is a very good SQL framework. We have tried to use Calcite to develop our own SQL layer that supports both CQL and some standard SQL such as Presto SQL; we also want to take

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-13 Thread Maxim Muzafarov
Hello Benjamin, Can you share the reasons why Apache Calcite is not suitable for this case and why it was rejected? It has custom syntax support, CBO, so I am interested to see some technical details in the "Rejected Alternatives" section, I'm pretty sure they exist, but they weren't mentioned

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-13 Thread Benedict
A CBO can only make worse decisions than the status quo for what I presume are the majority of queries - i.e. those that touch only primary indexes. In general, there are plenty of use cases that prefer determinism. So I agree that there should at least be a CBO implementation that makes the same

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-12 Thread Jon Haddad
I think it makes sense to see what the actual overhead is of CBO before making the assumption it'll be so high that we need to have two code paths. I'm happy to provide thorough benchmarking and analysis when it reaches a testing phase. I'm excited to see where this goes. I think it sounds very

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-12 Thread guo Maxwell
Nothing expresses my thoughts better than +1. It feels like this means a lot to Cassandra. I have a question: is it easy to turn off the CBO or bypass it in some way? Some simple read and write requests will perform better without the CBO, which is also an advantage of Cassandra

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-12 Thread David Capwell
Overall LGTM.

> On Dec 12, 2023, at 5:29 AM, Benjamin Lerer wrote:
>
> Hi everybody,
>
> I would like to open the discussion on the introduction of a cost based optimizer to allow Cassandra to pick the best execution plan based on the data distribution, thereby improving the overall

[DISCUSS] CEP-39: Cost Based Optimizer

2023-12-12 Thread Benjamin Lerer
Hi everybody, I would like to open the discussion on the introduction of a cost-based optimizer to allow Cassandra to pick the best execution plan based on the data distribution, thereby improving overall query performance. This CEP should also lay the groundwork for the future addition of