Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Ariel Weisberg
Hi, To add some additional context. The row cache is disabled by default and it is already pluggable, but there isn’t a Caffeine implementation present. I think one used to exist and could be resurrected. I personally also think that people should be able to scratch their own itch row cache

Re: [DISCUSS] Replace Sigar with OSHI (CASSANDRA-16565)

2023-12-14 Thread guo Maxwell
+1 too Mick Semb Wever wrote on Fri, Dec 15, 2023 at 10:01: > > > >> >> Are there objections to making this switch and adding a new dependency? >> >> [1] https://github.com/apache/cassandra/pull/2842/files >> [2] https://issues.apache.org/jira/browse/CASSANDRA-16565 >> > > > > +1 to removing sigar and to

Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Jon Haddad
I think we should probably figure out how much value it actually provides by getting some benchmarks around a few use cases along with some profiling. tlp-stress has a --rowcache flag that I added a while back to be able to do this exact test. I was looking for a use case to profile and write up

Re: [DISCUSS] Replace Sigar with OSHI (CASSANDRA-16565)

2023-12-14 Thread Mick Semb Wever
> > Are there objections to making this switch and adding a new dependency? > > [1] https://github.com/apache/cassandra/pull/2842/files > [2] https://issues.apache.org/jira/browse/CASSANDRA-16565 > +1 to removing sigar and to add oshi-core

Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Mick Semb Wever
I would avoid taking away a feature even if it works in a narrow set of > use-cases. I would instead suggest - > > 1. Leave it disabled by default. > 2. Detect when Row Cache has a low hit rate and warn the operator to turn > it off. Cassandra should ideally detect this and do it automatically. > 3.

Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Dinesh Joshi
> On Dec 14, 2023, at 5:35 PM, Paulo Motta wrote: > > This could be a potential hook for out-of-process caching. > > Would something like this be valuable/feasible? It is certainly feasible. I am not sure about its value. Dinesh

Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Paulo Motta
I like Dinesh's middle ground proposal, since this feature has valid uses. I'm not familiar with the row caching module, but would it make sense to take this opportunity to expose this feature as an optional Row Caching Module, disabled by default with an optional on-heap Caffeine implementation?

Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Dinesh Joshi
I would avoid taking away a feature even if it works in a narrow set of use-cases. I would instead suggest - 1. Leave it disabled by default. 2. Detect when Row Cache has a low hit rate and warn the operator to turn it off. Cassandra should ideally detect this and do it automatically. 3. Move to
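The second suggestion above (detect a low hit rate and warn the operator) could be sketched roughly as follows. This is only an illustration, not Cassandra code: the `HitRateMonitor` class, the 0.5 threshold, and the 1000-request minimum sample are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class HitRateMonitor:
    """Hypothetical helper for illustration; not part of Cassandra."""

    min_hit_rate: float = 0.5  # assumed threshold below which we warn
    min_requests: int = 1000   # assumed sample size before judging at all
    hits: int = 0
    misses: int = 0

    def record(self, hit: bool) -> None:
        # Count one cache lookup as either a hit or a miss.
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def requests(self) -> int:
        return self.hits + self.misses

    def hit_rate(self) -> float:
        # Report 0.0 when nothing has been recorded yet.
        return self.hits / self.requests if self.requests else 0.0

    def should_warn(self) -> bool:
        # Warn only once enough lookups have been seen to trust the ratio.
        return (self.requests >= self.min_requests
                and self.hit_rate() < self.min_hit_rate)
```

Requiring a minimum number of lookups before warning avoids flagging a cache that is still warming up.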

Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Mick Semb Wever
> > 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in > a later release > I'm for deprecating and removing it. It constantly trips users up and just causes pain. Yes it works in some very narrow situations, but those situations often change over time and again just

Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Jeff Jirsa
> On Dec 14, 2023, at 1:51 PM, Dinesh Joshi wrote: > >  >> >> On Dec 14, 2023, at 10:32 AM, Ariel Weisberg wrote: >> >> 1. Fork OHC and start publishing under a new package name and continue to >> use it > > Who would fork it? Where would you fork it? My first instinct is that this >

Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Dinesh Joshi
> On Dec 14, 2023, at 10:32 AM, Ariel Weisberg wrote: > > 1. Fork OHC and start publishing under a new package name and continue to use > it Who would fork it? Where would you fork it? My first instinct is that this would not be a viable path forward. > 2. Replace OHC with a different cache

Future direction for the row cache and OHC implementation

2023-12-14 Thread Ariel Weisberg
Hi, Now seems like a good time to discuss the future direction of the row cache and its only implementation OHC (https://github.com/snazy/ohc). OHC is currently unmaintained and we don’t have the ability to release maven artifacts for it or commit to the original repo. I have reached out to

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Jeff Jirsa
I'm also torn on the CEP as presented. I think some of it is my negative emotional response to the examples - e.g. I've literally never seen a real use case where unfolding constants matters, and I'm trying to convince myself to read past that. I also can't tell what exactly you mean when you say

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Benjamin Lerer
> > So yes, this physical plan is the structure that you have in mind but > the idea of sharing it is not part of the CEP. Sorry, Benedict, what I meant by sharing was sharing across the nodes. It is an integral part of the optimizer API that the CEP talks about as it represents its output. I

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Benedict
Fwiw Chris, I agree with your concerns, but I think the introduction of a CBO - done right - is in principle a good thing in its own right. It’s independent of the issues you mention, even if it might enable features that exacerbate them. It should also help enable secondary indexes to work better,

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Chris Lohfink
I don't wanna be a blocker for this CEP or anything but did want to put my 2 cents in. This CEP is horrifying to me. I have seen thousands of clusters across multiple companies and helped them get working successfully. A vast majority of that involved blocking the use of MVs, GROUP BY, secondary

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Benedict
> I think it should be. This should be part of the API on which any CBO is built. To expand on this a bit: one of the stated goals of the CEP is to support multiple CBOs, and this is a required component of any CBO. If this doesn’t form part of the shared machinery, we aren’t really enabling new CBOs,

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Benedict
> So yes, this physical plan is the structure that you have in mind but the idea of sharing it is not part of the CEP. I think it should be. This should form a major part of the API on which any CBO is built. > It seems that there is a difference between the goal of your proposal and the one of

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Benjamin Lerer
The binding of the parser output to the schema (what is today the Raw.prepare call) will create the logical plan, expressed as a tree of relational operators. Simplification and normalization will happen on that tree to produce a new equivalent logical plan. That logical plan will be used as input
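As a rough illustration of a logical plan expressed as a tree of operators, with a simplification pass that yields an equivalent tree, here is a toy sketch. The `Scan` and `Filter` node types and the rule used (merging nested filters and dropping duplicate predicates) are assumptions chosen for the example; they are not taken from the CEP.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    """Hypothetical leaf operator: read every row of a table."""
    table: str


@dataclass(frozen=True)
class Filter:
    """Hypothetical operator: keep rows matching all predicates."""
    child: "Plan"
    predicates: Tuple[str, ...]


Plan = Union[Scan, Filter]


def simplify(plan: Plan) -> Plan:
    """Return an equivalent plan with nested filters merged into one."""
    if isinstance(plan, Filter):
        child = simplify(plan.child)
        if isinstance(child, Filter):
            # Filter(Filter(x, p), q) is equivalent to Filter(x, p + q).
            # dict.fromkeys drops duplicates while preserving order.
            merged = tuple(dict.fromkeys(child.predicates + plan.predicates))
            return Filter(child.child, merged)
        return Filter(child, tuple(dict.fromkeys(plan.predicates)))
    return plan
```

For example, `simplify(Filter(Filter(Scan("t"), ("a = 1",)), ("b = 2", "a = 1")))` collapses to a single `Filter` over `Scan("t")` with predicates `("a = 1", "b = 2")`.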

Re: [DISCUSS] Replace Sigar with OSHI (CASSANDRA-16565)

2023-12-14 Thread Miklosovic, Stefan via dev
For completeness, there is this thread (1) where we already decided that sigar is OK to be removed completely. I think that OSHI is a way better lib to have; I am +1 on this proposal. Currently the deal seems to be that this will go just to trunk. (1)

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Benedict
There surely needs to be a more succinct and abstract representation in order to perform transformations on the query plan? You don’t intend to manipulate the object graph directly as you apply any transformations when performing simplification or cost based analysis? This would also (I expect) be

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Benjamin Lerer
> > I mean that an important part of this work - not specified in the CEP > (AFAICT) - should probably be to define some standard execution model, that > we can manipulate and serialise, for use across (and without) optimisers. I am confused because for me an execution model defines how

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-12-14 Thread Claude Warren
Is there still interest in this? Can we get some points down on electrons so that we all understand the issues? While it is fairly simple to redirect the read/write to something other than the local system for a single node this will not solve the problem for tiered storage. Tiered storage

Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Benjamin Lerer
> > Can you share the reasons why Apache Calcite is not suitable for this case > and why it was rejected My understanding is that Calcite was made for two main things: to help with optimizing SQL-like languages and to let people query different kinds of data sources together. We could think

[DISCUSS] Replace Sigar with OSHI (CASSANDRA-16565)

2023-12-14 Thread Claude Warren, Jr via dev
Greetings, I have submitted a pull request[1] that replaces the unsupported Sigar library with the maintained OSHI library. OSHI is an MIT licensed library that provides information about the underlying OS much like Sigar did. The change adds a dependency on oshi-core at the following