Re: Future direction for the row cache and OHC implementation

2023-12-20 Thread German Eichberger via dev
Hi,

We once did some extensive performance testing on the row cache (motivated by 
a hardware accelerator we were hoping to introduce) but could only find 
improvements in highly contrived scenarios. It has been a while since then, so 
fresh eyes are good, but I think we will still arrive at the conclusion to 
deprecate the row cache.

Thanks,
German

From: Jon Haddad 
Sent: Monday, December 18, 2023 10:31 AM
To: dev@cassandra.apache.org 
Subject: [EXTERNAL] Re: Future direction for the row cache and OHC 
implementation

Sure, I’d love to work with you on this.

—
Jon Haddad
Rustyrazorblade Consulting
rustyrazorblade.com<http://rustyrazorblade.com/>


On Mon, Dec 18, 2023 at 8:30 AM Ariel Weisberg <ar...@weisberg.ws> wrote:
Hi,

Thanks for the generous offer. Before you do that can you give me a chance to 
add back support for Caffeine for the row cache so you can test the option of 
switching back to an on-heap row cache?

Ariel

On Thu, Dec 14, 2023, at 9:28 PM, Jon Haddad wrote:
I think we should probably figure out how much value it actually provides by 
getting some benchmarks around a few use cases along with some profiling.  
tlp-stress has a --rowcache flag that I added a while back to be able to do 
this exact test.  I was looking for a use case to profile and write up so this 
is actually kind of perfect for me.  I can take a look in January when I'm back 
from the holidays.

Jon

On Thu, Dec 14, 2023 at 5:44 PM Mick Semb Wever <m...@apache.org> wrote:



> I would avoid taking away a feature even if it works in narrow set of
> use-cases. I would instead suggest -
>
> 1. Leave it disabled by default.
> 2. Detect when Row Cache has a low hit rate and warn the operator to turn it
> off. Cassandra should ideally detect this and do it automatically.
> 3. Move to Caffeine instead of OHC.
>
> I would suggest having this as the middle ground.



Yes, I'm ok with this. (2) can also be a guardrail: soft value when to warn, 
hard value when to disable.



Re: Future direction for the row cache and OHC implementation

2023-12-18 Thread Jon Haddad
Sure, I’d love to work with you on this.

—
Jon Haddad
Rustyrazorblade Consulting
rustyrazorblade.com


On Mon, Dec 18, 2023 at 8:30 AM Ariel Weisberg  wrote:

> Hi,
>
> Thanks for the generous offer. Before you do that can you give me a chance
> to add back support for Caffeine for the row cache so you can test the
> option of switching back to an on-heap row cache?
>
> Ariel
>
> On Thu, Dec 14, 2023, at 9:28 PM, Jon Haddad wrote:
>
> I think we should probably figure out how much value it actually provides
> by getting some benchmarks around a few use cases along with some
> profiling.  tlp-stress has a --rowcache flag that I added a while back to
> be able to do this exact test.  I was looking for a use case to profile and
> write up so this is actually kind of perfect for me.  I can take a look in
> January when I'm back from the holidays.
>
> Jon
>
> On Thu, Dec 14, 2023 at 5:44 PM Mick Semb Wever  wrote:
>
>
>
>
>> I would avoid taking away a feature even if it works in narrow set of
>> use-cases. I would instead suggest -
>>
>> 1. Leave it disabled by default.
>> 2. Detect when Row Cache has a low hit rate and warn the operator to turn
>> it off. Cassandra should ideally detect this and do it automatically.
>> 3. Move to Caffeine instead of OHC.
>>
>> I would suggest having this as the middle ground.
>
>
>
>
> Yes, I'm ok with this. (2) can also be a guardrail: soft value when to
> warn, hard value when to disable.
>
>
>


Re: Future direction for the row cache and OHC implementation

2023-12-18 Thread Ariel Weisberg
Hi,

Thanks for the generous offer. Before you do that can you give me a chance to 
add back support for Caffeine for the row cache so you can test the option of 
switching back to an on-heap row cache?

Ariel

On Thu, Dec 14, 2023, at 9:28 PM, Jon Haddad wrote:
> I think we should probably figure out how much value it actually provides by 
> getting some benchmarks around a few use cases along with some profiling.  
> tlp-stress has a --rowcache flag that I added a while back to be able to do 
> this exact test.  I was looking for a use case to profile and write up so 
> this is actually kind of perfect for me.  I can take a look in January when 
> I'm back from the holidays.
> 
> Jon
> 
> On Thu, Dec 14, 2023 at 5:44 PM Mick Semb Wever  wrote:
>>
>>
>> 
>>> I would avoid taking away a feature even if it works in narrow set of 
>>> use-cases. I would instead suggest -
>>> 
>>> 1. Leave it disabled by default.
>>> 2. Detect when Row Cache has a low hit rate and warn the operator to turn 
>>> it off. Cassandra should ideally detect this and do it automatically.
>>> 3. Move to Caffeine instead of OHC.
>>> 
>>> I would suggest having this as the middle ground.
>> 
>> 
>>  
>> Yes, I'm ok with this. (2) can also be a guardrail: soft value when to warn, 
>> hard value when to disable.


Re: Future direction for the row cache and OHC implementation

2023-12-15 Thread Josh McKenzie
Gotcha; wasn't sure given the earlier phrasing. Makes sense.

Dinesh's compromise position makes sense to me.

On Fri, Dec 15, 2023, at 11:21 PM, Ariel Weisberg wrote:
> Hi,
> 
> I did get one response from Robert indicating that he didn’t want to do the 
> work to contribute it.
> 
> I offered to do the work and asked for permission to contribute it and no 
> response. Followed up later with a ping and also no response.
> 
> Ariel
> 
> On Fri, Dec 15, 2023, at 9:58 PM, Josh McKenzie wrote:
>>> I have reached out to the original maintainer about it and it seems like if 
>>> we want to keep using it we will need to start releasing it under a new 
>>> package from a different repo.
>> 
>>> the current maintainer is not interested in donating it to the ASF
>> Is that the case Ariel or could you just not reach Robert?
>> 
>> On Fri, Dec 15, 2023, at 11:55 AM, Jeremiah Jordan wrote:
>>>> from a maintenance and
>>>> integration testing perspective I think it would be better to keep the
>>>> ohc in-tree, so we will be aware of any issues immediately after the
>>>> full CI run.
>>> 
>>> From the original email bringing OHC in tree is not an option because the 
>>> current maintainer is not interested in donating it to the ASF.  Thus the 
>>> option 1 of some set of people forking it to their own github org and 
>>> maintaining a version outside of the ASF C* project.
>>> 
>>> -Jeremiah
>>> 
>>> On Dec 15, 2023 at 5:57:31 AM, Maxim Muzafarov  wrote:
>>>> Ariel,
>>>> thank you for bringing this topic to the ML.
>>>>
>>>> I may be missing something, so correct me if I'm wrong somewhere in
>>>> the management of the Cassandra ecosystem.  As I see it, the problem
>>>> right now is that if we fork the ohc and put it under its own root,
>>>> the use of that row cache is still not well tested (the same as it is
>>>> now). I am particularly emphasising the dependency management side, as
>>>> any version change/upgrade in Cassandra and, as a result of that
>>>> change a new set of libraries in the classpath should be tested
>>>> against this integration.
>>>>
>>>> So, unless it is being widely used by someone else outside of the
>>>> community (which it doesn't seem to be), from a maintenance and
>>>> integration testing perspective I think it would be better to keep the
>>>> ohc in-tree, so we will be aware of any issues immediately after the
>>>> full CI run.
>>>>
>>>> I'm also +1 for not deprecating it, even if it is used in narrow
>>>> cases, while the cost of maintaining its source code remains quite low
>>>> and it brings some benefits.
>>>>
>>>> On Fri, 15 Dec 2023 at 05:39, Ariel Weisberg  wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> To add some additional context.
>>>>>
>>>>> The row cache is disabled by default and it is already pluggable, but
>>>>> there isn’t a Caffeine implementation present. I think one used to exist
>>>>> and could be resurrected.
>>>>>
>>>>> I personally also think that people should be able to scratch their own
>>>>> itch row cache wise so removing it entirely just because it isn’t
>>>>> commonly used isn’t the right move unless the feature is very far out of
>>>>> scope for Cassandra.
>>>>>
>>>>> Auto enabling/disabling the cache is a can of worms that could result in
>>>>> performance and reliability inconsistency as the DB enables/disables the
>>>>> cache based on heuristics when you don’t want it to. It being off by
>>>>> default seems good enough to me.
>>>>>
>>>>> RE forking, we could create a GitHub org for OHC and then add people to
>>>>> it. There are some examples of dependencies that haven’t been contributed
>>>>> to the project that live outside like CCM and JAMM.
>>>>>
>>>>> Ariel
>>>>>
>>>>> On Thu, Dec 14, 2023, at 5:07 PM, Dinesh Joshi wrote:
>>>>>
>>>>> I would avoid taking away a feature even if it works in narrow set of
>>>>> use-cases. I would instead suggest -
>>>>>
>>>>> 1. Leave it disabled by default.
>>>>> 2. Detect when Row Cache has a low hit rate and warn the operator to turn
>>>>> it off. Cassandra should ideally detect this and do it automatically.
>>>>> 3. Move to Caffeine instead of OHC.
>>>>>
>>>>> I would suggest having this as the middle ground.
>>>>>
>>>>> On Dec 14, 2023, at 4:41 PM, Mick Semb Wever  wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in
>>>>> a later release
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I'm for deprecating and removing it.
>>>>> It constantly trips users up and just causes pain.
>>>>>
>>>>> Yes it works in some very narrow situations, but those situations often
>>>>> change over time and again just bites the user.  Without the row-cache I
>>>>> believe users would quickly find other, more suitable and lasting,
>>>>> solutions.
>>>>>
>>>>>
>>>>>
>>>>>


Re: Future direction for the row cache and OHC implementation

2023-12-15 Thread Ariel Weisberg
Hi,

I did get one response from Robert indicating that he didn’t want to do the 
work to contribute it.

I offered to do the work and asked for permission to contribute it, but got no 
response. I followed up later with a ping and also got no response.

Ariel

On Fri, Dec 15, 2023, at 9:58 PM, Josh McKenzie wrote:
>> I have reached out to the original maintainer about it and it seems like if 
>> we want to keep using it we will need to start releasing it under a new 
>> package from a different repo.
> 
>> the current maintainer is not interested in donating it to the ASF
> Is that the case Ariel or could you just not reach Robert?
> 
> On Fri, Dec 15, 2023, at 11:55 AM, Jeremiah Jordan wrote:
>>> from a maintenance and
>>> integration testing perspective I think it would be better to keep the
>>> ohc in-tree, so we will be aware of any issues immediately after the
>>> full CI run.
>> 
>> From the original email bringing OHC in tree is not an option because the 
>> current maintainer is not interested in donating it to the ASF.  Thus the 
>> option 1 of some set of people forking it to their own github org and 
>> maintaining a version outside of the ASF C* project.
>> 
>> -Jeremiah
>> 
>> On Dec 15, 2023 at 5:57:31 AM, Maxim Muzafarov  wrote:
>>> Ariel,
>>> thank you for bringing this topic to the ML.
>>> 
>>> I may be missing something, so correct me if I'm wrong somewhere in
>>> the management of the Cassandra ecosystem.  As I see it, the problem
>>> right now is that if we fork the ohc and put it under its own root,
>>> the use of that row cache is still not well tested (the same as it is
>>> now). I am particularly emphasising the dependency management side, as
>>> any version change/upgrade in Cassandra and, as a result of that
>>> change a new set of libraries in the classpath should be tested
>>> against this integration.
>>> 
>>> So, unless it is being widely used by someone else outside of the
>>> community (which it doesn't seem to be), from a maintenance and
>>> integration testing perspective I think it would be better to keep the
>>> ohc in-tree, so we will be aware of any issues immediately after the
>>> full CI run.
>>> 
>>> I'm also +1 for not deprecating it, even if it is used in narrow
>>> cases, while the cost of maintaining its source code remains quite low
>>> and it brings some benefits.
>>> 
>>> On Fri, 15 Dec 2023 at 05:39, Ariel Weisberg  wrote:
 
>>>> Hi,
>>>>
>>>> To add some additional context.
>>>>
>>>> The row cache is disabled by default and it is already pluggable, but
>>>> there isn’t a Caffeine implementation present. I think one used to exist
>>>> and could be resurrected.
>>>>
>>>> I personally also think that people should be able to scratch their own
>>>> itch row cache wise so removing it entirely just because it isn’t commonly
>>>> used isn’t the right move unless the feature is very far out of scope for
>>>> Cassandra.
>>>>
>>>> Auto enabling/disabling the cache is a can of worms that could result in
>>>> performance and reliability inconsistency as the DB enables/disables the
>>>> cache based on heuristics when you don’t want it to. It being off by
>>>> default seems good enough to me.
>>>>
>>>> RE forking, we could create a GitHub org for OHC and then add people to
>>>> it. There are some examples of dependencies that haven’t been contributed
>>>> to the project that live outside like CCM and JAMM.
>>>>
>>>> Ariel
>>>>
>>>> On Thu, Dec 14, 2023, at 5:07 PM, Dinesh Joshi wrote:
>>>>
>>>> I would avoid taking away a feature even if it works in narrow set of
>>>> use-cases. I would instead suggest -
>>>>
>>>> 1. Leave it disabled by default.
>>>> 2. Detect when Row Cache has a low hit rate and warn the operator to turn
>>>> it off. Cassandra should ideally detect this and do it automatically.
>>>> 3. Move to Caffeine instead of OHC.
>>>>
>>>> I would suggest having this as the middle ground.
>>>>
>>>> On Dec 14, 2023, at 4:41 PM, Mick Semb Wever  wrote:
>>>>
>>>>
>>>>
>>>>
>>>> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in
>>>> a later release
>>>>
>>>>
>>>>
>>>>
>>>> I'm for deprecating and removing it.
>>>> It constantly trips users up and just causes pain.
>>>>
>>>> Yes it works in some very narrow situations, but those situations often
>>>> change over time and again just bites the user.  Without the row-cache I
>>>> believe users would quickly find other, more suitable and lasting,
>>>> solutions.
>>>>
>>>>
> 


Re: Future direction for the row cache and OHC implementation

2023-12-15 Thread Josh McKenzie
> I have reached out to the original maintainer about it and it seems like if 
> we want to keep using it we will need to start releasing it under a new 
> package from a different repo.

> the current maintainer is not interested in donating it to the ASF
Is that the case Ariel or could you just not reach Robert?

On Fri, Dec 15, 2023, at 11:55 AM, Jeremiah Jordan wrote:
>> from a maintenance and
>> integration testing perspective I think it would be better to keep the
>> ohc in-tree, so we will be aware of any issues immediately after the
>> full CI run.
> 
> From the original email bringing OHC in tree is not an option because the 
> current maintainer is not interested in donating it to the ASF.  Thus the 
> option 1 of some set of people forking it to their own github org and 
> maintaining a version outside of the ASF C* project.
> 
> -Jeremiah
> 
> On Dec 15, 2023 at 5:57:31 AM, Maxim Muzafarov  wrote:
>> Ariel,
>> thank you for bringing this topic to the ML.
>> 
>> I may be missing something, so correct me if I'm wrong somewhere in
>> the management of the Cassandra ecosystem.  As I see it, the problem
>> right now is that if we fork the ohc and put it under its own root,
>> the use of that row cache is still not well tested (the same as it is
>> now). I am particularly emphasising the dependency management side, as
>> any version change/upgrade in Cassandra and, as a result of that
>> change a new set of libraries in the classpath should be tested
>> against this integration.
>> 
>> So, unless it is being widely used by someone else outside of the
>> community (which it doesn't seem to be), from a maintenance and
>> integration testing perspective I think it would be better to keep the
>> ohc in-tree, so we will be aware of any issues immediately after the
>> full CI run.
>> 
>> I'm also +1 for not deprecating it, even if it is used in narrow
>> cases, while the cost of maintaining its source code remains quite low
>> and it brings some benefits.
>> 
>> On Fri, 15 Dec 2023 at 05:39, Ariel Weisberg  wrote:
>>> 
>>> Hi,
>>> 
>>> To add some additional context.
>>> 
>>> The row cache is disabled by default and it is already pluggable, but there 
>>> isn’t a Caffeine implementation present. I think one used to exist and 
>>> could be resurrected.
>>> 
>>> I personally also think that people should be able to scratch their own 
>>> itch row cache wise so removing it entirely just because it isn’t commonly 
>>> used isn’t the right move unless the feature is very far out of scope for 
>>> Cassandra.
>>> 
>>> Auto enabling/disabling the cache is a can of worms that could result in 
>>> performance and reliability inconsistency as the DB enables/disables the 
>>> cache based on heuristics when you don’t want it to. It being off by 
>>> default seems good enough to me.
>>> 
>>> RE forking, we could create a GitHub org for OHC and then add people to it. 
>>> There are some examples of dependencies that haven’t been contributed to 
>>> the project that live outside like CCM and JAMM.
>>> 
>>> Ariel
>>> 
>>> On Thu, Dec 14, 2023, at 5:07 PM, Dinesh Joshi wrote:
>>> 
>>> I would avoid taking away a feature even if it works in narrow set of 
>>> use-cases. I would instead suggest -
>>> 
>>> 1. Leave it disabled by default.
>>> 2. Detect when Row Cache has a low hit rate and warn the operator to turn 
>>> it off. Cassandra should ideally detect this and do it automatically.
>>> 3. Move to Caffeine instead of OHC.
>>> 
>>> I would suggest having this as the middle ground.
>>> 
>>> On Dec 14, 2023, at 4:41 PM, Mick Semb Wever  wrote:
>>> 
>>> 
>>> 
>>> 
>>> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a 
>>> later release
>>> 
>>> 
>>> 
>>> 
>>> I'm for deprecating and removing it.
>>> It constantly trips users up and just causes pain.
>>> 
>>> Yes it works in some very narrow situations, but those situations often 
>>> change over time and again just bites the user.  Without the row-cache I 
>>> believe users would quickly find other, more suitable and lasting, 
>>> solutions.
>>> 
>>> 


Re: Future direction for the row cache and OHC implementation

2023-12-15 Thread Jeremiah Jordan
>
> from a maintenance and
> integration testing perspective I think it would be better to keep the
> ohc in-tree, so we will be aware of any issues immediately after the
> full CI run.


From the original email, bringing OHC in-tree is not an option because the
current maintainer is not interested in donating it to the ASF.  Hence
option 1: some set of people forking it to their own GitHub org and
maintaining a version outside of the ASF C* project.

-Jeremiah

On Dec 15, 2023 at 5:57:31 AM, Maxim Muzafarov  wrote:

> Ariel,
> thank you for bringing this topic to the ML.
>
> I may be missing something, so correct me if I'm wrong somewhere in
> the management of the Cassandra ecosystem.  As I see it, the problem
> right now is that if we fork the ohc and put it under its own root,
> the use of that row cache is still not well tested (the same as it is
> now). I am particularly emphasising the dependency management side, as
> any version change/upgrade in Cassandra and, as a result of that
> change a new set of libraries in the classpath should be tested
> against this integration.
>
> So, unless it is being widely used by someone else outside of the
> community (which it doesn't seem to be), from a maintenance and
> integration testing perspective I think it would be better to keep the
> ohc in-tree, so we will be aware of any issues immediately after the
> full CI run.
>
> I'm also +1 for not deprecating it, even if it is used in narrow
> cases, while the cost of maintaining its source code remains quite low
> and it brings some benefits.
>
> On Fri, 15 Dec 2023 at 05:39, Ariel Weisberg  wrote:
>
>
> Hi,
>
>
> To add some additional context.
>
>
> The row cache is disabled by default and it is already pluggable, but
> there isn’t a Caffeine implementation present. I think one used to exist
> and could be resurrected.
>
>
> I personally also think that people should be able to scratch their own
> itch row cache wise so removing it entirely just because it isn’t commonly
> used isn’t the right move unless the feature is very far out of scope for
> Cassandra.
>
>
> Auto enabling/disabling the cache is a can of worms that could result in
> performance and reliability inconsistency as the DB enables/disables the
> cache based on heuristics when you don’t want it to. It being off by
> default seems good enough to me.
>
>
> RE forking, we could create a GitHub org for OHC and then add people to
> it. There are some examples of dependencies that haven’t been contributed
> to the project that live outside like CCM and JAMM.
>
>
> Ariel
>
>
> On Thu, Dec 14, 2023, at 5:07 PM, Dinesh Joshi wrote:
>
>
> I would avoid taking away a feature even if it works in narrow set of
> use-cases. I would instead suggest -
>
>
> 1. Leave it disabled by default.
>
> 2. Detect when Row Cache has a low hit rate and warn the operator to turn
> it off. Cassandra should ideally detect this and do it automatically.
>
> 3. Move to Caffeine instead of OHC.
>
>
> I would suggest having this as the middle ground.
>
>
> On Dec 14, 2023, at 4:41 PM, Mick Semb Wever  wrote:
>
>
>
>
>
> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in
> a later release
>
>
>
>
>
> I'm for deprecating and removing it.
>
> It constantly trips users up and just causes pain.
>
>
> Yes it works in some very narrow situations, but those situations often
> change over time and again just bites the user.  Without the row-cache I
> believe users would quickly find other, more suitable and lasting,
> solutions.
>
>
>
>


Re: Future direction for the row cache and OHC implementation

2023-12-15 Thread Maxim Muzafarov
Ariel,
thank you for bringing this topic to the ML.

I may be missing something, so correct me if I'm wrong somewhere in
the management of the Cassandra ecosystem.  As I see it, the problem
right now is that if we fork the ohc and put it under its own root,
the use of that row cache is still not well tested (the same as it is
now). I am particularly emphasising the dependency management side, as
any version change/upgrade in Cassandra and, as a result of that
change a new set of libraries in the classpath should be tested
against this integration.

So, unless it is being widely used by someone else outside of the
community (which it doesn't seem to be), from a maintenance and
integration testing perspective I think it would be better to keep the
ohc in-tree, so we will be aware of any issues immediately after the
full CI run.

I'm also +1 for not deprecating it, even if it is used in narrow
cases, while the cost of maintaining its source code remains quite low
and it brings some benefits.

On Fri, 15 Dec 2023 at 05:39, Ariel Weisberg  wrote:
>
> Hi,
>
> To add some additional context.
>
> The row cache is disabled by default and it is already pluggable, but there 
> isn’t a Caffeine implementation present. I think one used to exist and could 
> be resurrected.
>
> I personally also think that people should be able to scratch their own itch 
> row cache wise so removing it entirely just because it isn’t commonly used 
> isn’t the right move unless the feature is very far out of scope for 
> Cassandra.
>
> Auto enabling/disabling the cache is a can of worms that could result in 
> performance and reliability inconsistency as the DB enables/disables the 
> cache based on heuristics when you don’t want it to. It being off by default 
> seems good enough to me.
>
> RE forking, we could create a GitHub org for OHC and then add people to it. 
> There are some examples of dependencies that haven’t been contributed to the 
> project that live outside like CCM and JAMM.
>
> Ariel
>
> On Thu, Dec 14, 2023, at 5:07 PM, Dinesh Joshi wrote:
>
> I would avoid taking away a feature even if it works in narrow set of 
> use-cases. I would instead suggest -
>
> 1. Leave it disabled by default.
> 2. Detect when Row Cache has a low hit rate and warn the operator to turn it 
> off. Cassandra should ideally detect this and do it automatically.
> 3. Move to Caffeine instead of OHC.
>
> I would suggest having this as the middle ground.
>
> On Dec 14, 2023, at 4:41 PM, Mick Semb Wever  wrote:
>
>
>
>
> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a 
> later release
>
>
>
>
> I'm for deprecating and removing it.
> It constantly trips users up and just causes pain.
>
> Yes it works in some very narrow situations, but those situations often 
> change over time and again just bites the user.  Without the row-cache I 
> believe users would quickly find other, more suitable and lasting, solutions.
>
>


Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Ariel Weisberg
Hi,

To add some additional context.

The row cache is disabled by default and it is already pluggable, but there 
isn’t a Caffeine implementation present. I think one used to exist and could be 
resurrected.

I personally also think that people should be able to scratch their own itch 
row cache wise so removing it entirely just because it isn’t commonly used 
isn’t the right move unless the feature is very far out of scope for Cassandra.

Auto enabling/disabling the cache is a can of worms that could result in 
performance and reliability inconsistency as the DB enables/disables the cache 
based on heuristics when you don’t want it to. It being off by default seems 
good enough to me.

RE forking, we could create a GitHub org for OHC and then add people to it. 
There are some examples of dependencies that haven’t been contributed to the 
project that live outside like CCM and JAMM.

Ariel

On Thu, Dec 14, 2023, at 5:07 PM, Dinesh Joshi wrote:
> I would avoid taking away a feature even if it works in narrow set of 
> use-cases. I would instead suggest -
> 
> 1. Leave it disabled by default.
> 2. Detect when Row Cache has a low hit rate and warn the operator to turn it 
> off. Cassandra should ideally detect this and do it automatically.
> 3. Move to Caffeine instead of OHC.
> 
> I would suggest having this as the middle ground.
> 
>> On Dec 14, 2023, at 4:41 PM, Mick Semb Wever  wrote:
>> 
>>   
>>   
>>> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a 
>>> later release
>> 
>> 
>> 
>> I'm for deprecating and removing it.
>> It constantly trips users up and just causes pain.
>> 
>> Yes it works in some very narrow situations, but those situations often 
>> change over time and again just bites the user.  Without the row-cache I 
>> believe users would quickly find other, more suitable and lasting, solutions.


Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Jon Haddad
I think we should probably figure out how much value it actually provides
by getting some benchmarks around a few use cases along with some
profiling.  tlp-stress has a --rowcache flag that I added a while back to
be able to do this exact test.  I was looking for a use case to profile and
write up so this is actually kind of perfect for me.  I can take a look in
January when I'm back from the holidays.

Jon

On Thu, Dec 14, 2023 at 5:44 PM Mick Semb Wever  wrote:

>
>
>
>> I would avoid taking away a feature even if it works in narrow set of
>> use-cases. I would instead suggest -
>>
>> 1. Leave it disabled by default.
>> 2. Detect when Row Cache has a low hit rate and warn the operator to turn
>> it off. Cassandra should ideally detect this and do it automatically.
>> 3. Move to Caffeine instead of OHC.
>>
>> I would suggest having this as the middle ground.
>>
>
>
>
> Yes, I'm ok with this. (2) can also be a guardrail: soft value when to
> warn, hard value when to disable.
>


Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Mick Semb Wever
> I would avoid taking away a feature even if it works in narrow set of
> use-cases. I would instead suggest -
>
> 1. Leave it disabled by default.
> 2. Detect when Row Cache has a low hit rate and warn the operator to turn
> it off. Cassandra should ideally detect this and do it automatically.
> 3. Move to Caffeine instead of OHC.
>
> I would suggest having this as the middle ground.
>



Yes, I'm ok with this. (2) can also be a guardrail: soft value when to
warn, hard value when to disable.
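
For illustration only, the soft/hard guardrail idea could be sketched roughly as
below. This is a minimal sketch and not existing Cassandra code: the class name,
the thresholds and the minimum-sample rule are all assumptions made up for the
example. It only returns a recommendation (NONE, WARN or DISABLE) and leaves the
decision of what to do with it to the caller.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch; none of these names exist in Cassandra. */
final class RowCacheHitRateGuardrail {
    enum Action { NONE, WARN, DISABLE }

    private final double warnBelow;     // soft value: warn under this hit rate
    private final double disableBelow;  // hard value: recommend disabling under this hit rate
    private final long minRequests;     // assumed rule: do not judge on too few samples
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    RowCacheHitRateGuardrail(double warnBelow, double disableBelow, long minRequests) {
        if (disableBelow > warnBelow)
            throw new IllegalArgumentException("hard value must not exceed soft value");
        this.warnBelow = warnBelow;
        this.disableBelow = disableBelow;
        this.minRequests = minRequests;
    }

    void recordHit()  { hits.increment(); }
    void recordMiss() { misses.increment(); }

    /** Fraction of lookups served from the cache; 1.0 when nothing was recorded yet. */
    double hitRate() {
        long h = hits.sum();
        long total = h + misses.sum();
        return total == 0 ? 1.0 : (double) h / total;
    }

    /** Compares the current hit rate against the soft and hard values. */
    Action evaluate() {
        long total = hits.sum() + misses.sum();
        if (total < minRequests) return Action.NONE;
        double rate = hitRate();
        if (rate < disableBelow) return Action.DISABLE;
        if (rate < warnBelow) return Action.WARN;
        return Action.NONE;
    }
}
```

A caller would record each row cache lookup and poll evaluate() periodically;
whether DISABLE is acted on automatically or only logged stays a policy choice.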


Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Dinesh Joshi
> On Dec 14, 2023, at 5:35 PM, Paulo Motta  wrote:
> 
> This could be a potential hook for out-of-process caching.
> 
> Would something like this be valuable/feasible?

It is certainly feasible. I am not sure about its value.

Dinesh

Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Paulo Motta
I like Dinesh's middle ground proposal, since this feature has valid uses.

I'm not familiar with the row caching module, but would it make sense to
take this opportunity to expose this feature as an optional Row Caching
Module, disabled by default with an optional on-heap Caffeine
implementation?

The API would look something like:

RowCachingAPI {
- onRowUpdated(RowKey, Mutation) -> cache row fragment
- onRowDeleted(RowKey) -> evict cached row fragment
- onPartitionDeleted(PartitionKey) -> evict cached partition fragment
- Optional getRow(RowKey) -> return cached row fragment
- Optional> getPartition(PartitionKey, resultSize) ->
return cached partition fragment
}
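
As a rough Java rendering of the outline above (a sketch under stated
assumptions, not a proposed patch): the key and fragment types are left generic
because the real Cassandra types are not spelled out here, onRowUpdated takes
the fragment to cache directly rather than a Mutation, getPartition is omitted
because its return type is not legible above, and the small LRU implementation
is purely a stand-in for an on-heap cache such as Caffeine.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical interface; P = partition key, K = row key, R = cached row fragment. */
interface RowCachingApi<P, K, R> {
    void onRowUpdated(K rowKey, R rowFragment);   // cache row fragment
    void onRowDeleted(K rowKey);                  // evict cached row fragment
    void onPartitionDeleted(P partitionKey);      // evict all cached rows of the partition
    Optional<R> getRow(K rowKey);                 // return cached row fragment, if any
}

/** Stand-in on-heap implementation with least-recently-used eviction. */
final class LruRowCache<P, K, R> implements RowCachingApi<P, K, R> {
    private final Function<K, P> partitionOf;  // assumed way to map a row key to its partition
    private final Map<K, R> rows;

    LruRowCache(int maxRows, Function<K, P> partitionOf) {
        this.partitionOf = partitionOf;
        // An access-ordered LinkedHashMap drops the least recently used entry on overflow.
        this.rows = new LinkedHashMap<K, R>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, R> eldest) {
                return size() > maxRows;
            }
        };
    }

    @Override
    public synchronized void onRowUpdated(K rowKey, R rowFragment) {
        rows.put(rowKey, rowFragment);
    }

    @Override
    public synchronized void onRowDeleted(K rowKey) {
        rows.remove(rowKey);
    }

    @Override
    public synchronized void onPartitionDeleted(P partitionKey) {
        rows.keySet().removeIf(k -> partitionOf.apply(k).equals(partitionKey));
    }

    @Override
    public synchronized Optional<R> getRow(K rowKey) {
        return Optional.ofNullable(rows.get(rowKey));
    }
}
```

An out-of-process cache would implement the same interface and forward each
call over the network instead of touching a local map.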

This could be a potential hook for out-of-process caching.

Would something like this be valuable/feasible?

On Thu, Dec 14, 2023 at 8:09 PM Dinesh Joshi  wrote:

> I would avoid taking away a feature even if it works in narrow set of
> use-cases. I would instead suggest -
>
> 1. Leave it disabled by default.
> 2. Detect when Row Cache has a low hit rate and warn the operator to turn
> it off. Cassandra should ideally detect this and do it automatically.
> 3. Move to Caffeine instead of OHC.
>
> I would suggest having this as the middle ground.
>
> On Dec 14, 2023, at 4:41 PM, Mick Semb Wever  wrote:
>
>
>
>>
>> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in
>> a later release
>>
>
>
>
> I'm for deprecating and removing it.
> It constantly trips users up and just causes pain.
>
> Yes it works in some very narrow situations, but those situations often
> change over time and again just bites the user.  Without the row-cache I
> believe users would quickly find other, more suitable and lasting,
> solutions.
>
>
>


Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Dinesh Joshi
I would avoid taking away a feature even if it works in a narrow set of 
use cases. I would instead suggest:

1. Leave it disabled by default.
2. Detect when Row Cache has a low hit rate and warn the operator to turn it 
off. Cassandra should ideally detect this and do it automatically.
3. Move to Caffeine instead of OHC.

I would suggest having this as the middle ground.

> On Dec 14, 2023, at 4:41 PM, Mick Semb Wever  wrote:
> 
>   
>   
>> 
>> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a 
>> later release
> 
> 
> 
> I'm for deprecating and removing it.
> It constantly trips users up and just causes pain.
> 
> Yes it works in some very narrow situations, but those situations often 
> change over time and again just bites the user.  Without the row-cache I 
> believe users would quickly find other, more suitable and lasting, solutions.



Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Mick Semb Wever
>
> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in
> a later release
>



I'm for deprecating and removing it.
It constantly trips users up and just causes pain.

Yes, it works in some very narrow situations, but those situations often
change over time, and that again just bites the user.  Without the row-cache I
believe users would quickly find other, more suitable and lasting,
solutions.


Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Jeff Jirsa



> On Dec 14, 2023, at 1:51 PM, Dinesh Joshi  wrote:
> 
> 
>> 
>> On Dec 14, 2023, at 10:32 AM, Ariel Weisberg  wrote:
>> 
>> 1. Fork OHC and start publishing under a new package name and continue to 
>> use it
> 
> Who would fork it? Where would you fork it? My first instinct is that this 
> would not be viable path forward.
> 
>> 2. Replace OHC with a different cache implementation like Caffeine which 
>> would move it on heap
> 
> Doesn’t seem optimal but given the advent of newer garbage collectors, we 
> might be able to run Cassandra with larger heap sizes and moving this to heap 
> may be a non-issue. Someone needs to try it out and measure  the performance 
> impact with Zgc or Shenandoah.
> 
>> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a 
>> later release
> 
> In my experience, Row cache has historically helped in narrow workloads where 
> you have really hot rows but in other workloads it can hurt performance. So 
> keeping it around may be fine as long as people can disable it.

It works especially well with tiny partitions. Once you start slicing / paging, 
the benefit usually disappears.


> 
> Moving it on-heap using Caffeine maybe the easiest option here.

That’s what I’d do.


> 
> 
> Dinesh


Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Dinesh Joshi
> On Dec 14, 2023, at 10:32 AM, Ariel Weisberg  wrote:
> 
> 1. Fork OHC and start publishing under a new package name and continue to use 
> it

Who would fork it? Where would you fork it? My first instinct is that this 
would not be a viable path forward.

> 2. Replace OHC with a different cache implementation like Caffeine which 
> would move it on heap

Doesn’t seem optimal, but given the advent of newer garbage collectors, we might 
be able to run Cassandra with larger heap sizes, and moving this to heap may be 
a non-issue. Someone needs to try it out and measure the performance impact 
with ZGC or Shenandoah.

> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a 
> later release

In my experience, Row cache has historically helped in narrow workloads where 
you have really hot rows but in other workloads it can hurt performance. So 
keeping it around may be fine as long as people can disable it.

Moving it on-heap using Caffeine may be the easiest option here.


Dinesh

Future direction for the row cache and OHC implementation

2023-12-14 Thread Ariel Weisberg
Hi,

Now seems like a good time to discuss the future direction of the row cache and 
its only implementation OHC (https://github.com/snazy/ohc).

OHC is currently unmaintained and we don’t have the ability to release maven 
artifacts for it or commit to the original repo. I have reached out to the 
original maintainer about it and it seems like if we want to keep using it we 
will need to start releasing it under a new package from a different repo.

I see four directions we could pursue.

1. Fork OHC and start publishing under a new package name and continue to use it
2. Replace OHC with a different cache implementation like Caffeine which would 
move it on heap
3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a 
later release
4. Do work to make a row cache not necessary and deprecate it later (or maybe 
now)

I would like to find out what people know about row cache usage in the wild so 
we can use that to inform the future direction as well as the general thinking 
about what we should do with it going forward.

Thanks,
Ariel