Re: Future direction for the row cache and OHC implementation
Hi,

We once did some extensive performance testing on the row cache (motivated by a hardware accelerator we were hoping to introduce) but could only find improvements in highly contrived scenarios. It has been a while since then, so fresh eyes are good, but I think we will still arrive at the conclusion to deprecate the row cache.

Thanks,
German

From: Jon Haddad
Sent: Monday, December 18, 2023 10:31 AM
To: dev@cassandra.apache.org
Subject: Re: Future direction for the row cache and OHC implementation

Sure, I’d love to work with you on this.

Jon Haddad
Rustyrazorblade Consulting
rustyrazorblade.com
Re: Future direction for the row cache and OHC implementation
Sure, I’d love to work with you on this.

Jon Haddad
Rustyrazorblade Consulting
rustyrazorblade.com

On Mon, Dec 18, 2023 at 8:30 AM Ariel Weisberg wrote:

> Hi,
>
> Thanks for the generous offer. Before you do that can you give me a chance to add back support for Caffeine for the row cache so you can test the option of switching back to an on-heap row cache?
>
> Ariel
Re: Future direction for the row cache and OHC implementation
Hi,

Thanks for the generous offer. Before you do that can you give me a chance to add back support for Caffeine for the row cache so you can test the option of switching back to an on-heap row cache?

Ariel

On Thu, Dec 14, 2023, at 9:28 PM, Jon Haddad wrote:
> I think we should probably figure out how much value it actually provides by getting some benchmarks around a few use cases along with some profiling. tlp-stress has a --rowcache flag that I added a while back to be able to do this exact test. I was looking for a use case to profile and write up so this is actually kind of perfect for me. I can take a look in January when I'm back from the holidays.
>
> Jon
Re: Future direction for the row cache and OHC implementation
Gotcha; wasn't sure given the earlier phrasing. Makes sense.

Dinesh's compromise position makes sense to me.

On Fri, Dec 15, 2023, at 11:21 PM, Ariel Weisberg wrote:
> Hi,
>
> I did get one response from Robert indicating that he didn’t want to do the work to contribute it.
>
> I offered to do the work and asked for permission to contribute it and no response. Followed up later with a ping and also no response.
>
> Ariel
Re: Future direction for the row cache and OHC implementation
Hi,

I did get one response from Robert indicating that he didn’t want to do the work to contribute it.

I offered to do the work and asked for permission to contribute it and no response. Followed up later with a ping and also no response.

Ariel

On Fri, Dec 15, 2023, at 9:58 PM, Josh McKenzie wrote:
>> I have reached out to the original maintainer about it and it seems like if we want to keep using it we will need to start releasing it under a new package from a different repo.
>
>> the current maintainer is not interested in donating it to the ASF
>
> Is that the case Ariel or could you just not reach Robert?
Re: Future direction for the row cache and OHC implementation
> I have reached out to the original maintainer about it and it seems like if we want to keep using it we will need to start releasing it under a new package from a different repo.

> the current maintainer is not interested in donating it to the ASF

Is that the case Ariel or could you just not reach Robert?

On Fri, Dec 15, 2023, at 11:55 AM, Jeremiah Jordan wrote:
>> from a maintenance and integration testing perspective I think it would be better to keep the ohc in-tree, so we will be aware of any issues immediately after the full CI run.
>
> From the original email, bringing OHC in tree is not an option because the current maintainer is not interested in donating it to the ASF. Thus the option 1 of some set of people forking it to their own github org and maintaining a version outside of the ASF C* project.
>
> -Jeremiah
Re: Future direction for the row cache and OHC implementation
> from a maintenance and integration testing perspective I think it would be better to keep the ohc in-tree, so we will be aware of any issues immediately after the full CI run.

From the original email, bringing OHC in tree is not an option because the current maintainer is not interested in donating it to the ASF. Thus the option 1 of some set of people forking it to their own github org and maintaining a version outside of the ASF C* project.

-Jeremiah

On Dec 15, 2023 at 5:57:31 AM, Maxim Muzafarov wrote:
> Ariel,
> thank you for bringing this topic to the ML.
>
> I may be missing something, so correct me if I'm wrong somewhere in the management of the Cassandra ecosystem. As I see it, the problem right now is that if we fork the ohc and put it under its own root, the use of that row cache is still not well tested (the same as it is now). I am particularly emphasising the dependency management side, as any version change/upgrade in Cassandra and, as a result of that change, a new set of libraries in the classpath should be tested against this integration.
>
> So, unless it is being widely used by someone else outside of the community (which it doesn't seem to be), from a maintenance and integration testing perspective I think it would be better to keep the ohc in-tree, so we will be aware of any issues immediately after the full CI run.
>
> I'm also +1 for not deprecating it, even if it is used in narrow cases, while the cost of maintaining its source code remains quite low and it brings some benefits.
Re: Future direction for the row cache and OHC implementation
Ariel,
thank you for bringing this topic to the ML.

I may be missing something, so correct me if I'm wrong somewhere in the management of the Cassandra ecosystem. As I see it, the problem right now is that if we fork the ohc and put it under its own root, the use of that row cache is still not well tested (the same as it is now). I am particularly emphasising the dependency management side, as any version change/upgrade in Cassandra and, as a result of that change, a new set of libraries in the classpath should be tested against this integration.

So, unless it is being widely used by someone else outside of the community (which it doesn't seem to be), from a maintenance and integration testing perspective I think it would be better to keep the ohc in-tree, so we will be aware of any issues immediately after the full CI run.

I'm also +1 for not deprecating it, even if it is used in narrow cases, while the cost of maintaining its source code remains quite low and it brings some benefits.

On Fri, 15 Dec 2023 at 05:39, Ariel Weisberg wrote:
> Hi,
>
> To add some additional context.
>
> The row cache is disabled by default and it is already pluggable, but there isn’t a Caffeine implementation present. I think one used to exist and could be resurrected.
>
> I personally also think that people should be able to scratch their own itch row cache wise so removing it entirely just because it isn’t commonly used isn’t the right move unless the feature is very far out of scope for Cassandra.
>
> Auto enabling/disabling the cache is a can of worms that could result in performance and reliability inconsistency as the DB enables/disables the cache based on heuristics when you don’t want it to. It being off by default seems good enough to me.
>
> RE forking, we could create a GitHub org for OHC and then add people to it. There are some examples of dependencies that haven’t been contributed to the project that live outside like CCM and JAMM.
>
> Ariel
Re: Future direction for the row cache and OHC implementation
Hi,

To add some additional context.

The row cache is disabled by default and it is already pluggable, but there isn’t a Caffeine implementation present. I think one used to exist and could be resurrected.

I personally also think that people should be able to scratch their own itch row cache wise so removing it entirely just because it isn’t commonly used isn’t the right move unless the feature is very far out of scope for Cassandra.

Auto enabling/disabling the cache is a can of worms that could result in performance and reliability inconsistency as the DB enables/disables the cache based on heuristics when you don’t want it to. It being off by default seems good enough to me.

RE forking, we could create a GitHub org for OHC and then add people to it. There are some examples of dependencies that haven’t been contributed to the project that live outside like CCM and JAMM.

Ariel

On Thu, Dec 14, 2023, at 5:07 PM, Dinesh Joshi wrote:
> I would avoid taking away a feature even if it works in a narrow set of use cases. I would instead suggest:
>
> 1. Leave it disabled by default.
> 2. Detect when Row Cache has a low hit rate and warn the operator to turn it off. Cassandra should ideally detect this and do it automatically.
> 3. Move to Caffeine instead of OHC.
>
> I would suggest having this as the middle ground.
Re: Future direction for the row cache and OHC implementation
I think we should probably figure out how much value it actually provides by getting some benchmarks around a few use cases along with some profiling. tlp-stress has a --rowcache flag that I added a while back to be able to do this exact test. I was looking for a use case to profile and write up so this is actually kind of perfect for me. I can take a look in January when I'm back from the holidays.

Jon

On Thu, Dec 14, 2023 at 5:44 PM Mick Semb Wever wrote:
>> I would avoid taking away a feature even if it works in a narrow set of use cases. I would instead suggest:
>>
>> 1. Leave it disabled by default.
>> 2. Detect when Row Cache has a low hit rate and warn the operator to turn it off. Cassandra should ideally detect this and do it automatically.
>> 3. Move to Caffeine instead of OHC.
>>
>> I would suggest having this as the middle ground.
>
> Yes, I'm ok with this. (2) can also be a guardrail: soft value when to warn, hard value when to disable.
Re: Future direction for the row cache and OHC implementation
> I would avoid taking away a feature even if it works in a narrow set of use cases. I would instead suggest:
>
> 1. Leave it disabled by default.
> 2. Detect when Row Cache has a low hit rate and warn the operator to turn it off. Cassandra should ideally detect this and do it automatically.
> 3. Move to Caffeine instead of OHC.
>
> I would suggest having this as the middle ground.

Yes, I'm ok with this. (2) can also be a guardrail: soft value when to warn, hard value when to disable.
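The soft/hard guardrail idea in the message above could look roughly like the sketch below. It is only an illustration in Python, not Cassandra code: the function name `row_cache_guardrail`, the `Action` enum, and every threshold value (`warn_below`, `disable_below`, `min_requests`) are assumptions made up for this example.

```python
from enum import Enum


class Action(Enum):
    OK = "ok"
    WARN = "warn"
    DISABLE = "disable"


def row_cache_guardrail(hits, requests, warn_below=0.5, disable_below=0.1,
                        min_requests=1000):
    """Map an observed row cache hit rate to a guardrail action.

    All thresholds are hypothetical defaults chosen for illustration.
    """
    if requests < min_requests:
        return Action.OK  # too few samples to judge the hit rate
    hit_rate = hits / requests
    if hit_rate < disable_below:
        return Action.DISABLE  # hard value: turn the cache off
    if hit_rate < warn_below:
        return Action.WARN  # soft value: tell the operator
    return Action.OK


for hits in (900, 300, 50):
    print(hits, row_cache_guardrail(hits, 1000).value)
```

The `min_requests` floor is one way to avoid acting on a cold cache; whether a real guardrail should disable the cache automatically is exactly the point debated elsewhere in this thread.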
Re: Future direction for the row cache and OHC implementation
> On Dec 14, 2023, at 5:35 PM, Paulo Motta wrote:
>
> This could be a potential hook for out-of-process caching.
>
> Would something like this be valuable/feasible?

It is certainly feasible. I am not sure about its value.

Dinesh
Re: Future direction for the row cache and OHC implementation
I like Dinesh's middle ground proposal, since this feature has valid uses.

I'm not familiar with the row caching module, but would it make sense to take this opportunity to expose this feature as an optional Row Caching Module, disabled by default with an optional on-heap Caffeine implementation?

The API would look something like:

RowCachingAPI {
  - onRowUpdated(RowKey, Mutation) -> cache row fragment
  - onRowDeleted(RowKey) -> evict cached row fragment
  - onPartitionDeleted(PartitionKey) -> evict cached partition fragment
  - Optional getRow(RowKey) -> return cached row fragment
  - Optional> getPartition(PartitionKey, resultSize) -> return cached partition fragment
}

This could be a potential hook for out-of-process caching.

Would something like this be valuable/feasible?

On Thu, Dec 14, 2023 at 8:09 PM Dinesh Joshi wrote:

> I would avoid taking away a feature even if it works in a narrow set of use cases. I would instead suggest:
>
> 1. Leave it disabled by default.
> 2. Detect when Row Cache has a low hit rate and warn the operator to turn it off. Cassandra should ideally detect this and do it automatically.
> 3. Move to Caffeine instead of OHC.
>
> I would suggest having this as the middle ground.
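To make the shape of the RowCachingAPI proposed above a little more concrete, here is a minimal sketch in Python. Everything beyond the five operation names is an assumption for illustration: a row key is modeled as a (partition key, clustering key) tuple, a mutation as a plain dict of column values, and `InMemoryRowCache` is a hypothetical dict-backed implementation with no size bound, eviction policy, or concurrency control.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple

RowKey = Tuple[str, str]      # assumed shape: (partition key, clustering key)
Mutation = Dict[str, object]  # assumed shape: column name -> new value


class RowCachingAPI(ABC):
    """Python rendering of the hooks listed in the proposal."""

    @abstractmethod
    def on_row_updated(self, row_key: RowKey, mutation: Mutation) -> None: ...

    @abstractmethod
    def on_row_deleted(self, row_key: RowKey) -> None: ...

    @abstractmethod
    def on_partition_deleted(self, partition_key: str) -> None: ...

    @abstractmethod
    def get_row(self, row_key: RowKey) -> Optional[Mutation]: ...

    @abstractmethod
    def get_partition(self, partition_key: str,
                      result_size: int) -> Optional[List[Mutation]]: ...


class InMemoryRowCache(RowCachingAPI):
    """Hypothetical unbounded implementation backed by nested dicts."""

    def __init__(self) -> None:
        self._partitions: Dict[str, Dict[str, Mutation]] = {}

    def on_row_updated(self, row_key, mutation):
        partition, clustering = row_key
        rows = self._partitions.setdefault(partition, {})
        rows.setdefault(clustering, {}).update(mutation)  # merge columns

    def on_row_deleted(self, row_key):
        partition, clustering = row_key
        rows = self._partitions.get(partition)
        if rows is not None:
            rows.pop(clustering, None)
            if not rows:
                del self._partitions[partition]  # drop empty partition

    def on_partition_deleted(self, partition_key):
        self._partitions.pop(partition_key, None)

    def get_row(self, row_key):
        partition, clustering = row_key
        row = self._partitions.get(partition, {}).get(clustering)
        return dict(row) if row is not None else None

    def get_partition(self, partition_key, result_size):
        rows = self._partitions.get(partition_key)
        if rows is None:
            return None
        # Return at most result_size rows in clustering-key order.
        return [dict(rows[key]) for key in sorted(rows)][:result_size]
```

An out-of-process cache, as suggested in the message, would implement the same five methods against a remote store instead of local dicts.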
Re: Future direction for the row cache and OHC implementation
I would avoid taking away a feature even if it works in a narrow set of use cases. I would instead suggest:

1. Leave it disabled by default.
2. Detect when Row Cache has a low hit rate and warn the operator to turn it off. Cassandra should ideally detect this and do it automatically.
3. Move to Caffeine instead of OHC.

I would suggest having this as the middle ground.

> On Dec 14, 2023, at 4:41 PM, Mick Semb Wever wrote:
>
>> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a later release
>
> I'm for deprecating and removing it.
> It constantly trips users up and just causes pain.
>
> Yes it works in some very narrow situations, but those situations often change over time and again just bite the user. Without the row-cache I believe users would quickly find other, more suitable and lasting, solutions.
Re: Future direction for the row cache and OHC implementation
> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a later release

I'm for deprecating and removing it.
It constantly trips users up and just causes pain.

Yes it works in some very narrow situations, but those situations often change over time and again just bite the user. Without the row-cache I believe users would quickly find other, more suitable and lasting, solutions.
Re: Future direction for the row cache and OHC implementation
> On Dec 14, 2023, at 1:51 PM, Dinesh Joshi wrote:
>
>> On Dec 14, 2023, at 10:32 AM, Ariel Weisberg wrote:
>>
>> 1. Fork OHC and start publishing under a new package name and continue to use it
>
> Who would fork it? Where would you fork it? My first instinct is that this would not be a viable path forward.
>
>> 2. Replace OHC with a different cache implementation like Caffeine which would move it on heap
>
> Doesn’t seem optimal, but given the advent of newer garbage collectors, we might be able to run Cassandra with larger heap sizes, and moving this to heap may be a non-issue. Someone needs to try it out and measure the performance impact with ZGC or Shenandoah.
>
>> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a later release
>
> In my experience, the row cache has historically helped in narrow workloads where you have really hot rows, but in other workloads it can hurt performance. So keeping it around may be fine as long as people can disable it.

It works especially well with tiny partitions. Once you start slicing / paging the benefit usually disappears.

> Moving it on-heap using Caffeine may be the easiest option here.

That’s what I’d do.

> Dinesh
Re: Future direction for the row cache and OHC implementation
> On Dec 14, 2023, at 10:32 AM, Ariel Weisberg wrote:
>
> 1. Fork OHC and start publishing under a new package name and continue to use it

Who would fork it? Where would you fork it? My first instinct is that this would not be a viable path forward.

> 2. Replace OHC with a different cache implementation like Caffeine which would move it on heap

Doesn’t seem optimal, but given the advent of newer garbage collectors, we might be able to run Cassandra with larger heap sizes, and moving this to heap may be a non-issue. Someone needs to try it out and measure the performance impact with ZGC or Shenandoah.

> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a later release

In my experience, the row cache has historically helped in narrow workloads where you have really hot rows, but in other workloads it can hurt performance. So keeping it around may be fine as long as people can disable it.

Moving it on-heap using Caffeine may be the easiest option here.

Dinesh
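The observation above, that a row cache pays off for a few really hot rows and much less for other workloads, can be illustrated with a toy simulation. This is only a sketch under stated assumptions: it uses a plain LRU policy and an arbitrary capacity of 100 rows, `LruRowCache` and `replay` are made-up names, and it does not model the eviction policy of OHC or Caffeine, nor any real Cassandra read path.

```python
from collections import OrderedDict


class LruRowCache:
    """Tiny LRU cache that only counts hits and misses (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.rows = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.rows:
            self.rows.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.rows[key] = object()  # stand-in for a row read from disk
            if len(self.rows) > self.capacity:
                self.rows.popitem(last=False)  # evict least recently used

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def replay(keys, capacity=100):
    """Feed a sequence of row keys through a fresh cache; return hit rate."""
    cache = LruRowCache(capacity)
    for key in keys:
        cache.read(key)
    return cache.hit_rate()


# 10 hot rows read 1,000 times each: only the first read of each row misses.
hot_rows = [i % 10 for i in range(10_000)]
# 1,000 rows read round-robin: each row is evicted before it is read again.
wide_scan = [i % 1_000 for i in range(10_000)]

print(f"hot rows:  {replay(hot_rows):.3f}")
print(f"wide scan: {replay(wide_scan):.3f}")
```

With these inputs the hot-row workload reaches a hit rate of 0.999 while the round-robin workload never hits at all, so in the second case the cache adds memory and bookkeeping cost without saving a single read.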
Future direction for the row cache and OHC implementation
Hi,

Now seems like a good time to discuss the future direction of the row cache and its only implementation OHC (https://github.com/snazy/ohc).

OHC is currently unmaintained and we don’t have the ability to release maven artifacts for it or commit to the original repo. I have reached out to the original maintainer about it and it seems like if we want to keep using it we will need to start releasing it under a new package from a different repo.

I see four directions we could pursue.

1. Fork OHC and start publishing under a new package name and continue to use it
2. Replace OHC with a different cache implementation like Caffeine which would move it on heap
3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a later release
4. Do work to make a row cache not necessary and deprecate it later (or maybe now)

I would like to find out what people know about row cache usage in the wild so we can use that to inform the future direction as well as the general thinking about what we should do with it going forward.

Thanks,
Ariel