Re: cassandra 4.0 java 11 support

2021-07-27 Thread Erick Ramirez
There's been some discussion around removing the "experimental" tag for C* 4.0 + Java 11 so by all means, we encourage everyone to try it and report back to the community if you run into issues. Java 11 support was added 2 years ago so I think most of the issues have been ironed out. Now that 4.0

Re: cassandra 4.0 java 11 support

2021-07-27 Thread Bowen Song
Experimental means anything can happen - dragons, unicorns, ...
On 27/07/2021 21:32, CPC wrote: Hi, At cassandra site https://cassandra.apache.org/doc/latest/cassandra/new/java11.html, it says java 11 support is

Re: Permission/Role Cache causing timeouts in apps.

2021-07-27 Thread Chahat Bhatia
Okay. Sure. Thanks a lot for all the information. Really helped. :)
On Tue, 27 Jul 2021 at 21:05, Bowen Song wrote:
> Based on the information I know, I'd say that you don't have any specific issue with the authentication related tables, but you do have a general overloading problem during

Re: Permission/Role Cache causing timeouts in apps.

2021-07-27 Thread Bowen Song
Based on the information I have, I'd say that you don't have any specific issue with the authentication-related tables, but you do have a general overloading problem during peak load. I think it's fairly likely that your 7-node cluster (6 nodes in one DC) is not able to keep up with the peak

Re: Permission/Role Cache causing timeouts in apps.

2021-07-27 Thread Chahat Bhatia
Yes, the application is quite read heavy and the request pattern is bursty too, hence so many request failures in such a short time. Also, nothing out of the ordinary in cfstats and proxyhistograms. But there are dropped Native-Transport-Requests messages (almost the same stats on all the nodes):

Re: Permission/Role Cache causing timeouts in apps.

2021-07-27 Thread Bowen Song
Wow, a 15-second timeout? That's pretty long... You may want to check the nodetool tpstats and make sure the NTP thread pool isn't blocking things. 16k read requests dropped in 5 seconds, or over 3k requests per second on a single node, is a bit suspicious. Do your read requests tend to be

Re: Permission/Role Cache causing timeouts in apps.

2021-07-27 Thread Chahat Bhatia
Yes, RF=6 for system auth. Sorry my bad. No, we are not using cassandra user for the application. We have a custom super user for our operational and administrative tasks and a separate role with needed perms for the application. > role | super | login | options >

Re: Permission/Role Cache causing timeouts in apps.

2021-07-27 Thread Bowen Song
Hello Chahat, You haven't replied to the first point: are you using the "cassandra" user? The schema and your description don't quite match. When you said: "the system_auth for 2 DCs: *us-east* with 6 nodes (and RF=3) and ..." I assume you meant to say 6 nodes and RF=6?

Re: Permission/Role Cache causing timeouts in apps.

2021-07-27 Thread Chahat Bhatia
> Also, it's interesting that you've set validity to over 3 days but you update them every 6 hours. Is that intentional?
We set that earlier when we were in the process of adding new roles (creating new roles for the new apps we set up), but we never changed it after that, and hence it's been the same

Re: Permission/Role Cache causing timeouts in apps.

2021-07-27 Thread Chahat Bhatia
Thanks for the prompt response. *Here is the system_schema.keyspaces entry:*
system_auth | True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'us-east': '6', 'us-east-backup': '1'}
census | True | {'class':

Re: Permission/Role Cache causing timeouts in apps.

2021-07-27 Thread Bowen Song
Hello Chahat, First, can you please make sure the Cassandra user used by the application is not "cassandra"? Because the "cassandra" user uses QUORUM consistency level to read the auth tables. Then, can you please make sure the replication strategy is set correctly for the system_auth
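
As a rough illustration of why QUORUM reads of the auth tables are costly, here is a minimal sketch of the usual quorum arithmetic, floor(sum of replication factors / 2) + 1. The replication factors are the ones reported for system_auth elsewhere in this thread (us-east: 6, us-east-backup: 1). The function name `quorum` is hypothetical and this is not Cassandra code.

```python
def quorum(replication_factors):
    """Replicas that must respond for a QUORUM read: floor(sum(RF) / 2) + 1."""
    total_rf = sum(replication_factors.values())
    return total_rf // 2 + 1


# Replication reported for system_auth in this thread (values as integers).
system_auth_rf = {"us-east": 6, "us-east-backup": 1}

print(quorum(system_auth_rf))  # 4
```

Under these assumptions, a QUORUM read of the auth tables needs answers from 4 of the 7 replicas.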

Re: Permission/Role Cache causing timeouts in apps.

2021-07-27 Thread Erick Ramirez
Are you using the default `cassandra` superuser role? Because that would be expensive. Also confirm if you've set the replication for the `system_auth` keyspace to NTS because if you have multiple DCs, the request could be going to another DC. It's interesting that you've set validity to over 3

Permission/Role Cache causing timeouts in apps.

2021-07-27 Thread Chahat Bhatia
Hi Community, Context: We are running a cluster of 6 nodes in production with RF=3 in AWS. We recently moved from physical servers to the cloud by adding a new DC and then removing the old one. Everything is working fine in all the other applications except this one. *As we recently started