Thanks for taking the time to respond, Jeff. I was kind of expecting this reply the minute I included version 2.2.8. We are in the process of upgrading to 5.
As for the bootstrap automation we use: it has been in place for more than 10 years, and we have replaced hundreds of nodes without ever running into any issues, this one included. We built this automation based on the available documentation for Apache and DataStax Cassandra. We have also had it assessed several times over the years by external consultants.

Thanks for clarifying what getendpoints does. This is why we paired it with the SELECT statement validation test, to verify that the data actually exists.

On Fri, Oct 10, 2025 at 1:54 PM Jeff Jirsa <[email protected]> wrote:

> Also: nodetool getendpoints just hashes the key you provide against the
> cluster topology / schema definition, which tells you which nodes WOULD own
> the data if it exists. It does NOT guarantee that it exists.
>
> On 2025/10/10 17:48:32 Jeff Jirsa wrote:
> > You're using a 9 year old release. There have been literally hundreds of
> > correctness fixes over those 9 years. You need to upgrade.
> >
> > The rest of your answers inline.
> >
> > On 2025/10/10 12:56:58 FMH wrote:
> > > Few times a week, our developers report that Cassandra retrieves are
> > > coming back with zero rows. No error messages.
> > >
> > > Using the same item ID's, a CQLSH SELECT statement returns a single row as
> > > expected. Furthermore, the NODETOOL GETENDPOINTS returns three IP's as we
> > > expect.
> > >
> > > This confirms these ItemID's do exist in Cassandra, it is just the Java
> > > clients are not retrieving it.
> > >
> > > We noticed this issue to present itself more when nodes are replaced in the
> > > cluster as a result of EC2 node deprecation.
> >
> > Are you using EBS or ephemeral disk? Don't use ephemeral disk unless you
> > are much better at running cassandra and know how to replace a node without
> > data loss (which you do not seem to know how to do).
> >
> > > Once the developers restarted the Java client apps, it was now able to
> > > retrieve these ItemID's.
> >
> > That sounds weird. It may be that they read repaired or normal-repaired,
> > or it may be that the java apps were pointing to the wrong thing/cluster.
> >
> > > 1- Is this what is called the 'empty' read' behavior?
> > > 2- Is this caused by clients topology metadata getting out of sync with the
> > > cluster?
> >
> > Could be cluster scaling unsafely due to ec2 events.
> > Could be low consistency level
> > Could be any number of hundreds of topology bugs fixed since 2016.
> >
> > If it's a client bug, I assume it's an old client bug I've never seen
> > before. Well functioning cassandra clients shouldn't care about the
> > topology, the coordinating server will forward the request anyway.
> >
> > > 3- How can this be detected? Should we have client drivers return 'metadata
> > > = cluster.metadata' and compare it to 'nodetool gossipinfo'?
> >
> > Upgrade your cluster.
> > Use EBS so when nodes change, they don't change data ownership.
> >
> > > 4- Other than restarting the clients, is there a way to have client apps to
> > > force to refresh their ring metadata?
> > >
> > > The client apps are using 'com.datastax.oss:java-driver-core:4.13.0'
> > > driver.
> > >
> > > Google returns little information about this and GenAI's chat model even
> > > though useful, they tend to hallucinate with confidence often.
> > >
> > > Thanks
> > >
> > > ----------------------------------------
> > > Thank you

--
----------------------------------------
Thank you
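
For anyone following the thread, the getendpoints point can be illustrated with a toy token ring in Python. This is only a sketch under loud assumptions, not Cassandra's code: `ToyRing`, `token_for`, the node names, the evenly spaced tokens, and the MD5 stand-in hash are all hypothetical (a real cluster uses its configured partitioner and replication strategy). It shows that the replica list is computed from the key and the topology alone, so it comes back with three nodes even when no row was ever written.

```python
import bisect
import hashlib


def token_for(key):
    """Map a key to a 64-bit token.

    ASSUMPTION: MD5 is a stand-in hash for illustration only; a real
    cluster uses its configured partitioner instead.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ToyRing:
    """Hypothetical, simplified ring: one token per node, replicas are the
    next `replication_factor` nodes walking clockwise from the key's token."""

    def __init__(self, node_tokens, replication_factor=3):
        # Sort nodes by token so the ring can be searched with bisect.
        self.ring = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.ring]
        self.rf = replication_factor

    def endpoints(self, key):
        # First node whose token is >= the key's token, wrapping around.
        start = bisect.bisect_left(self.tokens, token_for(key)) % len(self.ring)
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


# Four made-up nodes with evenly spaced tokens over the 64-bit range.
ring = ToyRing({
    "node-a": 0,
    "node-b": 2**62,
    "node-c": 2**63,
    "node-d": 3 * 2**62,
})

# The "table" is empty: nothing was ever written for this key.
stored_rows = {}

replicas = ring.endpoints("item-123")
print("would-be replicas:", replicas)
print("row actually stored:", "item-123" in stored_rows)
```

The lookup never consults `stored_rows`, which is why a replica list on its own says nothing about whether the row exists; only an actual read (such as the SELECT check) does.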
