What consistency level are you using on reads and writes?

If either is below LOCAL_QUORUM, this behavior is definitely expected.
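
To make that concrete: a read is only guaranteed to see an acknowledged write
when the replicas it contacts must overlap the replicas that acknowledged the
write, i.e. R + W > RF. A rough sketch of that arithmetic, assuming a single
datacenter (so QUORUM and LOCAL_QUORUM are equivalent) and RF=3 as suggested by
the three endpoints in your getendpoints output. The function names are made up
for illustration and are not part of any driver API:

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond for a consistency level (single-DC assumption)."""
    level = level.upper()
    if level in ("ONE", "LOCAL_ONE"):
        return 1
    if level == "TWO":
        return 2
    if level in ("QUORUM", "LOCAL_QUORUM"):
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def reads_see_writes(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read replica set must overlap every write replica set."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


# Hypothetical RF=3 keyspace, a few read/write combinations.
for read_cl, write_cl in [("LOCAL_QUORUM", "LOCAL_QUORUM"),
                          ("ONE", "LOCAL_QUORUM"),
                          ("ONE", "ONE")]:
    print(read_cl, write_cl, reads_see_writes(read_cl, write_cl, rf=3))
```

With RF=3, LOCAL_QUORUM on both sides gives 2 + 2 > 3, so the sets overlap.
Dropping the read to ONE gives 1 + 2 = 3, which is not enough, and a read can
land on the one replica that never got the write.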

If you ARE using QUORUM/LOCAL_QUORUM, and you can correlate these issues with
a node scaling in / out, then it's probably a consistency violation on
bootstrap. Unless you run repair before you re-bootstrap a node (assuming
you're using ephemeral disks), the bootstrap process may choose a streaming
source that is missing a write, and then you end up with 2 replicas without
the write, so reads can miss it until a chance read repair happens or repair
is run.
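
A toy model of that failure mode, in case it helps. It assumes RF=3 with
replicas named A, B and C, a write that was acknowledged by A and B only, and a
replacement of B that streams from C. All names are hypothetical, and it
ignores read repair, which is what eventually heals this in a real cluster:

```python
from itertools import combinations


def replace_node(replicas, replaced, source):
    """Return replica contents after `replaced` is rebuilt from `source` only."""
    after = {node: set(rows) for node, rows in replicas.items()}
    # The old local data is gone (ephemeral disk), so the new node only has
    # whatever the streaming source had.
    after[replaced] = set(replicas[source])
    return after


def quorum_read_results(replicas, key):
    """Map each possible quorum of replicas to whether a read of `key` finds it."""
    quorum = len(replicas) // 2 + 1
    return {
        nodes: any(key in replicas[n] for n in nodes)
        for nodes in combinations(sorted(replicas), quorum)
    }


# A QUORUM write of "item-1" was acknowledged by A and B; C missed it.
before = {"A": {"item-1"}, "B": {"item-1"}, "C": set()}
# B is replaced on an empty disk and happens to stream from C.
after = replace_node(before, replaced="B", source="C")

print(quorum_read_results(before, "item-1"))
print(quorum_read_results(after, "item-1"))
```

Before the replacement every quorum finds the row. Afterwards any read served
by B and C comes back empty while reads that include A still succeed, which
matches the intermittent zero-row results described in the thread.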



On 2025/10/10 18:30:41 FMH wrote:
> Thanks for taking the time to Respond, Jeff.
> 
> I was kind of expecting this reply the minute I included version 2.2.8. We are
> in the process of upgrading to 5.
> 
> As for the bootstrap automation we use. It has been in effect for more than
> 10 years and we replaced 100's of nodes without ever having any issues
> including this one.
> 
> This automation we put in place based on the available documentation for
> Apache and Datastax cassandra. We have also had it assessed several times
> over the years by external consultants.
> 
> Thanks for clarifying the getendpoints. This is why we paired it with the
> SELECT statement validation test as well to verify data.
> 
> On Fri, Oct 10, 2025 at 1:54 PM Jeff Jirsa <[email protected]> wrote:
> 
> > Also: nodetool getendpoints just hashes the key you provide against the
> > cluster topology / schema definition, which tells you which nodes WOULD own
> > the data if it exists. It does NOT guarantee that it exists.
> >
> >
> >
> > On 2025/10/10 17:48:32 Jeff Jirsa wrote:
> > > > You're using a 9 year old release. There have been literally hundreds of
> > > > correctness fixes over those 9 years. You need to upgrade.
> > >
> > > The rest of your answers inline.
> > >
> > >
> > >
> > > On 2025/10/10 12:56:58 FMH wrote:
> > > > Few times a week, our developers report that Cassandra retrieves are
> > > > coming back with zero rows. No error messages.
> > > >
> > > > > Using the same item ID's, a CQLSH SELECT statement returns a single row
> > > > > as expected. Furthermore, the NODETOOL GETENDPOINTS returns three IP's as
> > > > > we expect.
> > > >
> > > > This confirms these ItemID's do exist in Cassandra, it is just the Java
> > > > clients are not retrieving it.
> > > >
> > > > > We noticed this issue to present itself more when nodes are replaced in
> > > > > the cluster as a result of EC2 node deprecation.
> > >
> > > > Are you using EBS or ephemeral disk? Don't use ephemeral disk unless you
> > > > are much better at running cassandra and know how to replace a node without
> > > > data loss (which you do not seem to know how to do).
> > >
> > >
> > > >
> > > > Once the developers restarted the Java client apps, it was now able to
> > > > retrieve these ItemID's.
> > >
> > > > That sounds weird. It may be that they read repaired or normal-repaired,
> > > > or it may be that the java apps were pointing to the wrong thing/cluster.
> > >
> > > >
> > > > > 1- Is this what is called the 'empty read' behavior?
> > > > > 2- Is this caused by clients topology metadata getting out of sync with
> > > > > the cluster?
> > >
> > > Could be cluster scaling unsafely due to ec2 events.
> > > Could be low consistency level
> > > Could be any number of hundreds of topology bugs fixed since 2016.
> > >
> > > > If it's a client bug, I assume it's an old client bug I've never seen
> > > > before. Well functioning cassandra clients shouldn't care about the
> > > > topology, the coordinating server will forward the request anyway.
> > >
> > > > > 3- How can this be detected? Should we have client drivers return
> > > > > 'metadata = cluster.metadata' and compare it to 'nodetool gossipinfo'?
> > >
> > > Upgrade your cluster.
> > > Use EBS so when nodes change, they don't change data ownership.
> > >
> > > > > 4- Other than restarting the clients, is there a way to have client apps
> > > > > to force to refresh their ring metadata?
> > > >
> > > > The client apps are using 'com.datastax.oss:java-driver-core:4.13.0'
> > > > driver.
> > > >
> > > > > Google returns little information about this and GenAI's chat model even
> > > > > though useful, they tend to hallucinate with confidence often.
> > > >
> > > > Thanks
> > > >
> > > > ----------------------------------------
> > > > Thank you
> > > >
> > >
> >
> 
> 
> -- 
> 
> ----------------------------------------
> Thank you
> 
