> but still, as I understand the documentation, the read repair should not be in the blocking path of a query?
Read repair is in the blocking read path for the query, yep. At quorum consistency levels, the read repair must complete before returning a result to the client to ensure the data returned would be visible on subsequent reads that address the remainder of the quorum.
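As a worked illustration (a sketch, not part of the original reply): Cassandra's quorum is floor(RF/2) + 1, so with the RF-2-per-DC keyspace described later in this thread, a plain QUORUM read needs 4 of 6 replicas and must leave the local DC, while LOCAL_QUORUM needs only 2 of the 2 local replicas:

```python
def quorum(replicas: int) -> int:
    """Cassandra's quorum formula: floor(RF / 2) + 1."""
    return replicas // 2 + 1

# RF = 2 per DC across three DCs, as in the keyspace described later in the thread
rf_per_dc = {"ap-southeast-1": 2, "eu-west-1": 2, "us-east-1": 2}

print(quorum(sum(rf_per_dc.values())))  # plain QUORUM: 4 of 6 replicas -> must cross DCs
print(quorum(rf_per_dc["eu-west-1"]))   # LOCAL_QUORUM: 2 of 2 local replicas
```

This is why a non-LOCAL consistency level (or a blocking read repair at such a level) necessarily waits on remote-DC round trips.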
If you enable tracing - either for a single CQL statement that is expected to be slow, or probabilistic from the server side to catch a slow query in the act - that will help identify what’s happening. - Scott
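For reference, both tracing modes Scott mentions can be enabled like this (a sketch; the probability value is an example, keep it low in production):

```shell
# Client side, per statement, from cqlsh:
#   cqlsh> TRACING ON;
#   cqlsh> SELECT ... ;        -- the trace prints after the rows
#
# Server side, probabilistic, to catch a slow query in the act
# (0.001 = trace 0.1% of requests on that node):
nodetool settraceprobability 0.001

# Collected traces land in the system_traces keyspace:
#   cqlsh> SELECT * FROM system_traces.sessions LIMIT 10;
```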
Nope. And what really puzzles me is that the
trace clearly shows the difference between queries. The fast
queries only read from one replica, while the slow queries
read from multiple replicas (and not only ones local to the DC).
On 07/08/2022 14:02, Stéphane Alleaume
wrote:
Hi
Is there some GC pause which could affect the coordinator
node?
Kind regards
Stéphane
Thanks for the answer, but I was
already aware of this. I use LOCAL_ONE as the consistency level.
My client connects to a local seed, then chooses a local
coordinator (as far as I can understand from the trace log).
Then, for a batch of requests, approximately 98% of
requests are handled in 2-3 ms in the local DC with one read
request, and 2% are handled by many nodes (according to the
trace) and take far longer (250 ms).
On 06/08/2022 14:30, Bowen Song via user wrote:
See the diagram below. Your problem almost certainly
arises from step 4, in which an incorrect consistency
level set by the client caused the coordinator node to
send the READ command to nodes in other DCs.
The load balancing policy only affects step 2 and 3,
not step 1 or 4.
You should change the consistency level to
LOCAL_ONE/LOCAL_QUORUM/etc. to fix the problem.
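For example, with the Python driver (cassandra-driver), the consistency level is set alongside the load-balancing policy on an execution profile. This is an untested sketch requiring a live cluster; the contact point and DC name are placeholders:

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Placeholder contact point and local DC name -- substitute your own.
profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc="eu-west-1")
    ),
    consistency_level=ConsistencyLevel.LOCAL_ONE,  # reads stay in the local DC
)
cluster = Cluster(["10.0.0.1"],
                  execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect()
```

Note that the load-balancing policy alone is not enough: it picks the coordinator (steps 2 and 3), while the consistency level governs which replicas the coordinator contacts (step 4).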
[image: read-path diagram]
On 05/08/2022 22:54, Bowen Song wrote:
The
DCAwareRoundRobinPolicy/TokenAwareHostPolicy controls
which Cassandra coordinator node the client sends
queries to, not the nodes it connects to, nor the nodes
that perform the actual read.
A client sends a CQL read query to a coordinator node;
the coordinator node parses the CQL query and sends
READ requests to other nodes in the cluster based on the
consistency level.
Have you checked the consistency level of the session
(and the query if applicable)? Is it prefixed with
"LOCAL_"? If not, the coordinator will send the READ
requests to non-local DCs.
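The prefix check above can be stated as a one-line rule (a sketch; EACH_QUORUM and the plain levels ONE/QUORUM/ALL may all touch remote DCs):

```python
def stays_in_local_dc(level: str) -> bool:
    """True if the consistency level name keeps reads in the local DC.

    The DC-local levels (LOCAL_ONE, LOCAL_QUORUM, LOCAL_SERIAL)
    all share the LOCAL_ prefix.
    """
    return level.upper().startswith("LOCAL_")

print(stays_in_local_dc("LOCAL_ONE"))  # True
print(stays_in_local_dc("QUORUM"))     # False
```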
On 05/08/2022 19:40, Raphael Mazelier wrote:
Hi Cassandra Users,
I'm relatively new to Cassandra, and first I have to
say I'm really impressed by the technology.
Good design, and a lot of material to understand the
internals (the O'Reilly book helps a lot, as do the
thelastpickle blog posts).
I have a multi-datacenter C* cluster (US, Europe,
Singapore) with eight nodes in each region (two seeds in each
region): two racks in EU and Singapore, three in US.
Everything is deployed in AWS.
We have a keyspace configured with network topology
and two replicas in every region, like this: {'class':
'NetworkTopologyStrategy', 'ap-southeast-1': '2',
'eu-west-1': '2', 'us-east-1': '2'}
Investigating a performance issue, I noticed strange
things in my experiments:
What we expect is very low latency, 3-5 ms max, for this
specific SELECT query. So we want every read to be
local to each datacenter.
We configure DCAwareRoundRobinPolicy(local_dc=DC) in
python, and the same in Go
gocql.TokenAwareHostPolicy(gocql.DCAwareRoundRobinPolicy("DC"))
Testing a bit with two short programs (I can provide
them) in Go and Python, I noticed very strange results.
Basically, I run the same query over and over with a
very limited set of ids.
The first results were surprising, because the very first
queries always took more than 250 ms, and after
stressing C* (playing with the sleep between queries) I can
achieve a good ratio of queries at 3-4 ms (what I
expected).
My guess was that the long queries were somehow executed
non-locally (or at least involved multi-datacenter
requests), while the short ones were not.
Activating tracing in my programs (like enabling TRACING
in cqlsh) kinda confirms my suspicion.
(I will provide traces as an attachment.)
My question is: why does C* sometimes try to read non-locally?
How can we disable this? What is the criterion for it?
(BTW, I'm really not a fan of this multi-region design,
for these very specific kinds of issues...)
Also, a side question: why is C* so slow to connect?
It's like it's trying to reach every node in each DC
(we only provide local seeds, however). Sometimes it
takes more than 20 s...
Any help appreciated.
Best,
--
Raphael Mazelier