Thanks a lot Scott, I didn't know that.

Kind regards
Stéphane

On Sun, Aug 7, 2022, 19:31, C. Scott Andreas <sc...@paradoxica.net> wrote:

> > but still, as I understand the documentation, read repair should not
> > be in the blocking path of a query?
>
> Read repair is in the blocking read path for the query, yep. At quorum
> consistency levels, the read repair must complete before returning a result
> to the client to ensure the data returned would be visible on subsequent
> reads that address the remainder of the quorum.
>
> If you enable tracing - either for a single CQL statement that is expected
> to be slow, or probabilistically on the server side to catch a slow query
> in the act - that will help identify what's happening.
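>
> For reference, a minimal sketch of both approaches (the keyspace, table,
> id, and probability value are placeholders):
>
>     -- for a single statement, from cqlsh:
>     TRACING ON;
>     SELECT * FROM my_ks.my_table WHERE id = 42;
>
>     # probabilistic, server-side, run on each node (~0.1% of requests):
>     nodetool settraceprobability 0.001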
>
> - Scott
>
> On Aug 7, 2022, at 10:25 AM, Raphael Mazelier <r...@futomaki.net> wrote:
>
>
>
> Nope. And what really puzzles me is that the traces clearly show the
> difference between queries. The fast queries only read from one replica,
> while the slow queries read from multiple replicas (and not only those
> local to the DC).
> On 07/08/2022 14:02, Stéphane Alleaume wrote:
>
> Hi
>
> Is there some GC pause which could affect the coordinator node?
>
> Kind regards
> Stéphane
>
> On Sun, Aug 7, 2022, 13:41, Raphael Mazelier <r...@futomaki.net> wrote:
>
>> Thanks for the answer, but I was already aware of this. I use LOCAL_ONE
>> as the consistency level.
>>
>> My client connects to a local seed, then chooses a local coordinator (as
>> far as I can understand from the trace log).
>>
>> Then for a batch of requests I get approximately 98% of requests handled
>> in 2-3ms in the local DC with a single read request, and 2% handled by
>> many nodes (according to the trace) and taking way longer (250ms).
>>
>> ?
>> On 06/08/2022 14:30, Bowen Song via user wrote:
>>
>> See the diagram below. Your problem almost certainly arises from step 4,
>> in which an incorrect consistency level set by the client caused the
>> coordinator node to send the READ command to nodes in other DCs.
>>
>> The load balancing policy only affects steps 2 and 3, not steps 1 or 4.
>>
>> You should change the consistency level to LOCAL_ONE/LOCAL_QUORUM/etc. to
>> fix the problem.
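>>
>> For example, with the DataStax Python driver, a minimal sketch (the
>> contact point, local_dc, and keyspace are placeholders for your cluster):
>>
>>     from cassandra import ConsistencyLevel
>>     from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
>>     from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy
>>
>>     # Pin the load balancing policy to the local DC and default all
>>     # queries to a DC-local consistency level.
>>     profile = ExecutionProfile(
>>         load_balancing_policy=TokenAwarePolicy(
>>             DCAwareRoundRobinPolicy(local_dc='eu-west-1')),
>>         consistency_level=ConsistencyLevel.LOCAL_ONE)
>>     cluster = Cluster(['10.0.0.1'],
>>                       execution_profiles={EXEC_PROFILE_DEFAULT: profile})
>>     session = cluster.connect('my_ks')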
>>
>> On 05/08/2022 22:54, Bowen Song wrote:
>>
>> The DCAwareRoundRobinPolicy/TokenAwareHostPolicy controls which
>> Cassandra coordinator node the client sends queries to, not the nodes it
>> connects to, nor the nodes that perform the actual read.
>>
>> A client sends a CQL read query to a coordinator node; the coordinator
>> node parses the CQL query and sends READ requests to other nodes in the
>> cluster based on the consistency level.
>>
>> Have you checked the consistency level of the session (and the query if
>> applicable)? Is it prefixed with "LOCAL_"? If not, the coordinator will
>> send the READ requests to non-local DCs.
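>>
>> A hedged sketch of checking/overriding it per statement with the Python
>> driver (table and id are placeholders):
>>
>>     from cassandra import ConsistencyLevel
>>     from cassandra.query import SimpleStatement
>>
>>     # The session default comes from its execution profile; a statement
>>     # can also carry its own consistency level:
>>     stmt = SimpleStatement(
>>         "SELECT * FROM my_ks.my_table WHERE id = %s",
>>         consistency_level=ConsistencyLevel.LOCAL_ONE)
>>     rows = session.execute(stmt, (42,))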
>>
>>
>> On 05/08/2022 19:40, Raphael Mazelier wrote:
>>
>>
>> Hi Cassandra Users,
>>
>> I'm relatively new to Cassandra, and first I have to say I'm really
>> impressed by the technology.
>>
>> Good design, and a lot of material to understand the internals (the
>> O'Reilly book helps a lot, as do The Last Pickle blog posts).
>>
>> I have a multi-datacenter C* cluster (US, Europe, Singapore) with eight
>> nodes in each (two seeds in each region): two racks in EU and Singapore,
>> three in the US. Everything is deployed in AWS.
>>
>> We have a keyspace configured with NetworkTopologyStrategy and two
>> replicas in every region, like this: {'class': 'NetworkTopologyStrategy',
>> 'ap-southeast-1': '2', 'eu-west-1': '2', 'us-east-1': '2'}
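>>
>> That is, roughly this DDL (the keyspace name is a placeholder):
>>
>>     CREATE KEYSPACE my_ks WITH replication = {
>>         'class': 'NetworkTopologyStrategy',
>>         'ap-southeast-1': '2', 'eu-west-1': '2', 'us-east-1': '2'};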
>>
>>
>> Investigating a performance issue, I noticed strange things in my
>> experiments:
>>
>> What we expect is very low latency, 3-5ms max, for this specific SELECT
>> query. So we want every read to stay local to each datacenter.
>>
>> We configure DCAwareRoundRobinPolicy(local_dc=DC) in Python, and the same
>> in Go: gocql.TokenAwareHostPolicy(gocql.DCAwareRoundRobinPolicy("DC")).
>>
>> Testing a bit with two short programs (I can provide them) in Go and
>> Python, I noticed very strange results. Basically I run the same query
>> over and over with a very limited set of ids.
>>
>> The first results were surprising, because the very first queries always
>> took more than 250ms; after stressing C* a bit (playing with the sleep
>> between queries) I can achieve a good ratio of queries at 3-4ms (what I
>> expected).
>>
>> My guess was that the long queries were somehow not executed locally (or
>> at least involved multi-datacenter requests), while the short ones were.
>>
>> Activating tracing in my programs (like enabling tracing in cqlsh) more
>> or less confirms my suspicion.
>>
>> (I will provide the traces as attachments.)
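>>
>> For reference, a minimal sketch of how per-query tracing can be enabled
>> with the Python driver (table name and id are placeholders):
>>
>>     result = session.execute(
>>         "SELECT * FROM my_ks.my_table WHERE id = %s", (42,), trace=True)
>>     trace = result.get_query_trace()
>>     for event in trace.events:
>>         # Each event records which node did what, and when.
>>         print(event.source, event.description, event.source_elapsed)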
>>
>> My question is: why does C* sometimes try to read non-locally? How can we
>> disable this? What are the criteria for it?
>>
>> (BTW, I'm really not a fan of this multi-region design, for these very
>> specific kinds of issues...)
>>
>> Also, a side question: why is C* so slow to connect? It's like it's
>> trying to reach every node in each DC (we only provide local seeds,
>> however). Sometimes it takes more than 20s...
>>
>> Any help appreciated.
>>
>> Best,
>>
>> --
>>
>> Raphael Mazelier
>>
>>
