Haruki Okada created KAFKA-10690:
------------------------------------

             Summary: Produce-response delay caused by lagging replica fetch 
which blocks in-sync one
                 Key: KAFKA-10690
                 URL: https://issues.apache.org/jira/browse/KAFKA-10690
             Project: Kafka
          Issue Type: Improvement
          Components: core
    Affects Versions: 2.4.1
            Reporter: Haruki Okada
         Attachments: image-2020-11-06-11-15-21-781.png, 
image-2020-11-06-11-15-38-390.png, image-2020-11-06-11-17-09-910.png

h2. Our environment
 * Kafka version: 2.4.1

h2. Phenomenon
 * The 99th percentile of produce response time (remote scope) degraded to 
500ms, which is 20 times worse than usual
 ** At the time, the cluster was running a replica reassignment to service in a 
new machine, in order to recover the replicas that had been held by a broker 
machine that failed due to a hardware issue

!image-2020-11-06-11-15-21-781.png|width=292,height=166!
h2. Analysis

Let's say
 * broker-X: The broker on which we observed the produce latency degradation
 * broker-Y: The broker being serviced in

broker-Y was catching up on replicas of the following partitions:
 * partition-A: has relatively small log size
 * partition-B: has large log size

(Actually, broker-Y was catching up on many other partitions as well. Only two 
partitions are noted here to keep the explanation simple.)

broker-X was the leader for both partition-A and partition-B.

We found that both partition-A and partition-B were assigned to the same 
ReplicaFetcherThread of broker-Y, and that produce latency started to degrade 
right after broker-Y finished catching up on partition-A.

!image-2020-11-06-11-17-09-910.png|width=476,height=174!

Besides, we observed disk reads on broker-X during the service-in. (This is 
natural, since old segments are unlikely to be in the page cache.)

!image-2020-11-06-11-15-38-390.png|width=292,height=193!

So we suspected that:
 * The in-sync replica fetch (partition-A) was held up by the lagging replica 
fetch (partition-B), which is expected to be slow because it causes actual disk 
reads
 ** Since ReplicaFetcherThread sends fetch requests in a blocking manner, the 
next fetch request can't be sent until the previous one completes
 ** => This delays in-sync replica fetches for partitions assigned to the same 
replica fetcher thread
 ** => This in turn degrades remote-scope produce latency
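
The suspected mechanism can be sketched as a toy model. This is not Kafka code: the class name, the method name, and the read costs are all hypothetical, and the model assumes that one blocking fetch round costs the sum of the per-partition reads on the leader. It only illustrates why a page-cache-fast partition inherits the latency of a disk-bound partition when both share a fetcher thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: not Kafka code. All names and numbers are hypothetical.
final class FetcherBlockingSketch {

    // Assumption: the leader serves every partition in a fetch request before
    // responding, so one blocking round costs the sum of the per-partition reads.
    static long fetchRoundMs(Map<String, Long> readCostMsByPartition) {
        long total = 0;
        for (long costMs : readCostMsByPartition.values()) {
            total += costMs;
        }
        return total;
    }

    public static void main(String[] args) {
        long pageCacheReadMs = 2; // hypothetical: in-sync fetch served from page cache
        long diskReadMs = 400;    // hypothetical: lagging fetch reading old segments from disk

        // partition-A and partition-B share one fetcher thread.
        Map<String, Long> shared = new LinkedHashMap<>();
        shared.put("partition-A", pageCacheReadMs);
        shared.put("partition-B", diskReadMs);

        // partition-A alone on its own fetcher thread.
        Map<String, Long> dedicated = new LinkedHashMap<>();
        dedicated.put("partition-A", pageCacheReadMs);

        System.out.println("shared fetcher round:    " + fetchRoundMs(shared) + " ms");
        System.out.println("dedicated fetcher round: " + fetchRoundMs(dedicated) + " ms");
    }
}
```

With these made-up costs the shared round takes 402 ms and the dedicated round takes 2 ms. If producers wait for broker-Y to fetch partition-A before the produce request is acknowledged, the longer round shows up directly in produce latency.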

h2. Possible fix

We think this issue can be addressed by dedicating some of the 
ReplicaFetcherThreads (or creating another thread pool) to lagging replicas 
that are catching up, but we are not sure this is the appropriate way.
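
A minimal sketch of the routing idea follows, assuming a hypothetical offset-based lag threshold. None of these names exist in Kafka (`CatchUpPoolSketch`, `Pool`, `choosePool`, `assign`, and `maxLagMessages` are made up for illustration), and a real change would have to decide how lag is measured and when a partition moves between pools.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: not Kafka code. All names here are hypothetical.
final class CatchUpPoolSketch {

    enum Pool { IN_SYNC, CATCH_UP }

    // Route a partition by how far the follower is behind the leader.
    // The offset-based threshold is an assumption made for this sketch.
    static Pool choosePool(long leaderEndOffset, long followerEndOffset, long maxLagMessages) {
        long lag = leaderEndOffset - followerEndOffset;
        return lag > maxLagMessages ? Pool.CATCH_UP : Pool.IN_SYNC;
    }

    // offsets maps partition name -> {leaderEndOffset, followerEndOffset}.
    static Map<Pool, List<String>> assign(Map<String, long[]> offsets, long maxLagMessages) {
        Map<Pool, List<String>> byPool = new EnumMap<>(Pool.class);
        for (Pool pool : Pool.values()) {
            byPool.put(pool, new ArrayList<>());
        }
        for (Map.Entry<String, long[]> e : offsets.entrySet()) {
            long[] o = e.getValue();
            byPool.get(choosePool(o[0], o[1], maxLagMessages)).add(e.getKey());
        }
        return byPool;
    }

    public static void main(String[] args) {
        Map<String, long[]> offsets = new LinkedHashMap<>();
        offsets.put("partition-A", new long[] {1_000L, 995L});     // nearly caught up
        offsets.put("partition-B", new long[] {5_000_000L, 1_000L}); // far behind
        System.out.println(assign(offsets, 100L));
    }
}
```

In this sketch partition-A stays in the in-sync pool while partition-B goes to the catch-up pool, so slow disk-bound fetches would no longer share a thread with fetches that producers are waiting on.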

Please give your opinions about this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
