[ https://issues.apache.org/jira/browse/KAFKA-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503911#comment-17503911 ]
Haruki Okada edited comment on KAFKA-10690 at 3/10/22, 12:14 AM:
-----------------------------------------------------------------

Thanks for the comment.

[~showuon]

> Are you sure this issue is due to the `in-sync` replica fetch?

Yeah. As long as a replica fetch is `out-of-sync`, it doesn't block produce requests, so the issue happens only on `in-sync` replicas, when `in-sync` replica fetching and `out-of-sync` replica fetching are done in the same replica fetcher thread on the follower side.

> Could you have a PoC to add an additional thread pool for lagging replica to
> confirm this solution?

We haven't tried it yet, as we first wanted to confirm whether anyone else has encountered a similar issue (and whether anyone has addressed it in some way). But let us consider it!

[~junrao]

> Have you tried enabling replication throttling?

Yeah, we use replication throttling, and we believe the disk's performance itself is stable even during lagging-replica fetches. We use HDDs, so reading the data takes a few to tens of milliseconds per IO even when performance is stable. So if a lagging replica fetch (likely not in page cache, so it causes disk reads) and an in-sync replica fetch are done in the same replica fetcher thread, the in-sync one is greatly affected by the lagging one.

was (Author: ocadaruma):

Thanks for the comment.

[~showuon]

> Are you sure this issue is due to the `in-sync` replica fetch?

Yeah. As long as a replica fetch is `out-of-sync`, it doesn't block produce requests, so the issue happens only on `in-sync` replicas, when `in-sync` replica fetching and `out-of-sync` replica fetching are done in the same replica fetcher thread on the follower side.

> Could you have a PoC to add an additional thread pool for lagging replica to
> confirm this solution?

We haven't tried it yet, as we first wanted to confirm whether anyone else has encountered a similar issue (and whether anyone has addressed it in some way). But let us consider it!

[~junrao]

> Have you tried enabling replication throttling?

Yeah, we use replication throttling, and we believe the disk's performance itself is stable even during lagging-replica fetches. We use HDDs, so reading the data takes a few to tens of milliseconds per IO even when performance is stable. So if a lagging replica fetch (likely not in page cache, so it causes disk reads) and an in-sync replica fetch are done in the same replica fetcher thread (i.e. in the same Fetch request), the in-sync one is greatly affected by the lagging one.

> Produce-response delay caused by lagging replica fetch which affects in-sync one
> --------------------------------------------------------------------------------
>
>                 Key: KAFKA-10690
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10690
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 2.4.1
>            Reporter: Haruki Okada
>            Priority: Major
>         Attachments: image-2020-11-06-11-15-21-781.png, image-2020-11-06-11-15-38-390.png, image-2020-11-06-11-17-09-910.png
>
>
> h2. Our environment
>  * Kafka version: 2.4.1
> h2. Phenomenon
>  * The 99th percentile of produce response time (remote scope) degraded to 500ms, which is 20 times worse than usual
>  ** Meanwhile, the cluster was running a replica reassignment to service-in a new machine, in order to recover the replicas that had been held by a failed (hardware issue) broker machine
> !image-2020-11-06-11-15-21-781.png|width=292,height=166!
> h2. Analysis
> Let's say
>  * broker-X: The broker on which we observed the produce latency degradation
>  * broker-Y: The broker being serviced in
> broker-Y was catching up replicas of these partitions:
>  * partition-A: has a relatively small log size
>  * partition-B: has a large log size
> (Actually, broker-Y was catching up many other partitions. I noted only two partitions here to keep the explanation simple.)
> broker-X was the leader for both partition-A and partition-B.
> We found that both partition-A and partition-B were assigned to the same ReplicaFetcherThread of broker-Y, and produce latency started to degrade right after broker-Y finished catching up partition-A.
> !image-2020-11-06-11-17-09-910.png|width=476,height=174!
> Besides, we observed disk reads on broker-X during the service-in. (This is natural, since old segments are likely not in page cache.)
> !image-2020-11-06-11-15-38-390.png|width=292,height=193!
> So we suspected that:
>  * The in-sync replica fetch (partition-A) was affected by the lagging replica fetch (partition-B), which is expected to be slow because it causes actual disk reads
>  ** Since ReplicaFetcherThread sends fetch requests in a blocking manner, the next fetch request can't be sent until the current fetch request completes
>  ** => This delays the in-sync replica fetch for partitions assigned to the same replica fetcher thread
>  ** => This causes the remote scope produce latency degradation
> h2. Possible fix
> We think this issue can be addressed by designating some of the ReplicaFetcherThreads (or creating another thread pool) for lagging-replica catch-up, but we are not sure this is the appropriate way.
> Please give your opinions about this issue.
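
The suspected mechanism can be illustrated with a small back-of-the-envelope sketch. This is not Kafka code: the class name, the method, and the per-partition read costs (1 ms for a page-cache hit, 20 ms for an HDD read) are hypothetical values chosen for illustration, and it assumes as a simplification that one blocking fetch round takes as long as all of its partition reads combined.

```java
import java.util.Arrays;

// Hypothetical model of a blocking replica fetcher thread (not Kafka's implementation).
public class FetcherBlockingSketch {

    // Duration of one blocking fetch round, assuming (as a simplification) that the
    // leader serves the reads for every partition in the round one after another.
    public static long roundMillis(long... leaderReadMillis) {
        return Arrays.stream(leaderReadMillis).sum();
    }

    public static void main(String[] args) {
        long inSyncReadMillis = 1;   // assumed: partition-A is served from page cache
        long laggingReadMillis = 20; // assumed: partition-B needs HDD reads

        // Both partitions on the same fetcher thread: partition-A waits for partition-B.
        long shared = roundMillis(inSyncReadMillis, laggingReadMillis);

        // Lagging partition moved to a separate fetcher: partition-A waits only for itself.
        long dedicated = roundMillis(inSyncReadMillis);

        System.out.println("shared fetcher: partition-A is fetched every " + shared + " ms");
        System.out.println("dedicated fetcher: partition-A is fetched every " + dedicated + " ms");
    }
}
```

Under these assumed numbers, the in-sync partition's fetch interval is dominated by the lagging partition's disk reads whenever both share a thread, which is the effect the proposed separation of lagging-replica fetchers would aim to remove.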