Re: [PR] KAFKA-15776: Use the FETCH request timeout as the delay timeout for DelayedRemoteFetch [kafka]
showuon commented on PR #14778: URL: https://github.com/apache/kafka/pull/14778#issuecomment-2148598853 @kamalcph , could we move the dynamic config change into another PR? I have some comments to it, but that is separate from the original changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-15776: Use the FETCH request timeout as the delay timeout for DelayedRemoteFetch [kafka]
showuon commented on code in PR #14778: URL: https://github.com/apache/kafka/pull/14778#discussion_r1625827494 ## clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java: ## @@ -198,7 +198,7 @@ public class ConsumerConfig extends AbstractConfig { * fetch.max.wait.ms */ public static final String FETCH_MAX_WAIT_MS_CONFIG = "fetch.max.wait.ms"; -private static final String FETCH_MAX_WAIT_MS_DOC = "The maximum amount of time the server will block before answering the fetch request if there isn't sufficient data to immediately satisfy the requirement given by fetch.min.bytes."; +private static final String FETCH_MAX_WAIT_MS_DOC = "The maximum amount of time the server will block before answering the fetch request when it is reading near to the tail of the partition (high-watermark) and there isn't sufficient data to immediately satisfy the requirement given by fetch.min.bytes."; Review Comment: Should we mention this config is only used for local log fetch? Also, we should mention this in the end: `To tune the remote fetch maximum time wait, please refer to "remote.fetch.max.wait.ms" config.`. WDYT? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-15776: Use the FETCH request timeout as the delay timeout for DelayedRemoteFetch [kafka]
showuon commented on code in PR #14778: URL: https://github.com/apache/kafka/pull/14778#discussion_r1625827494 ## clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java: ## @@ -198,7 +198,7 @@ public class ConsumerConfig extends AbstractConfig { * fetch.max.wait.ms */ public static final String FETCH_MAX_WAIT_MS_CONFIG = "fetch.max.wait.ms"; -private static final String FETCH_MAX_WAIT_MS_DOC = "The maximum amount of time the server will block before answering the fetch request if there isn't sufficient data to immediately satisfy the requirement given by fetch.min.bytes."; +private static final String FETCH_MAX_WAIT_MS_DOC = "The maximum amount of time the server will block before answering the fetch request when it is reading near to the tail of the partition (high-watermark) and there isn't sufficient data to immediately satisfy the requirement given by fetch.min.bytes."; Review Comment: Should we mention this config is only used for local log fetch? Also, we should mention this in the end: `To tune the remote fetch maximum time wait, please refer to "remote.fetch.max.wait.ms" config.`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-15776: Use the FETCH request timeout as the delay timeout for DelayedRemoteFetch [kafka]
kamalcph commented on PR #14778: URL: https://github.com/apache/kafka/pull/14778#issuecomment-2147168785 @showuon @clolov @jeqo The patch is ready for review. PTAL. Will open a separate PR for dynamic `remote.fetch.max.wait.ms` config and RemoteLogReader FetchRateAndTimeMs metrics. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-15776: Use the FETCH request timeout as the delay timeout for DelayedRemoteFetch [kafka]
showuon commented on PR #14778: URL: https://github.com/apache/kafka/pull/14778#issuecomment-2051465918 Correction: For this: > some users might feel surprised when their fetch doesn't respond in fetch.max.wait.ms time. This is wrong. Even if the remote reading is not completed, yet, the fetch request will still return in `fetch.max.wait.ms`. It's just an empty response. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-15776: Use the FETCH request timeout as the delay timeout for DelayedRemoteFetch [kafka]
github-actions[bot] commented on PR #14778: URL: https://github.com/apache/kafka/pull/14778#issuecomment-1953432771 This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please ask a committer for review. If the PR has merge conflicts, please update it with the latest from trunk (or appropriate release branch) If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-15776: Use the FETCH request timeout as the delay timeout for DelayedRemoteFetch [kafka]
showuon commented on PR #14778: URL: https://github.com/apache/kafka/pull/14778#issuecomment-1820588080 I understand the problem you're trying to solve, but using the server default request timeout doesn't make sense to me. It will break the contract of fetch protocol that `fetch.max.wait.ms` will not be exceeded if no sufficient data in "local" log. I understand the remote read is some kind of grey area about if "data is existed or not", but we have to admit, some users might feel surprised when their fetch doesn't respond in `fetch.max.wait.ms` time. Ideally, we should introduce another config for this remote read waiting purpose, instead of re-using request timeout. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-15776: Use the FETCH request timeout as the delay timeout for DelayedRemoteFetch [kafka]
kamalcph commented on PR #14778: URL: https://github.com/apache/kafka/pull/14778#issuecomment-1818291232 > Could you please help me understand how this change works with fetch.max.wait.ms from a user perspective i.e. what happens when we are retrieving data from both local & remote in a single fetch call? `fetch.max.wait.ms` timeout is applicable only when there is no enough data (`fetch.min.bytes`) to respond back to the client. This is a special case where we are reading the data from both local and remote, the FETCH request has to wait for the tail latency which is a combined latency of reading from both local and remote storage. Note that we always read from only one remote partition up-to `max.partition.fetch.bytes` even-though there is available bandwidth in the FETCH response (`fetch.max.bytes`) and the client rotates the partition order in the next FETCH request so that next partitions are served. > Also, wouldn't this change user clients? Asking because prior to this change users were expecting a guaranteed response within fetch.max.wait.ms = 500ms but now they might not receive a response until 40s request.timeout.ms. If the user has configured their application timeouts to according to fetch.max.wait.ms, this change will break my application. `fetch.max.wait.ms` doesn't guarantee a response within this timeout. The client expires the request only when it exceeds the `request.timeout.ms` of 30 seconds (default). The time taken to serve the FETCH request can be higher than the `fetch.max.wait.ms` due to slow hard-disk, sector errors in disk and so on. The [FetchRequest.json](https://sourcegraph.com/github.com/apache/kafka/-/blob/clients/src/main/resources/common/message/FetchRequest.json) doesn't expose the client configured request timeout, so we are using the default server request timeout of 30 seconds. Otherwise, we can introduce one more config `fetch.remote.max.wait.ms` to define the delay timeout for DelayedRemoteFetch requests. We need to decide whether to keep this config in the client/server since the server operator may need to tune this config if the remote storage degrades and latency to serve the FETCH requests is high. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-15776: Use the FETCH request timeout as the delay timeout for DelayedRemoteFetch [kafka]
divijvaidya commented on PR #14778: URL: https://github.com/apache/kafka/pull/14778#issuecomment-1817593722 Could you please help me understand how this change works with `fetch.max.wait.ms` from a user perspective i.e. what happens when we are retrieving data from both local & remote in a single fetch call? Also, wouldn't this change user clients? Asking because prior to this change users were expecting a guaranteed response within `fetch.max.wait.ms` = 500ms but now they might not receive a response until 40s `request.timeout.ms`. If the user has configured their application timeouts to according to `fetch.max.wait.ms`, this change will break my application. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-15776: Use the FETCH request timeout as the delay timeout for DelayedRemoteFetch [kafka]
kamalcph commented on code in PR #14778: URL: https://github.com/apache/kafka/pull/14778#discussion_r1395458516 ## core/src/main/scala/kafka/server/ReplicaManager.scala: ## @@ -1338,7 +1338,8 @@ class ReplicaManager(val config: KafkaConfig, return Some(createLogReadResult(e)) } -val remoteFetch = new DelayedRemoteFetch(remoteFetchTask, remoteFetchResult, remoteFetchInfo, +val timeout = config.requestTimeoutMs.toLong +val remoteFetch = new DelayedRemoteFetch(remoteFetchTask, remoteFetchResult, remoteFetchInfo, timeout, Review Comment: Passing the full request timeout to the `DelayedRemoteFetch`, we can reduce the time spent by the handler from the `requestTimeout` to reach up-to here to expire the delayedRemoteFetch requests immediately if required. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-15776: Use the FETCH request timeout as the delay timeout for DelayedRemoteFetch [kafka]
kamalcph commented on code in PR #14778: URL: https://github.com/apache/kafka/pull/14778#discussion_r1395458516 ## core/src/main/scala/kafka/server/ReplicaManager.scala: ## @@ -1338,7 +1338,8 @@ class ReplicaManager(val config: KafkaConfig, return Some(createLogReadResult(e)) } -val remoteFetch = new DelayedRemoteFetch(remoteFetchTask, remoteFetchResult, remoteFetchInfo, +val timeout = config.requestTimeoutMs.toLong +val remoteFetch = new DelayedRemoteFetch(remoteFetchTask, remoteFetchResult, remoteFetchInfo, timeout, Review Comment: Passing the full request timeout to the `DelayedRemoteFetch`, we can reduce the time spent by the handler from the `requestTimeout` to reach up-to here to expire the delayedRemoteFetch requests immediately if required. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org