Re: [PR] [FLINK-34995] flink kafka connector source stuck when partition leade… [flink-connector-kafka]
yanspirit commented on PR #91: URL: https://github.com/apache/flink-connector-kafka/pull/91#issuecomment-2051339512 > Is it possible to add a test for this situation? This test is a bit tricky, as it requires simulating a broker crash. I have tested this locally: the source ignores partitions where leader=-1, and once the leader recovers, those partitions are detected by partition discovery and added back for processing. Should I add a configuration switch for this optimization? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [FLINK-34995] flink kafka connector source stuck when partition leade… [flink-connector-kafka]
boring-cyborg[bot] commented on PR #91: URL: https://github.com/apache/flink-connector-kafka/pull/91#issuecomment-2041355545 Thanks for opening this pull request! Please check out our contributing guidelines. (https://flink.apache.org/contributing/how-to-contribute.html)
[PR] [FLINK-34995] flink kafka connector source stuck when partition leade… [flink-connector-kafka]
yanspirit opened a new pull request, #91: URL: https://github.com/apache/flink-connector-kafka/pull/91 When a partition's leader is invalid (leader=-1), a Flink streaming job using KafkaSource cannot restart, and a new instance with a new group id cannot start either; it gets stuck with the following exception: "org.apache.kafka.common.errors.TimeoutException: Timeout of 6ms expired before the position for partition aaa-1 could be determined". When leader=-1, Kafka APIs such as KafkaConsumer.position() block until either the position can be determined or an unrecoverable error is encountered. In fact, leader=-1 is not easy to avoid: even with a replication factor of 3, three disks going offline together will trigger the problem, especially when the cluster is relatively large. Recovery relies on the Kafka administrator fixing it in time, which is risky during the cluster's peak period. This can be addressed by combining an invalid-leader filter with the partition discovery interval.
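The fix described above can be sketched roughly as follows: during periodic partition discovery, skip any partition whose leader id is -1 (Kafka's marker for a leaderless partition), so the reader never calls position() on it and blocks; the skipped partition is picked up by a later discovery cycle once its leader recovers. This is a minimal illustrative sketch, not the connector's actual code: the type DiscoveredPartition and the method filterValidLeaders are hypothetical stand-ins (in the real connector the leader would come from Kafka metadata such as PartitionInfo).

```java
import java.util.List;
import java.util.stream.Collectors;

public class InvalidLeaderFilter {
    // Kafka reports -1 as the leader id when a partition currently has no leader.
    static final int NO_LEADER = -1;

    // Hypothetical stand-in for the partition metadata returned by discovery.
    record DiscoveredPartition(String topic, int partition, int leaderId) {}

    // Keep only partitions that have a valid leader; leaderless partitions are
    // deferred to the next discovery cycle instead of blocking position().
    static List<DiscoveredPartition> filterValidLeaders(List<DiscoveredPartition> discovered) {
        return discovered.stream()
                .filter(p -> p.leaderId() != NO_LEADER)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<DiscoveredPartition> discovered = List.of(
                new DiscoveredPartition("aaa", 0, 101),
                new DiscoveredPartition("aaa", 1, NO_LEADER), // leaderless: would block position()
                new DiscoveredPartition("aaa", 2, 102));
        // Only aaa-0 and aaa-2 are assigned now; aaa-1 is re-discovered later,
        // which is why this approach depends on partition discovery being enabled.
        System.out.println(filterValidLeaders(discovered));
    }
}
```

Note that this only helps when a partition discovery interval is configured, since re-discovery is what adds the filtered partition back after its leader recovers.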