Re: [PR] [FLINK-34995] flink kafka connector source stuck when partition leade… [flink-connector-kafka]

2024-04-12 Thread via GitHub


yanspirit commented on PR #91:
URL: 
https://github.com/apache/flink-connector-kafka/pull/91#issuecomment-2051339512

   > Is it possible to add a test for this situation?

   Testing this is tricky because it requires simulating a broker crash. I have 
tested the change locally: partitions with leader=-1 are ignored, and once the 
leader recovers, partition discovery detects them and adds them for 
processing. Should I add a configuration switch for this optimization?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [FLINK-34995] flink kafka connector source stuck when partition leade… [flink-connector-kafka]

2024-04-07 Thread via GitHub


boring-cyborg[bot] commented on PR #91:
URL: 
https://github.com/apache/flink-connector-kafka/pull/91#issuecomment-2041355545

   Thanks for opening this pull request! Please check out our contributing 
guidelines. (https://flink.apache.org/contributing/how-to-contribute.html)
   





[PR] [FLINK-34995] flink kafka connector source stuck when partition leade… [flink-connector-kafka]

2024-04-07 Thread via GitHub


yanspirit opened a new pull request, #91:
URL: https://github.com/apache/flink-connector-kafka/pull/91

   When a partition has no valid leader (leader=-1), a Flink streaming job using 
KafkaSource cannot restart, and a new instance with a new group ID cannot 
start. The job gets stuck and fails with the following exception:
   
   "org.apache.kafka.common.errors.TimeoutException: Timeout of 6ms expired 
before the position for partition aaa-1 could be determined"
   
   When leader=-1, Kafka APIs such as KafkaConsumer.position() block until 
either the position can be determined or an unrecoverable error is 
encountered.
   
   In fact, leader=-1 is not easy to avoid. Even with replica=3, three disks 
going offline together will trigger the problem, especially when the cluster 
is relatively large. Recovery relies on the Kafka administrator fixing it in 
time, which is risky during the cluster's peak period.
   
   This can be addressed by filtering out partitions with an invalid leader and 
relying on the partition discovery interval to add them back once the leader 
recovers.
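   
   To illustrate the idea, here is a minimal, self-contained sketch of such a 
filter. It is not the code in this PR: `LeaderFilterSketch`, `PartitionMeta`, 
and `withValidLeader` are hypothetical names, and the sketch assumes that a 
leaderless partition is reported with a leader id of -1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LeaderFilterSketch {

    /** Assumed marker for "no leader elected", matching leader=-1 above. */
    static final int NO_LEADER = -1;

    /** Hypothetical stand-in for partition metadata; not a Flink or Kafka class. */
    static final class PartitionMeta {
        final String topic;
        final int partition;
        final int leaderId;

        PartitionMeta(String topic, int partition, int leaderId) {
            this.topic = topic;
            this.partition = partition;
            this.leaderId = leaderId;
        }

        @Override
        public String toString() {
            return topic + "-" + partition;
        }
    }

    /** Keeps only the partitions that currently have a leader. */
    static List<PartitionMeta> withValidLeader(List<PartitionMeta> discovered) {
        List<PartitionMeta> assignable = new ArrayList<>();
        for (PartitionMeta p : discovered) {
            if (p.leaderId != NO_LEADER) {
                assignable.add(p);
            }
        }
        return assignable;
    }

    public static void main(String[] args) {
        // First discovery round: partition aaa-1 has no leader and is skipped.
        List<PartitionMeta> firstRound = Arrays.asList(
                new PartitionMeta("aaa", 0, 1),
                new PartitionMeta("aaa", 1, NO_LEADER));
        System.out.println("assignable: " + withValidLeader(firstRound));

        // Later round: the leader of aaa-1 has recovered, so it is picked up.
        List<PartitionMeta> laterRound = Arrays.asList(
                new PartitionMeta("aaa", 0, 1),
                new PartitionMeta("aaa", 1, 2));
        System.out.println("assignable: " + withValidLeader(laterRound));
    }
}
```

   Skipped partitions are only picked up again if periodic partition discovery 
is enabled. In KafkaSource this is controlled by the 
`partition.discovery.interval.ms` property.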

