[jira] [Commented] (FLINK-34470) Transactional message + Table api kafka source with 'latest-offset' scan bound mode causes indefinitely hanging
[ https://issues.apache.org/jira/browse/FLINK-34470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841823#comment-17841823 ]

dongwoo.kim commented on FLINK-34470:

Hi [~m.orazow], I’ve just made a PR and would appreciate your review. I also left a comment about the metric issue in the discussion area and would appreciate any feedback on that. Thanks.

Best, Dongwoo

> Transactional message + Table api kafka source with 'latest-offset' scan
> bound mode causes indefinitely hanging
> ---
>
> Key: FLINK-34470
> URL: https://issues.apache.org/jira/browse/FLINK-34470
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Kafka
> Affects Versions: kafka-3.1.0
> Reporter: dongwoo.kim
> Priority: Major
> Labels: pull-request-available
>
> h2. Summary
> Hi, we have faced an issue with transactional messages and the Table API Kafka source.
> If we configure *'scan.bounded.mode'* to *'latest-offset'*, the Flink SQL request times out after hanging. We can always reproduce this unexpected behavior by following the steps below.
> This is also related to this [issue|https://issues.apache.org/jira/browse/FLINK-33484].
> h2. How to reproduce
> 1. Deploy a transactional producer and stop it after producing a certain amount of messages
> 2. Configure *'scan.bounded.mode'* to *'latest-offset'* and submit a simple query, such as getting the count of the produced messages
> 3. The Flink SQL job gets stuck and times out.
> h2. Cause
> A transactional producer always produces [control records|https://kafka.apache.org/documentation/#controlbatch] at the end of a transaction. These control messages are not returned by {*}consumer.poll(){*} (they are filtered internally). In the *KafkaPartitionSplitReader* code, a split is finished only when the *lastRecord.offset() >= stoppingOffset - 1* condition is met. This works well with non-transactional messages or in a streaming environment, but in some batch use cases it causes unexpected hanging.
> h2. Proposed solution
> {code:java}
> if (consumer.position(tp) >= stoppingOffset) {
>     recordsBySplits.setPartitionStoppingOffset(tp, stoppingOffset);
>     finishSplitAtRecord(
>             tp,
>             stoppingOffset,
>             lastRecord.offset(),
>             finishedPartitions,
>             recordsBySplits);
> }
> {code}
> Replacing the if condition with *consumer.position(tp) >= stoppingOffset* [here|https://github.com/apache/flink-connector-kafka/blob/15f2662eccf461d9d539ed87a78c9851cd17fa43/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/reader/KafkaPartitionSplitReader.java#L137] can solve the problem.
> *consumer.position(tp)* returns the next record's offset if it exists, and the last record's offset if the next record doesn't exist.
> With this change, KafkaPartitionSplitReader is able to finish the split even when the stopping offset is set to a control record's offset.
> I would be happy to implement this fix if we can reach an agreement.
> Thanks

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
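The effect of the proposed change can be illustrated with a small standalone model of the stopping check. This is only a sketch: the method names and offsets below are illustrative and model one committed transaction (data records at offsets 0..9, commit marker at 10, stopping offset 11), not the actual KafkaPartitionSplitReader API.

```java
// Standalone model of the split-finishing check (illustrative, not the real API).
public class StoppingCheckSketch {

    // Current check: compares the last record returned by poll().
    static boolean finishesWithLastRecord(long lastRecordOffset, long stoppingOffset) {
        return lastRecordOffset >= stoppingOffset - 1;
    }

    // Proposed check: compares the consumer position, which advances past
    // control records even though poll() never returns them.
    static boolean finishesWithPosition(long consumerPosition, long stoppingOffset) {
        return consumerPosition >= stoppingOffset;
    }

    public static void main(String[] args) {
        long stoppingOffset = 11;    // 'latest-offset' bound, includes the commit marker
        long lastPolledOffset = 9;   // poll() stops at the last data record
        long positionAfterPoll = 11; // position() has skipped past the marker

        // 9 >= 10 is false: the split never finishes and the job hangs.
        System.out.println(finishesWithLastRecord(lastPolledOffset, stoppingOffset));
        // 11 >= 11 is true: the split finishes as expected.
        System.out.println(finishesWithPosition(positionAfterPoll, stoppingOffset));
    }
}
```

The sketch prints false then true, showing why the record-based check can never fire when the stopping offset lands on a control record, while the position-based check still terminates.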
[ https://issues.apache.org/jira/browse/FLINK-34470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837027#comment-17837027 ]

dongwoo.kim commented on FLINK-34470:

Hello [~m.orazow], I'm glad that you're interested in collaborating. I'll send a PR soon and keep you updated. Thanks for reaching out.

Best regards, Dongwoo
[ https://issues.apache.org/jira/browse/FLINK-34470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835321#comment-17835321 ]

Muhammet Orazov commented on FLINK-34470:

Hey [~dongwoo.kim], would you like to send a PR for it? It would be good to fix this for all transactional issues (including FLINK-33484), with a solution that works for the transactional, non-transactional, and bounded-mode cases. I would be happy to collaborate, please let me know.

Best, Muhammet
[ https://issues.apache.org/jira/browse/FLINK-34470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830552#comment-17830552 ]

Robert Metzger commented on FLINK-34470:

[~renqs] can you take a look at this?
[ https://issues.apache.org/jira/browse/FLINK-34470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818500#comment-17818500 ]

dongwoo.kim commented on FLINK-34470:

[~martijnvisser] I have used the latest version (flink-sql-connector-kafka-3.1.0-1.18.jar) and verified that the issue still exists. This [line|https://github.com/apache/flink-connector-kafka/blob/15f2662eccf461d9d539ed87a78c9851cd17fa43/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/reader/KafkaPartitionSplitReader.java#L137] seems to be causing the issue.
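For reference, a bounded table definition of the kind described in the issue's reproduction steps might look like the sketch below. The topic, field, and server names are illustrative assumptions; the connector options are the documented Flink Kafka SQL connector options.

```sql
CREATE TABLE tx_messages (
    id BIGINT,
    payload STRING
) WITH (
    'connector' = 'kafka',
    'topic' = 'tx-topic',                            -- illustrative topic name
    'properties.bootstrap.servers' = 'kafka:9092',   -- illustrative broker address
    'properties.isolation.level' = 'read_committed',
    'scan.startup.mode' = 'earliest-offset',
    'scan.bounded.mode' = 'latest-offset',           -- the setting that triggers the hang
    'format' = 'json'
);

-- A bounded query of this kind hangs when the topic ends with a commit marker:
SELECT COUNT(*) FROM tx_messages;
```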
[ https://issues.apache.org/jira/browse/FLINK-34470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818490#comment-17818490 ]

Martijn Visser commented on FLINK-34470:

[~dongwoo.kim] Can you please verify this with the latest version of the Flink Kafka connector?