Nirakar created SPARK-40825:
-------------------------------

             Summary: Topic-partition in offset deleted before committing
                 Key: SPARK-40825
                 URL: https://issues.apache.org/jira/browse/SPARK-40825
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 3.2.2
            Reporter: Nirakar


We run a Structured Streaming query that subscribes to Kafka topics with 
"subscribePattern". I see that Structured Streaming maintains two folders 
inside the checkpoint location, offsets and commits.

When a batch arrives, the offsets folder is updated with "topic 
name:{partition:offset}" for that batch. Once the batch has been processed 
successfully, the commits folder is updated with the id of the batch that was 
just processed.
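
The relationship between the two folders can be sketched as below: a batch 
whose id appears under offsets but not under commits was planned but never 
finished. This is only an illustration, assuming the checkpoint lives on a 
local filesystem and that each batch is stored as a file named after its 
numeric batch id; the helper names batch_ids and pending_batches are 
hypothetical and not part of Spark.

```python
import os


def batch_ids(folder):
    """Return the batch ids found as purely numeric file names in folder."""
    if not os.path.isdir(folder):
        return set()
    # Skip anything that is not a plain batch id, e.g. hidden checksum files.
    return {int(name) for name in os.listdir(folder) if name.isdigit()}


def pending_batches(checkpoint_dir):
    """Return batch ids written to offsets/ that have no entry in commits/."""
    offsets = batch_ids(os.path.join(checkpoint_dir, "offsets"))
    commits = batch_ids(os.path.join(checkpoint_dir, "commits"))
    return sorted(offsets - commits)
```

In the situation described in this report, the batch that references the 
deleted topic would be expected to show up in the result of 
pending_batches.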

In our case, data was sent to a topic and the offsets folder was updated, but 
the topic was deleted before the batch was processed and the commits folder 
was updated. Now we get the following error:

org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired 
before the position for partition topicName-1 could be determined

Even when we restart the stream, it still tries to fetch the same 
topic-partition and throws this error.

Is there a way to handle this kind of situation?
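
For reference, a minimal sketch of the kind of source configuration involved. 
"subscribePattern" is the option named above; "failOnDataLoss" is a documented 
Kafka source option that controls whether the query fails when data may have 
been lost (for example when topics are deleted). Whether setting it to false 
avoids the timeout reported here is not confirmed. The helper name 
kafka_source_options and the broker address and pattern in the usage comment 
are hypothetical.

```python
def kafka_source_options(bootstrap_servers, topic_pattern, fail_on_data_loss=False):
    """Build an options dict for the Structured Streaming Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribePattern": topic_pattern,
        # Spark source options are passed as strings: "true" or "false".
        "failOnDataLoss": str(fail_on_data_loss).lower(),
    }


# Usage (assumes an existing SparkSession named spark; values are placeholders):
# opts = kafka_source_options("broker:9092", "events-.*")
# df = spark.readStream.format("kafka").options(**opts).load()
```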



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
