Boyang Jerry Peng created SPARK-39591:
-----------------------------------------

             Summary: Offset Management Improvements in Structured Streaming
                 Key: SPARK-39591
                 URL: https://issues.apache.org/jira/browse/SPARK-39591
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 3.3.0
            Reporter: Boyang Jerry Peng


Currently in Structured Streaming, at the beginning of every micro-batch the
offset to process up to for the current batch is persisted to durable storage.
At the end of every micro-batch, a marker indicating the completion of the
current micro-batch is persisted to durable storage. For pipelines such as one
that reads from Kafka and writes to Kafka, where end-to-end exactly-once is not
supported and latency is sensitive, we can allow users to configure offset
commits to be written asynchronously. The commit operation would then no longer
contribute to the batch duration, effectively lowering the overall latency of
the pipeline.
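To make the latency argument concrete, below is a toy Python model of the
per-batch protocol described above. Everything in it is hypothetical: the
MicroBatchRunner class, its method names, the write_latency_s parameter, and
the in-memory dict/set standing in for durable storage. It is not Spark's API
or implementation. It only assumes that each durable write costs a fixed delay,
and shows that moving the completion marker to a background worker takes that
delay off the batch's critical path.

```python
import concurrent.futures
import time
from typing import Callable, Dict, List, Set


class MicroBatchRunner:
    """Toy model of the per-batch offset/commit protocol (hypothetical, not Spark code).

    offset_log and commit_log stand in for the two durable logs; each write
    sleeps for write_latency_s to mimic a round trip to durable storage.
    """

    def __init__(self, async_commits: bool, write_latency_s: float = 0.1) -> None:
        self.async_commits = async_commits
        self.write_latency_s = write_latency_s
        self.offset_log: Dict[int, int] = {}   # batch_id -> offset to process up to
        self.commit_log: Set[int] = set()      # batch_ids marked complete
        # A single worker writes commit markers one at a time, in submission order.
        self._executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        self._pending: List[concurrent.futures.Future] = []

    def _write_offset(self, batch_id: int, end_offset: int) -> None:
        time.sleep(self.write_latency_s)  # simulated durable write
        self.offset_log[batch_id] = end_offset

    def _write_commit(self, batch_id: int) -> None:
        time.sleep(self.write_latency_s)  # simulated durable write
        self.commit_log.add(batch_id)

    def run_batch(self, batch_id: int, end_offset: int,
                  process: Callable[[int], None]) -> float:
        """Run one micro-batch and return its wall-clock duration in seconds."""
        start = time.monotonic()
        # 1. Persist the offset to process up to, before doing any work.
        self._write_offset(batch_id, end_offset)
        # 2. Process the data up to end_offset.
        process(end_offset)
        # 3. Persist the completion marker: inline, or queued in the background.
        if self.async_commits:
            self._pending.append(self._executor.submit(self._write_commit, batch_id))
        else:
            self._write_commit(batch_id)
        return time.monotonic() - start

    def close(self) -> None:
        """Wait for queued commit markers, then stop the background worker."""
        for fut in self._pending:
            fut.result()
        self._executor.shutdown(wait=True)


if __name__ == "__main__":
    for mode in (False, True):
        runner = MicroBatchRunner(async_commits=mode)
        durations = [runner.run_batch(i, (i + 1) * 100, lambda off: None) for i in range(3)]
        runner.close()
        print(f"async_commits={mode}: mean batch duration "
              f"{sum(durations) / len(durations):.3f}s")
```

In this model a synchronous batch pays for two durable writes, while an
asynchronous batch pays for only the offset write. The trade-off is that if
the process fails before a queued marker is written, a restart would find an
offset entry with no matching commit marker and, in this model, reprocess that
batch. That is acceptable only for pipelines that do not rely on end-to-end
exactly-once, such as the Kafka-to-Kafka case above.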



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

