HeartSaVioR opened a new pull request, #43425:
URL: https://github.com/apache/spark/pull/43425
### What changes were proposed in this pull request?
This PR proposes to introduce a baseline implementation of the state processor - reader.
The state processor is a new data source that enables reading and writing the state in an existing checkpoint with a batch query. Since the feature is implemented as a data source, it leverages the DataFrame API UX that most users are already familiar with.
The functionalities of the baseline implementation are as follows:
* Specify a state store instance via store name (default: DEFAULT)
* Specify a stateful operator via operator ID (default: 0)
* Specify a batch ID (default: last committed)
* Specify the source option joinSide to construct input rows in the state store for stream-stream joins
  * Users can still read a specific state store instance out of the 4 instances used in a stream-stream join, which would mostly be useful for debugging Spark itself
  * When joinSide is specified, the data source hides the internal columns from the output.
* Specify a metadata column (_partition_id) so that users can identify the partition ID for each state row.
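To illustrate how the options above compose through the DataFrame reader API, here is a minimal sketch. The format name (`statestore`), the exact option names, the checkpoint path, and the output column names are assumptions for illustration based on the description in this PR, not a confirmed API.

```python
# Hedged sketch: reading state rows from an existing streaming checkpoint
# with the proposed data source via the familiar DataFrame reader API.
# The format name, option names, and path below are illustrative assumptions.

CHECKPOINT_PATH = "/tmp/streaming-query/checkpoint"  # hypothetical path

# Options mirroring the functionalities listed above.
STATE_READ_OPTIONS = {
    "operatorId": "0",       # stateful operator ID (default: 0)
    "batchId": "5",          # batch ID to read (default: last committed)
    "storeName": "default",  # state store instance name
}

def read_state(spark, checkpoint_path, options):
    """Build a batch DataFrame over the state stored in the checkpoint."""
    reader = spark.read.format("statestore")
    for key, value in options.items():
        reader = reader.option(key, value)
    df = reader.load(checkpoint_path)
    # Select the metadata column _partition_id alongside the state row,
    # so each row can be mapped back to its partition.
    return df.select("key", "value", "_partition_id")
```

For a stream-stream join, one would additionally pass the joinSide option (e.g. `"left"` or `"right"`) instead of a store name, per the bullet above.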
### Why are the changes needed?
Please refer to the SPIP doc for rationale:
https://docs.google.com/document/d/1_iVf_CIu2RZd3yWWF6KoRNlBiz5NbSIK0yThqG0EvPY/edit?usp=sharing
### Does this PR introduce _any_ user-facing change?
Yes, we are adding a new data source.
### How was this patch tested?
New test suite.
### Was this patch authored or co-authored using generative AI tooling?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]