[PR] [SPARK-45794] [SS] Introduce state metadata source to query the streaming state metadata information [spark]

via GitHub Sat, 04 Nov 2023 22:21:39 -0700


chaoqin-li1123 opened a new pull request, #43660:
URL: https://github.com/apache/spark/pull/43660


   ### What changes were proposed in this pull request?
   Introduce a new data source so that users can query the metadata of each state store of a streaming query. The schema of the result is:
   operatorId INT | operatorName STRING | stateStoreName STRING | numPartitions INT | numColsPrefixKey INT | minBatchId LONG | maxBatchId LONG
   To use this source, specify the source format and the checkpoint path, then load the DataFrame:
   df = spark.read.format("state-metadata").load("/checkpointPath")
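Rows returned by `df.collect()` can be turned into plain dicts with PySpark's `Row.asDict()`. The sketch below is not part of this PR. It mirrors one row of the result schema as a Python dataclass (the class `StateMetadataRow` and its methods are hypothetical) so the batch range of a state store can be inspected without a Spark session. It assumes that `minBatchId` and `maxBatchId` are both inclusive bounds.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StateMetadataRow:
    """Hypothetical mirror of one row of the state metadata result."""

    operatorId: int
    operatorName: str
    stateStoreName: str
    numPartitions: int
    numColsPrefixKey: int
    minBatchId: int
    maxBatchId: int

    @classmethod
    def from_dict(cls, d: dict) -> "StateMetadataRow":
        # Keep only the declared columns; any extra keys in d are ignored.
        return cls(**{f.name: d[f.name] for f in fields(cls)})

    def batch_ids(self) -> range:
        # Assumption: minBatchId and maxBatchId are both inclusive bounds.
        return range(self.minBatchId, self.maxBatchId + 1)


if __name__ == "__main__":
    # Illustrative values only, not output from a real checkpoint.
    row = StateMetadataRow.from_dict({
        "operatorId": 0,
        "operatorName": "stateStoreSave",
        "stateStoreName": "default",
        "numPartitions": 5,
        "numColsPrefixKey": 0,
        "minBatchId": 0,
        "maxBatchId": 2,
    })
    print(row.operatorName, list(row.batch_ids()))  # prints: stateStoreSave [0, 1, 2]
```

A frozen dataclass keeps each row immutable, which matches the read-only nature of the metadata.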
   
   ### Why are the changes needed?
   To improve debuggability, and to make the state store data source introduced in SPARK-45511 easier to query by displaying the operator ID, batch ID range, and state store name of each state store.
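The sketch below shows how the metadata rows could be condensed into the values a user would look up before querying the SPARK-45511 state store source: the operator ID, the state store name, and the latest batch ID. It is a hypothetical, pure-Python helper (`store_coordinates` is not part of this PR). It assumes rows have already been converted to dicts, e.g. with `Row.asDict()`. It deliberately stops at collecting the values and does not assume any option names of the state store reader.

```python
from collections import defaultdict


def store_coordinates(rows):
    """Hypothetical helper: group state metadata rows (dicts) by operator.

    Returns {operatorId: [(stateStoreName, maxBatchId), ...]} so that each
    operator's state stores and latest reported batch ID are easy to read.
    """
    grouped = defaultdict(list)
    for r in rows:
        grouped[r["operatorId"]].append((r["stateStoreName"], r["maxBatchId"]))
    # Sort operators and their stores so the output is deterministic.
    return {op: sorted(stores) for op, stores in sorted(grouped.items())}


if __name__ == "__main__":
    # Illustrative rows only (only the columns the helper reads).
    rows = [
        {"operatorId": 1, "stateStoreName": "default", "maxBatchId": 4},
        {"operatorId": 0, "stateStoreName": "right-keyToNumValues", "maxBatchId": 4},
        {"operatorId": 0, "stateStoreName": "left-keyToNumValues", "maxBatchId": 4},
    ]
    print(store_coordinates(rows))
    # prints: {0: [('left-keyToNumValues', 4), ('right-keyToNumValues', 4)], 1: [('default', 4)]}
```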
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. This is a new data source exposed to users.
   
   ### How was this patch tested?
   Added a test to verify the output of the state metadata source.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.
   

