[PR] [SDP] Validate streaming-ness of DFs returned by SDP table and standalone flow definitions [spark]

via GitHub Tue, 17 Jun 2025 15:37:27 -0700


AnishMahto opened a new pull request, #51208:
URL: https://github.com/apache/spark/pull/51208


   ### What changes were proposed in this pull request? 
   Validate that streaming flows are actually backed by streaming sources, and 
batch flows are actually backed by batch sources. Also improve SDP test 
harnesses to be explicit about whether a streaming table or materialized view 
is being created, to match the Python/SQL API.
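   As a rough illustration of the kind of check described above, here is a minimal Python sketch. It is not the SDP implementation. `FakeDataFrame`, `InvalidFlowError`, and `validate_flow_output` are hypothetical names invented for this example. The stand-in class only mirrors the `isStreaming` flag that real PySpark DataFrames expose.

```python
from dataclasses import dataclass


@dataclass
class FakeDataFrame:
    """Hypothetical stand-in for a DataFrame; only models the isStreaming flag."""
    isStreaming: bool


class InvalidFlowError(ValueError):
    """Hypothetical error for a flow whose output has the wrong streaming-ness."""


def validate_flow_output(flow_name: str, is_streaming_flow: bool, df) -> None:
    """Check that a flow's DataFrame matches the kind of flow that defined it.

    Assumption: streaming flows (e.g. ones feeding a streaming table) must
    return a streaming DataFrame, and batch flows (e.g. ones feeding a
    materialized view) must return a batch DataFrame.
    """
    if is_streaming_flow and not df.isStreaming:
        # A streaming flow that reads a batch source directly is rejected.
        raise InvalidFlowError(
            f"Flow '{flow_name}' is a streaming flow but returned a batch "
            "DataFrame; read the source with readStream (Python) or the "
            "STREAM keyword (SQL)."
        )
    if not is_streaming_flow and df.isStreaming:
        # A batch flow must not return a streaming DataFrame.
        raise InvalidFlowError(
            f"Flow '{flow_name}' is a batch flow but returned a streaming DataFrame."
        )
```

For example, `validate_flow_output("st_flow", True, FakeDataFrame(isStreaming=False))` raises `InvalidFlowError`. The same call with `isStreaming=True` returns without error.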
   
   ### Why are the changes needed?
   This change helps prevent incorrect usage of streaming and batch flows, such as 
directly reading from a batch source in a streaming table's flow. In that case, 
the `STREAM` keyword should be used in SQL to mark a batch source as streaming, 
or `readStream` should be used in Python to stream-read from an otherwise 
non-streaming file source.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No, as this only affects SDP, which has not been released in any Spark version yet.
   
   ### How was this patch tested?
   Existing test suites, plus new tests added to `ConnectInvalidPipelineSuite`.
   
   ### Was this patch authored or co-authored using generative AI tooling? 
   No
   

