[PR] [SPARK-50854][SS] Make path fully qualified before passing it to FileStreamSink [spark]

via GitHub Fri, 24 Jan 2025 10:51:13 -0800


vrozov opened a new pull request, #49654:
URL: https://github.com/apache/spark/pull/49654


   ### What changes were proposed in this pull request?
   1. Ensure that if a relative path is used in `DataStreamWriter`, the path is resolved on the Spark driver rather than deferred to the Spark executors.
   2. Construct a fully qualified path in `DataSource`, as is already done for `DataFrameWriter`, before passing it to `FileStreamSink`.
   3. Add a check to `FileStreamSink` asserting that `path` is an absolute path.
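
The qualification and the sink-side check described above can be sketched in plain Python using only the standard library. This is an illustration of the idea, not Spark's code: `qualify_path` and `require_absolute` are hypothetical helpers, and the `file` scheme and the explicit `working_dir` argument are assumptions standing in for the file system URI and working directory available on the driver.

```python
from pathlib import PurePosixPath


def qualify_path(path: str, working_dir: str, scheme: str = "file") -> str:
    """Return a fully qualified, URI-style path (hypothetical helper)."""
    p = PurePosixPath(path)
    if not p.is_absolute():
        # Resolve against the working directory known at call time,
        # i.e. the driver's, instead of leaving the path relative.
        p = PurePosixPath(working_dir) / p
    return f"{scheme}:{p}"


def require_absolute(qualified: str) -> str:
    """Sink-side check (hypothetical): reject paths that are still relative."""
    _scheme, sep, rest = qualified.partition(":")
    # If there is no scheme separator, treat the whole string as the path.
    path_part = rest if sep else qualified
    if not path_part.startswith("/"):
        raise ValueError(f"expected an absolute path, got {qualified!r}")
    return qualified
```

With these helpers, `qualify_path("out/stream", "/tmp/work")` yields a path that passes `require_absolute`, while the bare string `"out/stream"` is rejected.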
   
   https://lists.apache.org/thread/ffzwn1y2fgyjw0j09cv4np9z00wymxwv
   
   
   ### Why are the changes needed?
   To properly support relative paths in Structured Streaming. The use case mostly applies to a single-node local Spark cluster.
   
   ### Does this PR introduce _any_ user-facing change?
   The change applies only when a relative path is used in `DataStreamWriter`; in that case data is now written to the correct directory. No changes are expected for absolute paths (the most common production case).
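
As a rough illustration of why the resolution point matters, the sketch below resolves the same relative path against two working directories. The directory names are made up and merely stand in for a driver process and an executor process whose working directories may differ; nothing here calls Spark.

```python
import posixpath


def resolve(path: str, working_dir: str) -> str:
    """Join a possibly relative path onto a working directory and normalize it."""
    return posixpath.normpath(posixpath.join(working_dir, path))


# Hypothetical working directories for the two processes.
DRIVER_CWD = "/home/user/project"
EXECUTOR_CWD = "/tmp/spark-work/executor-0"

relative = "output/events"

# Deferred resolution: each process interprets the path against its own directory.
deferred_on_driver = resolve(relative, DRIVER_CWD)
deferred_on_executor = resolve(relative, EXECUTOR_CWD)

# Eager resolution: the driver resolves once and passes on the absolute path.
eager = resolve(relative, DRIVER_CWD)
# An absolute input ignores the executor's working directory.
seen_by_executor = resolve(eager, EXECUTOR_CWD)
```

In the deferred case the two processes end up with different directories; in the eager case both agree on the location the driver computed.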
   
   ### How was this patch tested?
   Added a new test case to `FileStreamSinkSuite`.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


