201573 opened a new pull request, #55660: URL: https://github.com/apache/spark/pull/55660
### What changes were proposed in this pull request?

This PR allows `os.PathLike` path objects, such as `pathlib.Path`, to be passed to PySpark readwriter path APIs. The change normalizes path-like objects with `os.fsdecode` before sending paths to the JVM or Spark Connect plans.

### Why are the changes needed?

Currently, several PySpark readwriter methods accept only `str` or `list[str]` paths. Python users commonly use `pathlib.Path`, and these objects should work for file-system-backed data sources.

Closes #55203.

### Does this PR introduce any user-facing change?

Yes. Users can pass `pathlib.Path` / `os.PathLike` objects to supported readwriter APIs.

### How was this patch tested?

- `./dev/lint-python --compile`
- `git diff --check`
- Added PySpark readwriter tests for `pathlib.Path`
- Added Spark Connect plan coverage for path-like path lists

Full PySpark runtime tests were not run locally because this machine does not have a Java Runtime installed.

### Was this patch authored or co-authored using generative AI tooling?

Yes. I used OpenAI Codex to help implement and test this change. I have reviewed the changes and take responsibility for them. This contribution is my original work and I license the work to the project under the project's open source license.
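
For readers who want to see the normalization idea in isolation, here is a minimal sketch of converting path-like inputs to plain strings with `os.fsdecode`. The helper name `normalize_paths` and its always-return-a-list behavior are assumptions made for illustration; they are not taken from the PR's diff, and the actual PySpark implementation may be structured differently.

```python
import os
from pathlib import PurePosixPath
from typing import List, Sequence, Union

# Hypothetical alias for illustration; not a PySpark type.
PathLikeOrStr = Union[str, os.PathLike]


def normalize_paths(
    path: Union[PathLikeOrStr, Sequence[PathLikeOrStr]]
) -> List[str]:
    """Return a list of plain ``str`` paths.

    Hypothetical helper: accepts a single path or a sequence of paths,
    where each path is a ``str`` or an ``os.PathLike`` object.
    """
    # A single str or path-like object is wrapped into a one-element list.
    if isinstance(path, (str, os.PathLike)):
        return [os.fsdecode(path)]
    # Otherwise treat the input as a sequence of paths.
    return [os.fsdecode(p) for p in path]


# PurePosixPath keeps the example independent of the host OS.
single = normalize_paths(PurePosixPath("/data/events.parquet"))
mixed = normalize_paths(["/data/a.json", PurePosixPath("/data/b.json")])
```

`os.fsdecode` calls `os.fspath` on its argument and decodes `bytes` results using the file-system encoding, so `str` inputs pass through unchanged while `pathlib` objects become ordinary strings that can be serialized to the JVM or into a plan.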
