sandip-db opened a new pull request, #41832: URL: https://github.com/apache/spark/pull/41832
### What changes were proposed in this pull request?

XML is a widely used data format. An external spark-xml package (https://github.com/databricks/spark-xml) is available to read and write XML data in Spark. Making spark-xml built-in will provide a better user experience for Spark SQL and Structured Streaming. The proposal is to inline the code from the spark-xml package.

The PR has 4 main commits:

i) The first commit is a vanilla copy of the spark-xml src files to spark/connector.
ii) The second commit fixes Scala format issues and updates/adds the ASF license in the relevant files.
iii) The third commit adds mvn dependencies, etc.
iv) The fourth commit uses SharedSparkSession and testFiles to access resource files in the XML unit tests in all environments (sbt, mvn, IntelliJ).

### Why are the changes needed?

Built-in support for the XML data source would provide a better user experience than having to import an external package.

### Does this PR introduce _any_ user-facing change?

Yes. It adds built-in support for the XML data source.

### How was this patch tested?

Ran the unit tests that came with the imported spark-xml package. Also ran ./dev/run-tests.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
