sandip-db opened a new pull request, #42462:
URL: https://github.com/apache/spark/pull/42462

   ### What changes were proposed in this pull request?
   This is the second PR related to the built-in XML data source implementation ([jira](https://issues.apache.org/jira/browse/SPARK-44751)).
   The previous [PR](https://github.com/apache/spark/pull/41832) ported the spark-xml package.
   This PR addresses the following:
   - Implement the FileFormat interface
   - Address the review comments in the previous [XML PR](https://github.com/apache/spark/pull/41832)
   - Move from_xml and schema_of_xml to sql/functions
   - Move ".xml" to DataFrameReader/DataFrameWriter
   - Remove old APIs such as XmlRelation, XmlReader, etc.
   - StaxXmlParser changes:
      - Use FailureSafeParser
      - Convert 'Row' usage to 'InternalRow'
      - Convert String to UTF8String
      - Handle MapData and ArrayData for MapType and ArrayType respectively
      - Use TimestampFormatter to parse timestamp
      - Use DateFormatter to parse date
   - StaxXmlGenerator changes:
      - Convert 'Row' usage to 'InternalRow'
      - Handle UTF8String for StringType
      - Handle MapData and ArrayData for MapType and ArrayType respectively
      - Use TimestampFormatter to format timestamp
      - Use DateFormatter to format date
   - Update the XML tests to reflect the above changes
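
To illustrate what switching StaxXmlParser to FailureSafeParser means for bad input, here is a minimal Python sketch of the parse-mode semantics (PERMISSIVE, DROPMALFORMED, FAILFAST) that Spark's file sources expose. It is not Spark code: `SCHEMA`, `convert_record`, and `parse_records` are hypothetical names, `datetime.strptime` stands in for DateFormatter, and the PERMISSIVE branch is simplified to null every field rather than keeping any partially parsed values. Only the mode names and the default corrupt-record column name `_corrupt_record` come from Spark.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical flat schema: field name -> converter applied to the element text.
# strptime is only a stand-in for Spark's DateFormatter in this sketch.
SCHEMA = {
    "title": str,
    "price": float,
    "published": lambda s: datetime.strptime(s, "%Y-%m-%d").date(),
}
# Same name as Spark's default corrupt-record column.
CORRUPT_COLUMN = "_corrupt_record"


def convert_record(xml_text):
    """Convert one XML record into a dict; raise on malformed XML or bad values."""
    elem = ET.fromstring(xml_text)  # raises ET.ParseError on malformed XML
    row = {}
    for name, convert in SCHEMA.items():
        child = elem.find(name)
        if child is None or child.text is None:
            row[name] = None  # missing element -> null field
        else:
            row[name] = convert(child.text.strip())  # may raise ValueError
    return row


def parse_records(records, mode="PERMISSIVE"):
    """Yield one dict per record, handling bad records according to `mode`."""
    for text in records:
        try:
            row = convert_record(text)
        except (ET.ParseError, ValueError) as err:
            if mode == "FAILFAST":
                raise ValueError(f"Malformed record: {text!r}") from err
            if mode == "DROPMALFORMED":
                continue  # skip the bad record entirely
            # PERMISSIVE (simplified): null every field, keep the raw text.
            row = {name: None for name in SCHEMA}
            row[CORRUPT_COLUMN] = text
        else:
            row[CORRUPT_COLUMN] = None
        yield row


records = [
    "<row><title>Spark</title><price>9.5</price><published>2023-08-11</published></row>",
    "<row><title>Broken</title><price>abc</price></row>",  # bad value
    "<row><title>Unclosed</row>",                          # malformed XML
]

for row in parse_records(records, mode="PERMISSIVE"):
    print(row)
```

Wrapping the per-record conversion in a single handler keeps the type-conversion code free of mode checks, which is roughly the separation FailureSafeParser provides around the underlying parser.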
   
   
   ### Why are the changes needed?
   These changes are required to bring the XML data source on par with CSV and JSON, and to support features such as streaming, which requires the FileFormat interface to be implemented.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, it adds support for the built-in XML data source.
   
   
   ### How was this patch tested?
   - Ran all the XML unit tests.
   - GitHub Actions
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]