sandip-db commented on code in PR #44994:
URL: https://github.com/apache/spark/pull/44994#discussion_r1475592244
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala:
##########
@@ -145,7 +145,8 @@ class StaxXmlParser(
   def doParseColumn(xml: String,
       parseMode: ParseMode,
       xsdSchema: Option[Schema]): Option[InternalRow] = {
-    val xmlRecord = UTF8String.fromString(xml)
+    // Limit the size of the XML record added to the bad record exception.
Review Comment:
Adding an option to fine-tune corrupt record handling may be overkill.
`UTF8String.fromString` throws `NegativeArraySizeException` for strings of
around 1B characters.
I can raise the limit to just below that (say, 512M) to avoid the exception.
WDYT?
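For illustration, a minimal sketch of the capping approach (the object and
constant names are hypothetical, and 512M is just the figure floated above,
not what the patch uses):
```scala
import org.apache.spark.unsafe.types.UTF8String

object BadRecordCapSketch {
  // Hypothetical cap: 512M characters, safely below the ~1B point where
  // UTF8String.fromString overflows the backing byte array.
  private val MaxBadRecordLength: Int = 512 * 1024 * 1024

  def toBadRecord(xml: String): UTF8String = {
    // Truncate before converting so the UTF-8 byte array length stays
    // within Int range and NegativeArraySizeException cannot occur.
    val capped =
      if (xml.length > MaxBadRecordLength) xml.substring(0, MaxBadRecordLength)
      else xml
    UTF8String.fromString(capped)
  }
}
```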
Alternatively, I can remove the limit and just use a `lazy val`, which would
throw the above exception only when a large bad record is encountered.
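And a sketch of the `lazy val` alternative, with the parsing body simplified
to a placeholder (not the actual `doParseColumn` change):
```scala
import org.apache.spark.unsafe.types.UTF8String

object LazyBadRecordSketch {
  def doParseColumn(xml: String): Option[UTF8String] = {
    // Deferred conversion: nothing is allocated unless the error path
    // below reads xmlRecord, at which point a ~1B-character input would
    // still throw NegativeArraySizeException.
    lazy val xmlRecord = UTF8String.fromString(xml)
    try {
      // ... real XML parsing would happen here (placeholder) ...
      None
    } catch {
      case _: RuntimeException =>
        Some(xmlRecord) // forces the lazy val only on the bad-record path
    }
  }
}
```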
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]