eric-maynard commented on code in PR #46408:
URL: https://github.com/apache/spark/pull/46408#discussion_r1591615927
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala:
##########
@@ -280,13 +280,32 @@ class JacksonParser(
case VALUE_STRING =>
UTF8String.fromString(parser.getText)
- case _ =>
+ case other =>
// Note that it always tries to convert the data as string without the case of failure.
- val writer = new ByteArrayOutputStream()
- Utils.tryWithResource(factory.createGenerator(writer, JsonEncoding.UTF8)) {
- generator => generator.copyCurrentStructure(parser)
+ val startLocation = parser.getTokenLocation
+ startLocation.contentReference().getRawContent match {
+ case byteArray: Array[Byte] =>
+ other match {
+ case START_OBJECT =>
+ parser.skipChildren()
+ case START_ARRAY =>
+ parser.skipChildren()
+ case _ =>
+ // Do nothing in this case; we've already read the token
+ }
+ val endLocation = parser.currentLocation.getByteOffset
+
+ UTF8String.fromBytes(
+ byteArray,
+ startLocation.getByteOffset.toInt,
+ endLocation.toInt - (startLocation.getByteOffset.toInt))
+ case _ =>
Review Comment:
It's not clear to me whether or when this branch would be hit, but I have
preserved the existing code here to cover cases where the input cannot be
readily indexed by a byte offset.
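
To illustrate what "indexed by a byte offset" means here, below is a minimal standalone sketch that uses only the JDK, not Jackson or Spark. The helper `sliceUtf8` and the object name are hypothetical, and the offsets are computed by hand rather than taken from a parser's token locations as the diff does. It shows that slicing the raw UTF-8 buffer only works with byte offsets, which diverge from character offsets as soon as the input contains a multi-byte character:

```scala
import java.nio.charset.StandardCharsets.UTF_8

object ByteOffsetSliceSketch {
  // Hypothetical helper: decode the bytes in [start, end) of a UTF-8 buffer.
  def sliceUtf8(bytes: Array[Byte], start: Int, end: Int): String =
    new String(bytes, start, end - start, UTF_8)

  def main(args: Array[String]): Unit = {
    // The key is a two-byte UTF-8 character, so byte and char offsets differ after it.
    val json = "{\"\u00e9\":{\"b\":[1,2]}}"
    val bytes = json.getBytes(UTF_8)

    // Assumption for this sketch: locate the nested object by hand.
    val nested = "{\"b\":[1,2]}"
    val charStart = json.indexOf(nested)
    val byteStart = json.substring(0, charStart).getBytes(UTF_8).length
    val byteEnd = byteStart + nested.getBytes(UTF_8).length

    println(sliceUtf8(bytes, byteStart, byteEnd)) // prints {"b":[1,2]}
    println(s"char offset $charStart, byte offset $byteStart") // prints char offset 5, byte offset 6
  }
}
```

If the source is not a byte array (for example, a character-based input), a byte offset of this kind may not be available at all, which is the situation the fallback branch is meant to cover.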
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]