Re: [PR] [SPARK-48148][CORE] JSON objects should not be modified when read as STRING [spark]

via GitHub Mon, 06 May 2024 15:24:07 -0700


eric-maynard commented on code in PR #46408:
URL: https://github.com/apache/spark/pull/46408#discussion_r1591615927



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala:
##########
@@ -280,13 +280,32 @@ class JacksonParser(
         case VALUE_STRING =>
           UTF8String.fromString(parser.getText)
 
-        case _ =>
+        case other =>
           // Note that it always tries to convert the data as string without the case of failure.
-          val writer = new ByteArrayOutputStream()
-          Utils.tryWithResource(factory.createGenerator(writer, JsonEncoding.UTF8)) {
-            generator => generator.copyCurrentStructure(parser)
+          val startLocation = parser.getTokenLocation
+          startLocation.contentReference().getRawContent match {
+            case byteArray: Array[Byte] =>
+               other match {
+                case START_OBJECT =>
+                  parser.skipChildren()
+                case START_ARRAY =>
+                  parser.skipChildren()
+                case _ =>
+                   // Do nothing in this case; we've already read the token
+              }
+              val endLocation = parser.currentLocation.getByteOffset
+
+              UTF8String.fromBytes(
+                byteArray,
+                startLocation.getByteOffset.toInt,
+                endLocation.toInt - (startLocation.getByteOffset.toInt))
+            case _ =>

Review Comment:
   I'm not sure when, if ever, this branch would be hit, but I have preserved the existing code here to cover inputs whose raw content is not an `Array[Byte]` and therefore cannot be sliced by byte offset.
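For readers unfamiliar with the technique: the `Array[Byte]` branch slices the original input between the token's start and end byte offsets instead of re-serializing it through a generator, so the value comes back exactly as written. Below is a minimal, stdlib-only Scala sketch of that idea, assuming UTF-8 input held in a byte array. `RawJsonSlice`, `endOfValue`, `rawValue` and `byteOffset` are hypothetical names, not Spark or Jackson APIs, and the hand-rolled bracket scan only stands in for Jackson's `skipChildren()`; it does not reproduce it.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical, stdlib-only illustration of the byte-offset slicing idea.
// None of these names are Spark or Jackson APIs.
object RawJsonSlice {
  private val scalarDelimiters: Set[Char] = Set(',', '}', ']', ' ', '\t', '\n', '\r')

  // Returns the byte offset just past the JSON value starting at `start`.
  // For containers this plays the role of parser.skipChildren() followed by
  // parser.currentLocation.getByteOffset. String values are not handled,
  // mirroring the diff, where VALUE_STRING has its own case.
  def endOfValue(bytes: Array[Byte], start: Int): Int = {
    val first = bytes(start).toChar
    var i = start
    if (first == '{' || first == '[') {
      var depth = 0
      var inString = false
      var escaped = false
      var done = false
      while (!done && i < bytes.length) {
        // Multi-byte UTF-8 sequences only contain bytes >= 0x80, so they can
        // never be mistaken for ASCII structural characters.
        val c = bytes(i).toChar
        if (inString) {
          if (escaped) escaped = false
          else if (c == '\\') escaped = true
          else if (c == '"') inString = false
        } else {
          c match {
            case '"'       => inString = true
            case '{' | '[' => depth += 1
            case '}' | ']' =>
              depth -= 1
              if (depth == 0) done = true
            case _ =>
          }
        }
        i += 1
      }
      require(done, s"unterminated container starting at byte $start")
      i
    } else {
      // Numbers and the literals true/false/null end at the next delimiter.
      while (i < bytes.length && !scalarDelimiters.contains(bytes(i).toChar)) i += 1
      i
    }
  }

  // Decodes the original bytes of the value verbatim, analogous to
  // UTF8String.fromBytes(bytes, offset, numBytes) in the diff.
  def rawValue(bytes: Array[Byte], start: Int): String = {
    val end = endOfValue(bytes, start)
    new String(bytes, start, end - start, StandardCharsets.UTF_8)
  }

  // Converts a character index in `s` into a UTF-8 byte offset.
  def byteOffset(s: String, charIndex: Int): Int =
    s.substring(0, charIndex).getBytes(StandardCharsets.UTF_8).length

  def main(args: Array[String]): Unit = {
    val json = """{"a": {"x": 1.0E2, "s": "}é"}, "b": [1, 2]}"""
    val bytes = json.getBytes(StandardCharsets.UTF_8)
    println(rawValue(bytes, byteOffset(json, json.indexOf("{\"x\""))))
    println(rawValue(bytes, byteOffset(json, json.indexOf("1.0E2"))))
    println(rawValue(bytes, byteOffset(json, json.indexOf('['))))
  }
}
```

In this sketch the byte offset of `[` is one greater than its character index, because `é` encodes to two bytes in UTF-8; this is presumably why the diff works with `getByteOffset` and `UTF8String.fromBytes` rather than character positions.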



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
