MaxGekk opened a new pull request #30032:
URL: https://github.com/apache/spark/pull/30032


   ### What changes were proposed in this pull request?
   In this PR, I propose to restrict the partial-result feature to root JSON objects only. The JSON datasource as well as `from_json()` will return `null` for malformed nested JSON objects.
   
   ### Why are the changes needed?
   1. To avoid raising an exception to users in the PERMISSIVE mode.
   2. To fix a regression and restore the behavior of Spark 2.4.x.
   3. The current implementation of partial results is intended to work only for root (top-level) JSON objects and has not been tested with malformed nested complex JSON fields.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. Before the changes, the code below:
   ```scala
       val pokerhand_raw = Seq("""[{"cards": [11], "playerId": 583651}]""").toDF("events")
       val event = new StructType().add("playerId", LongType).add("cards",
         ArrayType(new StructType().add("id", LongType).add("rank", StringType)))
       val pokerhand_events = pokerhand_raw.select(
         from_json($"events", ArrayType(event)).as("event"))
       pokerhand_events.show
   ```
   throws an exception even in the default **PERMISSIVE** mode:
   ```java
   java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData
     at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getArray(rows.scala:48)
     at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getArray$(rows.scala:48)
     at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getArray(rows.scala:195)
   ```
   
   After the changes:
   ```
   +-----+
   |event|
   +-----+
   | null|
   +-----+
   ```
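
For readers who want to reproduce the check outside `spark-shell`, the snippet above could be wrapped in a small standalone program along the following lines. This is only a sketch, not the test added by this PR: the object name `NestedMalformedJsonCheck`, the local master and the app name are assumptions made for illustration, and the final assertion is expected to hold only on a build that includes this change.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{ArrayType, LongType, StringType, StructType}

// Hypothetical standalone wrapper around the example from the description.
object NestedMalformedJsonCheck {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nested-malformed-json-check")
      .getOrCreate()
    import spark.implicits._

    // Same schema as in the description: `cards` is declared as an array of structs.
    val event = new StructType()
      .add("playerId", LongType)
      .add("cards", ArrayType(new StructType().add("id", LongType).add("rank", StringType)))

    // `cards` holds a bare number where a struct is expected, so the nested value is malformed.
    val raw = Seq("""[{"cards": [11], "playerId": 583651}]""").toDF("events")

    // PERMISSIVE is the default mode; it is passed explicitly here only for clarity.
    val parsed = raw.select(
      from_json(col("events"), ArrayType(event), Map("mode" -> "PERMISSIVE")).as("event"))

    // With this change applied, the whole column value is expected to be null
    // instead of a ClassCastException being thrown.
    val row = parsed.head()
    assert(row.isNullAt(0), "expected a null result for the malformed nested object")

    spark.stop()
  }
}
```

Passing the `mode` option explicitly makes it easy to swap in `FAILFAST` when comparing how the two modes treat the same input.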
   
   ### How was this patch tested?
   Added a test to `JsonFunctionsSuite`.

