Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22237#discussion_r223459186
--- Diff: docs/sql-programming-guide.md ---
@@ -1890,6 +1890,10 @@ working with timestamps in `pandas_udf`s to get the
best performance, see
# Migration Guide
+## Upgrading From Spark SQL 2.4 to 3.0
+
+ - Since Spark 3.0, the `from_json` function supports two modes -
`PERMISSIVE` and `FAILFAST`. The modes can be set via the `mode` option. The
default mode became `PERMISSIVE`. In previous versions, the behavior of `from_json`
did not conform to either `PERMISSIVE` or `FAILFAST`, especially in processing
of malformed JSON records. For example, the JSON string `{"a" 1}` with the
schema `a INT` is converted to `null` by previous versions but Spark 3.0
converts it to `Row(null)`. In version 2.4 and earlier, arrays of JSON objects
are considered invalid and converted to `null` if the specified schema is
`StructType`. Since Spark 3.0, the input is considered a valid JSON array
and only its first element is parsed if it conforms to the specified
`StructType`.
--- End diff --
This is the case where a user provides a `StructType` schema but the JSON
input contains an `array`. For such input, the JSON datasource returns one row
per struct in the array. Currently `from_json` returns `null` in this case.
With this PR, `from_json` returns one row with the first element of the input
array. Because we cannot return multiple rows from a function, we have the
following options:
- return `null`, but this would be the first case where we return `null` for
non-`null` input (this is the current approach, before the PR)
- return a row with one element from the input array (this PR proposes that)
- throw an exception, which is not a nice option in `PERMISSIVE` mode
- throw `BadRecordException` internally, and return `Row(null, null, ...,
null)` in `PERMISSIVE` mode or an exception in `FAILFAST`.
The last option seems more attractive than the others. WDYT?
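
To make the last option concrete, here is a rough sketch in plain Python rather than Spark code. Everything in it is an assumption made for illustration: `BadRecordError` stands in for the internal `BadRecordException`, a tuple stands in for `Row`, and `from_json_sketch` and `parse_struct` are hypothetical names. It only models the proposed control flow, not the actual parser.

```python
import json


class BadRecordError(Exception):
    """Hypothetical stand-in for the internal BadRecordException."""


def parse_struct(text, fields):
    """Parse `text` as a single JSON object; return its values in `fields` order."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError as e:
        raise BadRecordError(f"malformed JSON: {e}") from e
    if not isinstance(value, dict):
        # A top-level array (or scalar) does not match a struct schema.
        raise BadRecordError(f"expected a JSON object, got {type(value).__name__}")
    return tuple(value.get(name) for name in fields)


def from_json_sketch(text, fields, mode="PERMISSIVE"):
    """Model of the last option: bad records become all-null rows or errors."""
    if mode not in ("PERMISSIVE", "FAILFAST"):
        raise ValueError(f"unsupported mode: {mode}")
    if text is None:
        # null input still yields null.
        return None
    try:
        return parse_struct(text, fields)
    except BadRecordError:
        if mode == "FAILFAST":
            raise
        # PERMISSIVE: one row with every field set to null.
        return tuple(None for _ in fields)
```

Under this sketch, an array input with a struct schema and the malformed record `{"a" 1}` are handled the same way: an all-null row in `PERMISSIVE` and an error in `FAILFAST`, so non-`null` input never maps to `null`.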