[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

cloud-fan Tue, 25 Sep 2018 18:13:15 -0700

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22237#discussion_r220401132
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1877,6 +1877,10 @@ working with timestamps in `pandas_udf`s to get the best performance, see
     
     # Migration Guide
     
    +## Upgrading From Spark SQL 2.4 to 3.0
    +
    +  - Since Spark 3.0, the `from_json` functions supports two modes - `PERMISSIVE` and `FAILFAST`. The modes can be set via the `mode` option. The default mode became `PERMISSIVE`. In previous versions, behavior of `from_json` did not conform to either `PERMISSIVE` nor `FAILFAST`, especially in processing of malformed JSON records. For example, the JSON string `{"a" 1}` with the schema `a INT` is converted to `null` by previous versions but Spark 3.0 converts it to `Row(null)`. In version 2.4 and earlier, arrays of JSON objects are considered as invalid and converted to `null` if specified schema is `StructType`. Since Spark 3.0, the input is considered as a valid JSON array and only its first element is parsed if it conforms to the specified `StructType`.
    +
    --- End diff --
    
    > In previous versions, behavior of `from_json` did not conform to either `PERMISSIVE` nor `FAILFAST`, especially in processing of malformed JSON records.
    
    Do we have a clear definition of the current behavior? It's important to let users know how the behavior changes.
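One way to pin down the 3.0 semantics in question is a small executable model. The sketch below is plain Python, not Spark's `FailureSafeParser` and not the `from_json` API; the function name `from_json_model`, the list-of-field-names schema, and the handling of empty arrays and non-object values are hypothetical assumptions made only for illustration. It encodes just what the proposed guide text states: in `PERMISSIVE` mode a malformed record such as `{"a" 1}` yields a row of nulls (`Row(null)` for schema `a INT`), `FAILFAST` raises instead, and a JSON array is accepted with only its first element parsed. Per the same text, Spark 2.4 returned `null` for both of those inputs.

```python
import json
from typing import Optional, Sequence, Tuple

# Stand-in for a Spark Row: one Python value (None for null) per field.
Row = Tuple[Optional[object], ...]


def from_json_model(text: str, fields: Sequence[str], mode: str = "PERMISSIVE") -> Row:
    """Toy model of the Spark 3.0 `from_json` behavior described in the
    proposed migration-guide entry. Hypothetical; NOT Spark's implementation.

    `fields` lists the field names of a flat struct schema (e.g. ["a"] for
    `a INT`); type conversion is deliberately not modeled.
    """
    if mode not in ("PERMISSIVE", "FAILFAST"):
        raise ValueError(f"unsupported mode: {mode}")

    def malformed(cause: Optional[Exception] = None) -> Row:
        if mode == "FAILFAST":
            # FAILFAST: stop on the first malformed record.
            raise ValueError(f"malformed JSON record: {text!r}") from cause
        # PERMISSIVE: return a row whose fields are all null, e.g. Row(null).
        return tuple(None for _ in fields)

    try:
        value = json.loads(text)
    except json.JSONDecodeError as err:
        return malformed(err)

    # Described 3.0 behavior: a top-level JSON array is valid input for a
    # struct schema, and only its first element is parsed.
    if isinstance(value, list):
        if not value:
            return malformed()  # assumption: an empty array counts as malformed
        value = value[0]

    # Assumption: anything other than a JSON object does not conform to the struct.
    if not isinstance(value, dict):
        return malformed()

    # Fields missing from the object become null.
    return tuple(value.get(name) for name in fields)


print(from_json_model('{"a" 1}', ["a"]))               # (None,)
print(from_json_model('[{"a": 1}, {"a": 2}]', ["a"]))  # (1,)
try:
    from_json_model('{"a" 1}', ["a"], mode="FAILFAST")
except ValueError as err:
    print(err)  # malformed JSON record: '{"a" 1}'
```

Writing the contract down this way makes the unstated cases visible, such as an empty array or a first element that is not an object; the model only guesses there, so those are exactly the points a precise guide entry would need to settle.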


