GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/20648
[SPARK-23448][SQL] JSON parser should return partial row when part of columns are failed to parse under PermissiveMode

## What changes were proposed in this pull request?

When we read a JSON document with a corrupted field under `PermissiveMode`:

```json
{"attr1":"val1","attr2":"[\"val2\"]"}
{"attr1":"val1","attr2":["val2"]}
```

```scala
val schema = StructType(
  Seq(StructField("attr1", StringType, true),
      StructField("attr2", ArrayType(StringType, true), true)))
spark.read.schema(schema).json(input).collect().foreach(println)
```

we currently get these results:

```
[null,null]
[val1,WrappedArray(val2)]
```

`FailureSafeParser` and `BadRecordException` suggest an intention to return a partial result for a corrupted record, but the current implementation does not actually return a partial result at all: as the example above shows, all columns come back null. This patch fills that gap and returns the partial result.

## How was this patch tested?

Added tests pass.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-23448

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20648.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20648

----

commit 3d7d0415f2bfc2274fe94636b222d1ee437b0d24
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-02-20T14:03:49Z

    Returns partial row when part of columns are failed to parse.

----
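To illustrate the partial-row idea outside of Spark, here is a minimal plain-Scala sketch (not Spark's actual `FailureSafeParser` implementation): each column is converted independently, and only the columns whose conversion fails are set to null, so the well-formed columns survive. The `parsePartialRow` helper and the converter map are hypothetical names invented for this sketch; the integer conversion merely stands in for the real array-type conversion that fails in the example above.

```scala
import scala.util.Try

object PartialRowSketch {
  // Hypothetical per-column converters: each turns one raw string value
  // into a typed value, and may throw on malformed input.
  type Converter = String => Any

  // Convert each column independently; a failed conversion yields null
  // for that column only, instead of nulling out the entire row.
  def parsePartialRow(
      raw: Map[String, String],
      converters: Map[String, Converter]): Map[String, Any] =
    converters.map { case (name, convert) =>
      name -> raw.get(name).flatMap(v => Try(convert(v)).toOption).orNull
    }

  def main(args: Array[String]): Unit = {
    val converters = Map[String, Converter](
      "attr1" -> identity,          // string column: always succeeds
      "attr2" -> (s => s.toInt)     // stand-in for the failing array conversion
    )
    // attr2 fails to convert, but attr1 is still kept in the row.
    val row = parsePartialRow(
      Map("attr1" -> "val1", "attr2" -> "[\"val2\"]"), converters)
    println(row)
  }
}
```

Under this sketch the corrupted record yields a row where `attr1` is `"val1"` and only `attr2` is null, which is the behavior the patch aims for in Spark's JSON parser.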