GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21439
[SPARK-24391][SQL] Support arrays of any types by from_json
## What changes were proposed in this pull request?
The PR removes the restriction that an array used as the root type in `from_json`
may contain only structs. Currently, the function can handle only arrays of structs;
even an array of primitive types is disallowed. The PR allows arrays of any element
type currently supported by the JSON datasource. Here is an example with an array
of a primitive type:
```
scala> import org.apache.spark.sql.functions._
scala> import org.apache.spark.sql.types._
scala> val df = Seq("[1, 2, 3]").toDF("a")
scala> val schema = new ArrayType(IntegerType, false)
scala> val arr = df.select(from_json($"a", schema))
scala> arr.printSchema
root
 |-- jsontostructs(a): array (nullable = true)
 |    |-- element: integer (containsNull = true)
```
and the result of converting the JSON string to the `ArrayType`:
```
scala> arr.show
+----------------+
|jsontostructs(a)|
+----------------+
| [1, 2, 3]|
+----------------+
```
## How was this patch tested?
I added a few positive and negative tests covering the following cases (see the usage sketch after the list):
- array of primitive types
- array of arrays
- array of structs
- array of maps
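For illustration, here is a rough sketch of how the nested cases can be exercised through
the public API; the schemas and inputs below are assumed examples, not the actual test code
from the PR, and `spark.implicits._` is again assumed to be in scope:
```
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// array of arrays of ints
val arrays = Seq("[[1], [2, 3]]").toDF("a")
  .select(from_json($"a", ArrayType(ArrayType(IntegerType))))

// array of maps from string to int
val maps = Seq("""[{"x": 1}, {"y": 2}]""").toDF("a")
  .select(from_json($"a", ArrayType(MapType(StringType, IntegerType))))
```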
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 from_json-array
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21439.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21439
----
commit 3a7559b8809757ba32491a9c882ec40c8986c3b0
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-05-26T16:46:08Z
Support arrays by from_json
commit b601a9365e12744c408a8c198a94a5c6a9e4607e
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-05-26T17:08:49Z
Fix comments
----