GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/22019
[SPARK-25040][SQL] Empty string for double and float types should be nulls in JSON
## What changes were proposed in this pull request?
This PR proposes to consistently treat empty strings as `null` for double and float types. It looks like we missed this corner case; I suspect it is not that serious, since the behavior appears to have changed between 1.x and 2.x and this is a fairly narrow edge case.
As an easy reproducer for the double case, the code below raises an error:
```scala
spark.read.option("mode", "FAILFAST").json(
  Seq("""{"a":"", "b": ""}""", """{"a": 1.1, "b": 1.1}""").toDS).show()
```
```
Caused by: java.lang.RuntimeException: Cannot parse as double.
  at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeConverter$7$$anonfun$apply$10.applyOrElse(JacksonParser.scala:163)
  at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeConverter$7$$anonfun$apply$10.applyOrElse(JacksonParser.scala:152)
  at org.apache.spark.sql.catalyst.json.JacksonParser.org$apache$spark$sql$catalyst$json$JacksonParser$$parseJsonToken(JacksonParser.scala:277)
  at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeConverter$7.apply(JacksonParser.scala:152)
  at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeConverter$7.apply(JacksonParser.scala:152)
  at org.apache.spark.sql.catalyst.json.JacksonParser.org$apache$spark$sql$catalyst$json$JacksonParser$$convertObject(JacksonParser.scala:312)
  at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeStructRootConverter$1$$anonfun$apply$2.applyOrElse(JacksonParser.scala:71)
  at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeStructRootConverter$1$$anonfun$apply$2.applyOrElse(JacksonParser.scala:70)
  at org.apache.spark.sql.catalyst.json.JacksonParser.org$apache$spark$sql$catalyst$json$JacksonParser$$parseJsonToken(JacksonParser.scala:277)
  at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeStructRootConverter$1.apply(JacksonParser.scala:70)
  at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeStructRootConverter$1.apply(JacksonParser.scala:70)
  at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$parse$2.apply(JacksonParser.scala:368)
  at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$parse$2.apply(JacksonParser.scala:363)
  at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2491)
  at org.apache.spark.sql.catalyst.json.JacksonParser.parse(JacksonParser.scala:363)
  at org.apache.spark.sql.DataFrameReader$$anonfun$5$$anonfun$6.apply(DataFrameReader.scala:450)
  at org.apache.spark.sql.DataFrameReader$$anonfun$5$$anonfun$6.apply(DataFrameReader.scala:450)
  at org.apache.spark.sql.execution.datasources.FailureSafeParser.parse(FailureSafeParser.scala:61)
  ... 24 more
```
By contrast, other types already treat the empty string as `null`:
```scala
spark.read.option("mode", "FAILFAST").json(
  Seq("""{"a":"", "b": ""}""", """{"a": 1, "b": 1}""").toDS).show()
```
```
+----+----+
| a| b|
+----+----+
|null|null|
| 1| 1|
+----+----+
```
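The intended conversion rule can be sketched as follows. This is a minimal illustration only, not the actual `JacksonParser` code, and the function name `parseJsonDouble` is hypothetical:

```scala
// Minimal sketch (an assumption, not the actual JacksonParser implementation)
// of the proposed rule for double fields read as strings from JSON:
// an empty string becomes null; anything else parses as before.
def parseJsonDouble(value: String): java.lang.Double =
  if (value.isEmpty) null                  // proposed: "" -> null, matching other types
  else java.lang.Double.valueOf(value)     // non-empty strings may still fail with NumberFormatException
```

With this rule, `parseJsonDouble("")` yields `null` instead of raising `RuntimeException: Cannot parse as double`, which is what the PR's reproducer demonstrates.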
## How was this patch tested?
Unit tests were added and manually tested.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark double-float-empty
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22019.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22019
----
commit ef57fdd5b0a6f7f0b6343c91c6983d20bc67fb5b
Author: hyukjinkwon <gurwls223@...>
Date: 2018-08-07T05:23:43Z
Empty string for double and float types should be nulls in JSON
----