GitHub user viirya opened a pull request:
https://github.com/apache/spark/pull/19734
[SPARK-22442][SQL][BRANCH-2.2] ScalaReflection should produce correct field names for special characters
## What changes were proposed in this pull request?
For a case class whose field names contain special characters, e.g.:
```scala
case class MyType(`field.1`: String, `field 2`: String)
```
Although we can manipulate the resulting DataFrame/Dataset, the field names in its schema are encoded:
```scala
scala> val df = Seq(MyType("a", "b"), MyType("c", "d")).toDF
df: org.apache.spark.sql.DataFrame = [field$u002E1: string, field$u00202: string]
scala> df.as[MyType].collect
res7: Array[MyType] = Array(MyType(a,b), MyType(c,d))
```
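For context, the encoded names come from Scala's standard identifier encoding: characters that are not valid in Java identifiers are escaped as `$uXXXX`. A minimal sketch using the standard library's `scala.reflect.NameTransformer` (not part of this patch) shows the mapping:
```scala
import scala.reflect.NameTransformer

// '.' (0x2E) and ' ' (0x20) are not valid Java identifier characters,
// so Scala escapes them as $uXXXX sequences.
println(NameTransformer.encode("field.1"))      // field$u002E1
println(NameTransformer.encode("field 2"))      // field$u00202
println(NameTransformer.decode("field$u002E1")) // field.1
```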
This causes a resolution problem when we try to convert data whose field names are not encoded:
```scala
spark.read.json(path).as[MyType]
...
[info] org.apache.spark.sql.AnalysisException: cannot resolve '`field$u002E1`' given input columns: [field 2, field.1];
[info]   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
...
```
We should use the decoded field names in the Dataset schema.
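A hedged sketch of the idea, assuming (as the error above suggests) that the raw parameter names obtained via runtime reflection come back in encoded form; the helper `schemaFieldNames` is illustrative, not the actual `ScalaReflection` API:
```scala
import scala.reflect.runtime.universe._

// Illustrative helper: collect the case-accessor names of T in
// declaration order, using decodedName so an encoded "field$u002E1"
// comes back as "field.1".
def schemaFieldNames[T: TypeTag]: Seq[String] =
  typeOf[T].decls.sorted.collect {
    case m: MethodSymbol if m.isCaseAccessor =>
      m.name.decodedName.toString
  }

// Expected: schemaFieldNames[MyType] == Seq("field.1", "field 2")
```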
## How was this patch tested?
Added tests.
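For reference, a minimal sketch of the kind of end-to-end check this covers (assuming a SparkSession named `spark` with its implicits imported, spark-shell style; this is not the literal test code):
```scala
import spark.implicits._

// JSON with non-encoded field names, as an external source would produce.
val jsonData = Seq("""{"field.1": "a", "field 2": "b"}""").toDS()

// Before this patch, the line below failed with
// AnalysisException: cannot resolve '`field$u002E1`'.
val ds = spark.read.json(jsonData).as[MyType]
assert(ds.collect().toSeq == Seq(MyType("a", "b")))
```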
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 SPARK-22442-2.2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19734.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19734
----
commit 8d3fd950ca76d791335b9000133d0b1f897d2f87
Author: Liang-Chi Hsieh <[email protected]>
Date: 2017-11-13T02:40:01Z
ScalaReflection should produce correct field names for special characters.
----