GitHub user kiszk opened a pull request:
https://github.com/apache/spark/pull/18627
[BACKPORT-2.1][SPARK-19104][SQL] Lambda variables in ExternalMapToCatalyst
should be global
## What changes were proposed in this pull request?
This PR is a backport of #18418 to Spark 2.1.
[SPARK-21393](https://issues.apache.org/jira/browse/SPARK-21393) reported this
problem in Spark 2.1.
The issue happens in `ExternalMapToCatalyst`. For example, the following
code creates an `ExternalMapToCatalyst` expression to convert a Scala Map
to the Catalyst map format.
```
val data = Seq.tabulate(10)(i => NestedData(1, Map("key" -> InnerData("name", i + 100))))
val ds = spark.createDataset(data)
```
The `valueConverter` in `ExternalMapToCatalyst` looks like:
```
if (isnull(lambdavariable(ExternalMapToCatalyst_value52,
ExternalMapToCatalyst_value_isNull52, ObjectType(class
org.apache.spark.sql.InnerData), true))) null else named_struct(name,
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType,
fromString, assertnotnull(lambdavariable(ExternalMapToCatalyst_value52,
ExternalMapToCatalyst_value_isNull52, ObjectType(class
org.apache.spark.sql.InnerData), true)).name, true), value,
assertnotnull(lambdavariable(ExternalMapToCatalyst_value52,
ExternalMapToCatalyst_value_isNull52, ObjectType(class
org.apache.spark.sql.InnerData), true)).value)
```
There is a `CreateNamedStruct` expression (`named_struct`) that creates a row
from `InnerData.name` and `InnerData.value`, both of which are accessed through
the lambda variable `ExternalMapToCatalyst_value52`.
Because `ExternalMapToCatalyst_value52` is a local variable, it can no longer
be accessed once `CreateNamedStruct` splits its expressions into individual
functions. This PR makes the lambda variables global so that the split
functions can still reach them.
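The scoping problem can be illustrated with a minimal Java sketch. This is not the code Spark generates: the class `SplitExpressionsSketch`, the helper names `structField0`/`structField1`, and the field layout of `InnerData` are assumptions made for illustration only. It shows why a value shared across split-out helper methods has to live in a field rather than in a local variable.

```java
// Hypothetical illustration only; names and structure are assumptions,
// not the actual output of Spark's code generator.
class SplitExpressionsSketch {
    // Stand-in for the InnerData class used in the example (fields assumed).
    static final class InnerData {
        final String name;
        final int value;
        InnerData(String name, int value) {
            this.name = name;
            this.value = value;
        }
    }

    // "Global" lambda variable state: instance fields are visible to every
    // helper method of the class.
    private InnerData lambdaValue;
    private boolean lambdaValueIsNull;

    // Helpers standing in for the functions produced when the struct's field
    // expressions are split apart. If lambdaValue were a local variable
    // declared inside convert(), these methods would not compile, because a
    // method cannot see another method's locals.
    private String structField0() {
        return lambdaValue.name;
    }

    private int structField1() {
        return lambdaValue.value;
    }

    // Mirrors the shape "if (isnull(v)) null else named_struct(name, ..., value, ...)".
    Object[] convert(InnerData input) {
        lambdaValue = input;
        lambdaValueIsNull = (input == null);
        if (lambdaValueIsNull) {
            return null;
        }
        return new Object[] { structField0(), structField1() };
    }
}
```

Calling `convert` with an `InnerData("name", 100)` yields a two-element row holding the name and the value, while a `null` input yields `null`, matching the null check in the `valueConverter` above.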
## How was this patch tested?
Added a new test case to `DatasetPrimitiveSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kiszk/spark SPARK-21391
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18627.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18627
----
commit 5143d35dfc254d9a560123b6e7eb74a5b68a03a4
Author: Kazuaki Ishizaki <[email protected]>
Date: 2017-07-13T17:59:16Z
initial commit
----