GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/18627

    [BACKPORT-2.1][SPARK-19104][SQL] Lambda variables in ExternalMapToCatalyst 
should be global

    ## What changes were proposed in this pull request?
    
    This PR is a backport of #18418 to Spark 2.1. 
[SPARK-21393](https://issues.apache.org/jira/browse/SPARK-21393) reported this 
problem in Spark 2.1.
    
    The issue happens in `ExternalMapToCatalyst`. For example, the following 
code creates an `ExternalMapToCatalyst` to convert a Scala `Map` 
to the Catalyst map format.
    
    ```
    val data = Seq.tabulate(10)(i => NestedData(1, Map("key" -> 
InnerData("name", i + 100))))
    val ds = spark.createDataset(data)
    ```
    The `valueConverter` in `ExternalMapToCatalyst` looks like:
    
    ```
    if (isnull(lambdavariable(ExternalMapToCatalyst_value52, 
ExternalMapToCatalyst_value_isNull52, ObjectType(class 
org.apache.spark.sql.InnerData), true))) null else named_struct(name, 
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
fromString, assertnotnull(lambdavariable(ExternalMapToCatalyst_value52, 
ExternalMapToCatalyst_value_isNull52, ObjectType(class 
org.apache.spark.sql.InnerData), true)).name, true), value, 
assertnotnull(lambdavariable(ExternalMapToCatalyst_value52, 
ExternalMapToCatalyst_value_isNull52, ObjectType(class 
org.apache.spark.sql.InnerData), true)).value)
    ```
    There is a `CreateNamedStruct` expression (`named_struct`) that creates a row 
of `InnerData.name` and `InnerData.value`, both of which are referred to by 
`ExternalMapToCatalyst_value52`.
    
    Because `ExternalMapToCatalyst_value52` is a local variable, when 
`CreateNamedStruct` splits its expressions into individual functions, the local 
variable can no longer be accessed.
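    
    The failure mode above can be sketched outside of Spark. This is a minimal, hypothetical illustration (not actual Spark codegen; class and method names are invented) of why the lambda variables must become fields ("global") rather than locals: the generated Java code is split into separate methods, and a local variable in one method is invisible to the others, whereas an instance field is shared by all of them.
    
    ```java
    // Sketch of split codegen: a field plays the role of the "global"
    // lambda variable in the fixed ExternalMapToCatalyst codegen.
    public class SplitFunctions {
        // "Global" lambda variable: a field is in scope for every
        // split-out method.
        private int value;
    
        // Split-out helper, analogous to a function produced when
        // CreateNamedStruct splits its expressions.
        private int readValue() {
            return value + 1; // compiles: fields are visible everywhere
        }
    
        public int convert(int input) {
            value = input; // a local `int value = input;` here would NOT
                           // be visible inside readValue() - the bug
            return readValue();
        }
    
        public static void main(String[] args) {
            System.out.println(new SplitFunctions().convert(100)); // prints 101
        }
    }
    ```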
    
    ## How was this patch tested?
    
    Added a new test to `DatasetPrimitiveSuite`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-21391

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18627.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18627
    
----
commit 5143d35dfc254d9a560123b6e7eb74a5b68a03a4
Author: Kazuaki Ishizaki <[email protected]>
Date:   2017-07-13T17:59:16Z

    initial commit

----

