[GitHub] spark pull request #21745: [SPARK-24781][SQL] Using a reference from Dataset...

viirya Wed, 11 Jul 2018 02:05:19 -0700

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/21745


    [SPARK-24781][SQL] Using a reference from Dataset in Filter/Sort might not work

    ## What changes were proposed in this pull request?
    
    When a filter or sort uses a reference from a Dataset that was not kept by
    the prior select, an AnalysisException occurs, e.g.,
    
    ```scala
    val df = Seq(("test1", 0), ("test2", 1)).toDF("name", "id")
    df.select(df("name")).filter(df("id") === 0).show()
    ```
    
    ```scala
    org.apache.spark.sql.AnalysisException: Resolved attribute(s) id#6 missing from name#5 in operator !Filter (id#6 = 0).;;
    !Filter (id#6 = 0)
       +- AnalysisBarrier
          +- Project [name#5]
             +- Project [_1#2 AS name#5, _2#3 AS id#6]
                +- LocalRelation [_1#2, _2#3]
    ```
    
    This change adds the condition `missingInput.isEmpty` to `resolved` in
    `LogicalPlan`. Previously, a logical plan was considered resolved if all of
    its expressions and all of its children were resolved. However, because an
    already-resolved reference such as `df("name")` can be added to a query
    plan, all expressions in a plan can be resolved while the plan still has
    missing inputs.
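    
    The effect of the extra condition can be sketched with a toy model. This is
    only an illustration and not Catalyst code: `Attr`, `Plan`, `Relation`,
    `Project`, `Filter`, `resolvedBefore`, and `resolvedAfter` are hypothetical
    stand-ins, every expression is assumed to be already resolved, and the
    relation is assumed to output `name#5` and `id#6` directly.
    
    ```scala
    object MissingInputSketch {
      // Hypothetical simplified attribute: a name plus an expression id.
      final case class Attr(name: String, id: Int)

      // Hypothetical simplified plan node; NOT Spark's LogicalPlan.
      sealed trait Plan {
        def children: Seq[Plan]
        def output: Set[Attr]
        def references: Set[Attr]
        // Attributes this node uses that none of its children produce.
        def missingInput: Set[Attr] = references -- children.flatMap(_.output)
        // Old rule: expressions (assumed resolved here) and children resolved.
        def resolvedBefore: Boolean = children.forall(_.resolvedBefore)
        // Proposed rule: additionally require that no input is missing.
        def resolvedAfter: Boolean =
          children.forall(_.resolvedAfter) && missingInput.isEmpty
      }

      final case class Relation(output: Set[Attr]) extends Plan {
        val children = Seq.empty[Plan]
        val references = Set.empty[Attr]
      }

      final case class Project(projectList: Set[Attr], child: Plan) extends Plan {
        val children = Seq(child)
        val output = projectList
        val references = projectList
      }

      final case class Filter(references: Set[Attr], child: Plan) extends Plan {
        val children = Seq(child)
        def output = child.output
      }

      val name = Attr("name", 5)
      val id = Attr("id", 6)

      // Mirrors df.select(df("name")).filter(df("id") === 0).
      val relation = Relation(Set(name, id))
      val project = Project(Set(name), relation)
      val filter = Filter(Set(id), project)
    }
    ```
    
    In this sketch, `filter.resolvedBefore` is true even though `id#6` is not
    produced by the `Project` below it, while `filter.resolvedAfter` is false
    because `filter.missingInput` contains `id#6`.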
    
    ## How was this patch tested?
    
    Added tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-24781

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21745.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21745
    
----
commit 97837a46b790ceb1f0df38cc7a3094b1cb4eb556
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-07-11T07:44:43Z

    Resolved references from Dataset should be checked if it is missed from plan.

----

