GitHub user viirya opened a pull request:

    [SPARK-24781][SQL] Using a reference from Dataset in Filter/Sort might not work

    ## What changes were proposed in this pull request?
    When we use a reference from a Dataset in a filter or sort, and that reference was not used in the preceding select, an AnalysisException occurs, e.g.:

    val df = Seq(("test1", 0), ("test2", 1)).toDF("name", "id")
    df.select(df("name")).filter(df("id") === 0).show()
    org.apache.spark.sql.AnalysisException: Resolved attribute(s) id#6 missing from name#5 in operator !Filter (id#6 = 0).;;
    !Filter (id#6 = 0)
       +- AnalysisBarrier
          +- Project [name#5]
             +- Project [_1#2 AS name#5, _2#3 AS id#6]
                +- LocalRelation [_1#2, _2#3]

    This change adds the condition `missingInput.isEmpty` to `resolved` in `LogicalPlan`. Previously, a logical plan was considered resolved if all of its expressions and all of its children were resolved. However, because an already-resolved reference such as `df("name")` can be added to a query plan, every expression in a plan can be resolved while the plan still has missing inputs.

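    To make the `resolved` change concrete, here is a toy Scala model. It is a hypothetical simplification, not Catalyst's actual `LogicalPlan` API: the names `Attr`, `Plan`, `resolvedOld` and `resolvedNew` are invented for illustration, the aliasing step (`_1#2 AS name#5`) is dropped, and every expression is treated as already resolved because each attribute carries a fixed expression id such as `id#6`. Here `missingInput` is simply the attributes an operator references minus those its children output.

```scala
// Toy model for illustration only; these are NOT Spark Catalyst classes.
// An attribute bound to a unique expression id, printed like `id#6`.
final case class Attr(name: String, exprId: Int) {
  override def toString: String = s"$name#$exprId"
}

sealed trait Plan {
  def children: Seq[Plan]
  // Attributes referenced by this operator's own expressions.
  def references: Set[Attr]
  // Attributes this operator produces for its parent.
  def output: Seq[Attr]
  // Attributes made available by the children.
  def inputSet: Set[Attr] = children.flatMap(_.output).toSet
  // Referenced attributes that no child produces.
  def missingInput: Set[Attr] = references -- inputSet
  // Old rule: expressions resolved (always true here) and children resolved.
  def resolvedOld: Boolean = children.forall(_.resolvedOld)
  // Rule described in this PR: additionally require no missing input.
  def resolvedNew: Boolean =
    children.forall(_.resolvedNew) && missingInput.isEmpty
}

final case class LocalRelation(output: Seq[Attr]) extends Plan {
  def children: Seq[Plan] = Nil
  def references: Set[Attr] = Set.empty
}

final case class Project(projectList: Seq[Attr], child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
  def references: Set[Attr] = projectList.toSet
  def output: Seq[Attr] = projectList
}

// `condRefs` stands in for the attributes used by the filter condition.
final case class Filter(condRefs: Set[Attr], child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
  def references: Set[Attr] = condRefs
  def output: Seq[Attr] = child.output
}

val name = Attr("name", 5)
val id = Attr("id", 6)
val relation = LocalRelation(Seq(name, id))

// Shape of df.select(df("name")).filter(df("id") === 0)
val selectThenFilter = Filter(Set(id), Project(Seq(name), relation))
// Shape of df.filter(df("id") === 0).select(df("name"))
val filterThenSelect = Project(Seq(name), Filter(Set(id), relation))

println(selectThenFilter.missingInput) // Set(id#6)
println(selectThenFilter.resolvedOld)  // true
println(selectThenFilter.resolvedNew)  // false
println(filterThenSelect.resolvedNew)  // true
```

    In this model, the old rule reports the `Filter` above `Project [name#5]` as resolved even though nothing below it produces `id#6`; the added `missingInput.isEmpty` condition flags it. Filtering before projecting keeps `id#6` in the child's output, so that plan passes both checks.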
    ## How was this patch tested?
    Added tests.

You can merge this pull request into a Git repository by running:

    $ git pull SPARK-24781

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21745
commit 97837a46b790ceb1f0df38cc7a3094b1cb4eb556
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-07-11T07:44:43Z

    Resolved references from Dataset should be checked if it is missed from 


