GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/11632

    [SPARK-13801][SQL] DataFrame.col should return unresolved attribute

    ## What changes were proposed in this pull request?
    
    Let's start with an example:
    ```
    val df = ...
    val df2 = df.filter(...)
    df.join(df2, (df("key") + 1) === df2("key"))
    ```
    This query won't work, because `df("key")` and `df2("key")` refer to the
    same column.
    
    I think the biggest problem is that we give users the resolved attribute.
    However, a resolved attribute is not a real column, because a logical
    plan's output may change. For example, we generate new output for the
    right child in a self-join.
    
    We should make `DataFrame.col` return an unresolved attribute. It should
    still perform resolution to make sure the given column name is resolvable,
    but instead of returning the resolved attribute, it should take the name
    and wrap it in `UnresolvedAttribute`.
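    
    A minimal sketch of that idea, written as a user-side helper rather than
    the actual patch. The helper name `unresolvedCol` is hypothetical, and the
    sketch assumes Spark 2.x/3.x, where a `Column` can be built from a
    Catalyst `Expression`. `UnresolvedAttribute` is an internal Catalyst
    class, so this is for illustration only:

    ```scala
    import org.apache.spark.sql.{Column, DataFrame}
    import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute

    object UnresolvedColSketch {
      // Hypothetical helper, not the code in this PR.
      def unresolvedCol(df: DataFrame, name: String): Column = {
        // Resolve first so that an unknown name still fails fast
        // with an AnalysisException.
        df.col(name)
        // Drop the resolved attribute and keep only the name.
        // `quoted` treats the whole string as a single name part
        // (an assumption; dotted names for nested fields would need parsing).
        new Column(UnresolvedAttribute.quoted(name))
      }
    }
    ```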
    
    Now if users run the query again, they will get an analysis exception and
    will understand that they need to alias `df` and `df2` before the join.
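    
    For reference, the aliased form of the query could look like the sketch
    below. The sample data, the alias names `l` and `r`, and the helper name
    `joinShifted` are assumptions made for illustration, and the sketch uses
    the `SparkSession` entry point from Spark 2.x and later:

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.col

    object SelfJoinWithAliases {
      // Alias both sides so each column reference names its side explicitly.
      def joinShifted(df: DataFrame, df2: DataFrame): DataFrame = {
        val left = df.as("l")
        val right = df2.as("r")
        left.join(right, (col("l.key") + 1) === col("r.key"))
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[1]")
          .appName("self-join-with-aliases")
          .getOrCreate()
        import spark.implicits._

        // Hypothetical sample data standing in for the elided `df`.
        val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("key", "value")
        val df2 = df.filter($"key" > 1)

        // With this sample data, left keys 1 and 2 match right keys 2 and 3.
        joinShifted(df, df2).show()
        spark.stop()
      }
    }
    ```

    The qualified names `l.key` and `r.key` are resolved against the aliased
    plans at analysis time, so they stay valid even when the right side of
    the self-join gets new output attributes.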
    
    ## How was this patch tested?
    
    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark df-self-join

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11632.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11632
    
----
commit 9642324601e18f2ce0095a40f70403c4a7f4b274
Author: Wenchen Fan <[email protected]>
Date:   2016-03-10T10:51:47Z

    DataFrame.col should return unresolved attribute

----

