Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21449
  
    This is a long-standing issue; I've seen many attempts to fix it (including 
my own), but none has succeeded.
    
    The major problem is that there is no clear definition of the expected 
behavior, i.e. what are the semantics of `Dataset.col`?
    
    Some examples:
    ```
    df.select(df.col("i")) // valid
    
    val df1 = df.filter(...)
    df1.select(df.col("i")) // still valid
    
    df.join(otherDF, df.col("i") === otherDF.col("i")) // valid
    
    df.join(otherDF).select(df.col("i"), otherDF("i"))  // valid
    
    val df2 = df.select(df.col("i") + 1)
    df2.select(df.col("i"))   // invalid
    ```
    
    Sometimes we can use an ancestor's column in a new Dataset, and sometimes we 
can't. We should make the condition clear first.
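    To make the distinction concrete, here is a minimal sketch of one plausible condition (this is not Spark's actual implementation; the `Attr`/`Dataset` names and methods are invented for illustration): each output attribute carries a unique expression ID, a `filter` preserves the parent's attribute IDs, while selecting a computed expression such as `df.col("i") + 1` mints a fresh ID, so the ancestor's column can no longer be resolved against the new Dataset's output.

    ```scala
    // Sketch only: models column resolution by expression ID, not by name.
    // A row-preserving transformation (filter) keeps the parent's attribute
    // IDs, so an ancestor's column still resolves. Selecting a computed
    // expression produces a new attribute with a fresh ID, so the ancestor's
    // column ID is absent from the child's output and resolution fails.
    object ColumnResolution {
      var nextId = 0
      def freshId(): Int = { nextId += 1; nextId }

      // An output attribute: a name plus a unique expression ID.
      case class Attr(name: String, id: Int)

      case class Dataset(output: Seq[Attr]) {
        def col(name: String): Attr =
          output.find(_.name == name).getOrElse(sys.error(s"no column $name"))

        // filter keeps the same output attributes (same IDs).
        def filter(): Dataset = Dataset(output)

        // Selecting a computed expression (e.g. base + 1 AS newName)
        // creates a NEW attribute with a fresh ID.
        def selectComputed(base: Attr, newName: String): Dataset =
          Dataset(Seq(Attr(newName, freshId())))

        // A column is resolvable only if its ID exists in this output.
        def canResolve(a: Attr): Boolean = output.exists(_.id == a.id)
      }

      def main(args: Array[String]): Unit = {
        val df  = Dataset(Seq(Attr("i", freshId())))
        val df1 = df.filter()
        println(df1.canResolve(df.col("i")))          // true: IDs preserved

        val df2 = df.selectComputed(df.col("i"), "i") // (i + 1) AS i, fresh ID
        println(df2.canResolve(df.col("i")))          // false: ancestor ID gone
      }
    }
    ```

    Under this model, the "condition" is simply whether the ancestor's attribute IDs survive into the new plan's output, which matches the valid/invalid split in the examples above.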

