[GitHub] spark pull request: [SPARK-10838][SPARK-11576][SQL][WIP] Incorrect...

gatorsmile Mon, 09 Nov 2015 15:02:06 -0800

Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/9548#issuecomment-155226523
  
    @marmbrus Thank you for your suggestions! 
    
    That is close to my initial idea. I tried it last night. Unfortunately, I 
hit a problem when adding such a field to the `Column` API. In the current design, 
the class `Column` corresponds to the class `Expression`, which includes both 
`AttributeReference` and the other expression types. For a `Column` that wraps 
an `AttributeReference`, it makes sense to have such a DataFrame identifier. 
However, when a `Column` is generated from a binary expression (e.g., `gt`), it 
could carry more than one DataFrame identifier. Does that sound right to you? 
    
    Implementing the idea turns out to be more difficult. For example, consider 
the following binary operator:
    
    ```scala
      def === (other: Any): Column = {
        val right = lit(other).expr
        EqualTo(expr, right)
      }
    ```
    
    `EqualTo` is an `Expression`. `expr` and `right` are not `Column`s. Thus, 
when accessing the `Column` generated from `===`, we cannot tell which 
DataFrames `expr` and `right` came from unless we change 
`AttributeReference`.  
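    
    A rough sketch of the problem, using toy classes rather than Spark's real 
ones. Everything here (`Expr`, `Attr`, `Lit`, `Col`, `sourceId`, `SourceIds`) 
is hypothetical and only mimics the shape of `Expression`, `AttributeReference` 
and `Column`. It assumes the identifier is stored on the attribute itself, 
which is the kind of change to `AttributeReference` mentioned above:
    
    ```scala
    // Toy model only: these are NOT Spark's classes.
    sealed trait Expr
    // Hypothetical: `sourceId` tags the DataFrame an attribute came from.
    final case class Attr(name: String, sourceId: Long) extends Expr
    final case class Lit(value: Any) extends Expr
    final case class EqualTo(left: Expr, right: Expr) extends Expr

    // A column wraps an arbitrary expression, not just an attribute.
    final case class Col(expr: Expr) {
      def ===(other: Col): Col = Col(EqualTo(expr, other.expr))
    }

    object SourceIds {
      // Collect every DataFrame identifier reachable from an expression tree.
      def sources(e: Expr): Set[Long] = e match {
        case Attr(_, id)   => Set(id)
        case Lit(_)        => Set.empty
        case EqualTo(l, r) => sources(l) ++ sources(r)
      }
    }
    ```
    
    With `a = Col(Attr("key", 1L))` and `b = Col(Attr("key", 2L))`, 
`SourceIds.sources((a === b).expr)` contains both identifiers, so a single 
identifier field on the column would not be enough for binary expressions.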
    
    That is why I think this could mean a major code change to 
`DataFrame` and `Column`. Thank you for any further suggestions.



