GitHub user gatorsmile commented on the pull request:
https://github.com/apache/spark/pull/9548#issuecomment-155226523
@marmbrus Thank you for your suggestions!
That is close to my initial idea, and I tried it last night. Unfortunately, I
hit a problem when adding such a field to the `Column` API. In the current design,
the class `Column` corresponds to the class `Expression`, which covers both
`AttributeReference` and the other expression types. For a `Column` that wraps a
single `AttributeReference`, it makes sense to have such a DataFrame identifier.
However, when a `Column` is generated from a binary expression type (e.g., `gt`),
it could have more than one DataFrame identifier. Does that sound good to you?
Implementing the idea turns out to be more difficult. For example, consider the
following binary operator:
```scala
def === (other: Any): Column = {
  val right = lit(other).expr
  EqualTo(expr, right)
}
```
`EqualTo` is an `Expression`, and `expr` and `right` are `Expression`s, not
`Column`s. Thus, when accessing the `Column` generated from `===`, we cannot
tell which DataFrames `expr` and `right` came from unless we change
`AttributeReference`.
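To make the problem concrete, here is a minimal sketch using simplified stand-in classes. These are not Spark's real `Column`, `Expression`, or `AttributeReference`; the `source` field and the `sources` helper are hypothetical names introduced only for illustration. It assumes the DataFrame identifier is carried by the leaf attribute, which is the change to `AttributeReference` mentioned above.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's real classes.
object ColumnSourceSketch {
  sealed trait Expression { def children: Seq[Expression] }

  // Hypothetical: `source` is an assumed DataFrame identifier stored on the leaf.
  case class AttributeReference(name: String, source: Option[String]) extends Expression {
    val children: Seq[Expression] = Nil
  }
  case class Literal(value: Any) extends Expression {
    val children: Seq[Expression] = Nil
  }
  case class EqualTo(left: Expression, right: Expression) extends Expression {
    val children: Seq[Expression] = Seq(left, right)
  }

  // A Column only wraps an expression tree, so it has no single owning DataFrame.
  case class Column(expr: Expression) {
    def ===(other: Any): Column = other match {
      case c: Column => Column(EqualTo(expr, c.expr))
      case v         => Column(EqualTo(expr, Literal(v)))
    }
  }

  // Walk the tree and gather every source identifier found on the leaves.
  def sources(e: Expression): Set[String] = e match {
    case AttributeReference(_, src) => src.toList.toSet
    case other                      => other.children.flatMap(sources).toSet
  }

  def main(args: Array[String]): Unit = {
    val a = Column(AttributeReference("id", Some("df1")))
    val b = Column(AttributeReference("id", Some("df2")))
    println(sources((a === b).expr)) // both identifiers appear
    println(sources((a === 1).expr)) // only the left side has a source
  }
}
```

In this sketch the `Column` returned by `===` cannot hold a single identifier: the information is only recoverable by walking the expression tree, and only if the leaves carry it.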
That is why I think this could mean a major code change to
`DataFrame` and `Column`. Thank you for any further suggestions.