GitHub user yhuai commented on the pull request:
https://github.com/apache/spark/pull/9050#issuecomment-147116624
@hvanhovell Yes, we will backport it to the 1.5 branch, so it will be fixed in
1.5.2.
Let me explain the cause. Every attribute reference has an `exprId`. If you
do not explicitly assign this id when you create an attribute reference
(probably the most common case), you will get a unique id (see
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala#L27).
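A minimal Python model of this id assignment, assuming (as a simplification of the Scala code linked above) that each JVM draws ids from its own process-local counter. `Jvm`, `AttributeReference`, and `new_expr_id` are hypothetical stand-ins for illustration, not Spark's API.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeReference:
    """Hypothetical stand-in for Catalyst's AttributeReference: a name plus an exprId."""
    name: str
    expr_id: int


class Jvm:
    """Hypothetical stand-in for one JVM (the driver or an executor)."""

    def __init__(self) -> None:
        # One counter per process, so ids are unique only within this process.
        self._cur_id = itertools.count()

    def new_expr_id(self) -> int:
        return next(self._cur_id)

    def attribute(self, name: str) -> AttributeReference:
        # No explicit id supplied, so a fresh one is drawn from the counter.
        return AttributeReference(name, self.new_expr_id())


driver = Jvm()
key = driver.attribute("key")
value = driver.attribute("value")
print(key.expr_id, value.expr_id)  # → 0 1
```

Within a single process every call draws a fresh id, which is exactly the uniqueness property the binding code depends on.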
However, if we create attribute references on both the driver and the
executors, the exprId is no longer guaranteed to be unique, because each JVM
generates these ids independently. So we can end up with two attribute
references that represent two different columns but have the same id. Because
our attribute binding relies on the uniqueness of the exprId, once this
property no longer holds, we bind expressions to the wrong columns when
evaluating them and produce wrong results.
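The failure can be sketched with the same kind of hypothetical model: two `Jvm` instances play the driver and an executor, and `bind_reference` resolves an attribute to the first input column whose id matches. That lookup rule is a simplified assumption about how binding finds an ordinal by exprId; all names here are illustrative, not Spark's API.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass(frozen=True)
class AttributeReference:
    """Hypothetical stand-in for Catalyst's AttributeReference."""
    name: str
    expr_id: int


class Jvm:
    """Hypothetical JVM with its own process-local id counter."""

    def __init__(self) -> None:
        self._cur_id = itertools.count()

    def attribute(self, name: str) -> AttributeReference:
        return AttributeReference(name, next(self._cur_id))


def bind_reference(attr: AttributeReference, input_schema: Sequence[AttributeReference]) -> int:
    """Return the ordinal of the first input attribute with the same exprId (assumed rule)."""
    for ordinal, candidate in enumerate(input_schema):
        if candidate.expr_id == attr.expr_id:
            return ordinal
    raise ValueError(f"cannot bind {attr}")


def evaluate(attr: AttributeReference, input_schema: Sequence[AttributeReference], row: Sequence[Any]) -> Any:
    # Evaluation trusts the ordinal chosen by binding.
    return row[bind_reference(attr, input_schema)]


driver = Jvm()
executor = Jvm()
key = driver.attribute("key")        # expr_id 0, allocated on the driver
value = driver.attribute("value")    # expr_id 1, allocated on the driver
extra = executor.attribute("extra")  # expr_id 0, allocated on the executor: collides with key

schema = [extra, key, value]
row = ["from-extra", "from-key", "from-value"]
print(evaluate(key, schema, row))  # → from-extra (wrong column; "from-key" was intended)
```

Because `key` and `extra` share an id, binding `key` silently picks the `extra` column; nothing fails, the result is just wrong, which matches the symptom described above.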