GitHub user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9050#issuecomment-147116624

@hvanhovell Yes, we will backport it to the 1.5 branch, so it will be fixed in 1.5.2.

Let me explain the cause. Every attribute reference has an `exprId`. If you do not explicitly assign this id when you create an attribute reference (which is probably the case most of the time), you will get a unique id (see https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala#L27). However, if we create attribute references on both the driver and the executors, the uniqueness of the `exprId` no longer holds, so two attribute references that represent two different columns can end up with the same id. Because our attribute binding relies on the uniqueness of the `exprId`, once this property no longer holds we will bind to the wrong columns when evaluating expressions and generate wrong results.
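
The collision described above can be illustrated with a small toy model. This is only a sketch in Python, not Spark's Scala code: `ExprIdGenerator`, `AttributeReference`, and `bind` are hypothetical stand-ins, and the sketch assumes that each process (driver or executor) hands out ids from its own independent counter.

```python
import itertools
from dataclasses import dataclass


class ExprIdGenerator:
    """Hypothetical stand-in for a per-process id counter.

    Assumption: each process (driver, executor) owns a separate instance,
    so ids are unique only within that one process.
    """

    def __init__(self):
        self._counter = itertools.count()

    def new_expr_id(self):
        return next(self._counter)


@dataclass(frozen=True)
class AttributeReference:
    """Toy attribute reference: a column name plus its expression id."""
    name: str
    expr_id: int


def bind(attribute, input_schema):
    """Return the ordinal of the first input column whose id matches.

    This lookup is only correct if ids are unique across input_schema.
    """
    for ordinal, candidate in enumerate(input_schema):
        if candidate.expr_id == attribute.expr_id:
            return ordinal
    raise LookupError(f"cannot bind {attribute.name}")


# Two independent generators model the driver and one executor.
driver_ids = ExprIdGenerator()
executor_ids = ExprIdGenerator()

# Two different columns, created in different processes.
col_a = AttributeReference("a", driver_ids.new_expr_id())
col_b = AttributeReference("b", executor_ids.new_expr_id())

# Both counters started from the same value, so the ids collide.
input_schema = [col_a, col_b]
bound_ordinal = bind(col_b, input_schema)

# Binding col_b resolves to col_a's position: the wrong column.
print(col_a.expr_id == col_b.expr_id)
print(input_schema[bound_ordinal].name)
```

In this model, binding `col_b` silently resolves to the position of `col_a`, which is the "bind to wrong columns" failure mode the comment describes. Creating every attribute reference from a single generator would restore uniqueness in the sketch.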