GitHub user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9050#issuecomment-147116624
  
    @hvanhovell Yes, we will backport it to the 1.5 branch, so it will be fixed
in 1.5.2.
    
    Let me explain the cause. Every attribute reference has an `exprId`. If you
do not explicitly assign this id when you create an attribute reference
(probably the common case), a unique id is generated for you (see
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala#L27).
However, if attribute references are created on both the driver and the
executors, the uniqueness of `exprId` no longer holds, so two attribute
references that represent different columns can end up with the same id.
Because attribute binding relies on `exprId` being unique, once this property
breaks we bind to the wrong columns when evaluating expressions and produce
wrong results.
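
To make the failure mode concrete, here is a toy model in Python. It is not Spark's Scala code. It assumes ids are handed out by a per-process counter that starts at zero in every process, and every name in it (`Jvm`, `AttributeReference`, `bind`) is hypothetical and exists only for illustration.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeReference:
    """Toy attribute reference: a column name plus an expression id."""
    name: str
    expr_id: int


class Jvm:
    """Toy stand-in for one process (driver or executor) with its own id counter."""

    def __init__(self, name):
        self.name = name
        # Assumption: every process starts counting from 0 independently.
        self._next_id = itertools.count()

    def new_attribute(self, column):
        # No explicit id is supplied, so take the next one from this process.
        return AttributeReference(column, next(self._next_id))


def bind(attribute, input_schema):
    """Resolve an attribute to an ordinal by expr_id only (the name is ignored)."""
    for ordinal, candidate in enumerate(input_schema):
        if candidate.expr_id == attribute.expr_id:
            return ordinal
    raise LookupError(f"cannot bind {attribute}")


driver = Jvm("driver")
executor = Jvm("executor")

a = driver.new_attribute("a")    # expr_id 0, created on the driver
b = executor.new_attribute("b")  # expr_id 0 again, created on an executor

input_schema = [a, b]
row = ("value of a", "value of b")

# Binding `b` matches `a` first, because both carry expr_id 0.
ordinal = bind(b, input_schema)
print(a.expr_id == b.expr_id)  # True
print(row[ordinal])            # value of a
```

In this sketch the two references are distinct objects for distinct columns, yet an id-based lookup cannot tell them apart, so evaluating `b` reads the value of `a`.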


