chasingegg edited a comment on pull request #35168:
URL: https://github.com/apache/spark/pull/35168#issuecomment-1012124307


   > Yes this is a bug. Can you explain more about how this bug happens? Maybe 
adding alias is not the best way to fix it.
   
   @cloud-fan Adding an alias makes the duplicated attribute appear under a 
different exprId. Presto handles this case the same way, so I used that 
solution. I have spent some time debugging why the problem occurs when the 
first child of the union has duplicate attributes, and I have added a more 
concise example to the description.
   One part of the cause is in the optimization rule PushProjectionThroughUnion:
   
![image](https://user-images.githubusercontent.com/18375889/149335539-bacbd938-da57-4dcf-8367-25d37ee5ae52.png)
   
   In buildRewrites, the rule constructs a map from the output attributes of 
the union's first child to those of the second child. For example, if the first 
child has duplicate columns such as **a, a**, which are exactly the same 
attribute, and the second child selects columns **c and d**, the map ends up 
containing only **a -> d**, so all of the second child's outputs become the 
**d column**.
   But after I removed the rule, the second child's outputs all became the 
**c column** instead... I need to take another look.
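
The map collapse described for PushProjectionThroughUnion can be modelled outside Spark. The sketch below is plain Python, not Catalyst code: `Attr` and `build_rewrites` are hypothetical stand-ins, and it assumes the rewrite map is keyed by the attribute (name plus exprId) and built by zipping the two children's outputs. It only illustrates why a duplicated key keeps the last value, and why a fresh exprId from an alias avoids that. It does not explain the **c column** result seen with the rule removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for an attribute reference: a name plus an expression id."""
    name: str
    expr_id: int


def build_rewrites(left_output, right_output):
    """Map each attribute of the first child to the attribute at the same position in the second.

    Assumption: this mirrors a zip-then-build-map construction; when a key
    repeats, the later pair overwrites the earlier one.
    """
    assert len(left_output) == len(right_output)
    return dict(zip(left_output, right_output))


a = Attr("a", 1)
c, d = Attr("c", 2), Attr("d", 3)

# First child outputs the same attribute twice: (a, a). Second child outputs (c, d).
dup_map = build_rewrites([a, a], [c, d])
# Rewriting the projection list [a, a] for the second child looks up the same key twice.
rewritten = [dup_map[x] for x in [a, a]]

# With an alias, the second occurrence gets its own expression id (4 is arbitrary here).
a_alias = Attr("a", 4)
fixed_map = build_rewrites([a, a_alias], [c, d])
rewritten_fixed = [fixed_map[x] for x in [a, a_alias]]

print([x.name for x in rewritten])
print([x.name for x in rewritten_fixed])
```

In this model the duplicated key leaves a single entry in the map, so both positions resolve to **d**. Once the second occurrence has a distinct id, the positions map to **c** and **d** separately.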
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


