chasingegg commented on pull request #35168: URL: https://github.com/apache/spark/pull/35168#issuecomment-1012124307
> Yes this is a bug. Can you explain more about how this bug happens? Maybe adding alias is not the best way to fix it.

@cloud-fan The alias approach treats repeated references to the same attribute as distinct attributes with different exprIds. Presto fixes this the same way, which is why I chose it.

I have spent some time debugging why the query breaks when the first child of a union has duplicate attributes. Part of the problem is in the optimization rule PushProjectionThroughUnion. Its `buildRewrites` method builds a map from the output of the union's first child to the output of the second child. Suppose the first child outputs the duplicate columns **a, a** (the exact same attribute) and the second child selects columns **c** and **d**. The map then ends up as **a -> d**, so every output of the second child becomes column **d**. However, after I remove the rule, every output of the second child becomes column **c** instead... I need to take another look.
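A minimal plain-Python model of the collision described above might help. It assumes the rewrite map is keyed by each attribute's exprId. The `Attr` class and the `build_rewrites` and `rewrite` helpers are hypothetical stand-ins, not Spark's actual API. Both copies of **a** share one key, so the later pair **a -> d** overwrites **a -> c**. Giving the second copy a fresh id, which is what the alias does, keeps both entries distinct.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical id generator standing in for exprId allocation.
_ids = count(1)


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for an attribute: a name plus a unique expression id."""
    name: str
    expr_id: int


def new_attr(name):
    return Attr(name, next(_ids))


def build_rewrites(first_output, second_output):
    # Pair the first child's outputs with the second child's positionally,
    # keyed by expr_id. Duplicate keys collapse and the last pair wins.
    assert len(first_output) == len(second_output)
    return {a.expr_id: b for a, b in zip(first_output, second_output)}


def rewrite(projection, rewrites):
    # Map each attribute of the first child onto the second child's column.
    return [rewrites[a.expr_id] for a in projection]


a, c, d = new_attr("a"), new_attr("c"), new_attr("d")
second = [c, d]

# First child outputs the very same attribute twice: a, a.
first = [a, a]
buggy = [x.name for x in rewrite(first, build_rewrites(first, second))]
print(buggy)  # ['d', 'd']

# Alias-style fix: the second occurrence gets a fresh expr_id.
first_fixed = [a, Attr("a", next(_ids))]
fixed = [x.name for x in rewrite(first_fixed, build_rewrites(first_fixed, second))]
print(fixed)  # ['c', 'd']
```

This only models the keying behavior. It does not explain why removing the rule yields column **c** instead, which is still under investigation.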
