[GitHub] [spark] bmarcott edited a comment on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

GitBox Wed, 08 Jan 2020 22:20:01 -0800

bmarcott edited a comment on issue #27096: [SPARK-28148][SQL] Repartition after 
join is not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-572408308
 
 
   Thanks for taking a look!
   Yes, it is here because the shuffle/sort is introduced by 
EnsureRequirements itself, which is what makes the user-added sorts/shuffles 
unnecessary. 
   
   Yeah, it felt a little hacky for optimization code to live in a rule called 
EnsureRequirements.
   
   I'd like someone more familiar with the overall planner design to suggest 
whether I should go with the 1st or 2nd option.
   For the 2nd option, won't I need to create a new physical node for both the 
repartition and the sort, each of which is kind of a dummy physical node that 
relies on EnsureRequirements to add the necessary sorts/partitioning based on 
`requiredChildDistribution` and `requiredChildOrdering`?
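
To make the 2nd option concrete, here is a rough toy model in plain Python. It is not Spark code: `Node`, `RepartitionPlaceholder`, `output_partitioning`, and `ensure_requirements` are hypothetical names made up for illustration, the plan is simplified to a single-child chain, and only distribution (not ordering) is modeled. The idea sketched is that the placeholder node only declares what it requires of its child, and a requirements pass adds a shuffle only when the child does not already provide it.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

Keys = Tuple[str, ...]


@dataclass(frozen=True)
class Node:
    """Toy physical plan node (hypothetical, not a Spark class)."""
    name: str
    child: Optional["Node"] = None
    partitioned_by: Optional[Keys] = None  # partitioning of this node's output
    requires: Optional[Keys] = None        # partitioning required of the child


def output_partitioning(node: Node) -> Optional[Keys]:
    # The placeholder does no work itself; it passes its child's partitioning through.
    if node.name == "RepartitionPlaceholder":
        return output_partitioning(node.child)
    return node.partitioned_by


def ensure_requirements(node: Node) -> Node:
    """Bottom-up pass: insert a Shuffle only where a requirement is unmet."""
    if node.child is None:
        return node
    child = ensure_requirements(node.child)
    if node.requires is not None and output_partitioning(child) != node.requires:
        child = Node("Shuffle", child=child, partitioned_by=node.requires)
    return replace(node, child=child)


def describe(node: Optional[Node]) -> str:
    """Render the chain from the root down to the leaf."""
    names = []
    while node is not None:
        names.append(node.name)
        node = node.child
    return " <- ".join(names)


def plan_with_repartition(keys: Keys) -> Node:
    # A join on "id" followed by a user-requested repartition on `keys`.
    join = Node("SortMergeJoin", child=Node("Scan"),
                partitioned_by=("id",), requires=("id",))
    return Node("RepartitionPlaceholder", child=join, requires=keys)


same_keys = describe(ensure_requirements(plan_with_repartition(("id",))))
other_keys = describe(ensure_requirements(plan_with_repartition(("name",))))
print(same_keys)
print(other_keys)
```

In this toy model, repartitioning on the join key adds no second shuffle above the join, while repartitioning on a different key does. Whether real placeholder nodes would be acceptable in the physical plan is exactly the design question above.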


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
