cloud-fan commented on pull request #30829: URL: https://github.com/apache/spark/pull/30829#issuecomment-763355925
`OptimizeSkewJoin` already runs `EnsureRequirements` inside it, so the only change we need is to not give up the optimization even if extra shuffles are added.

A query stage is quite self-contained. `queryStagePreparationRules` can't see the query plan inside query stages: during re-optimization, a query stage becomes a leaf node, `LogicalQueryStage`.

One problem is that `queryStageOptimizerRules` can't assume that a query stage doesn't have shuffles in the middle. We need to revisit these rules.
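
To make the first point concrete, here is a toy model of the decision being discussed. It is plain Python, not Spark code: `Node`, `count_shuffles`, `split_skewed`, `toy_ensure_requirements`, and the `allow_extra_shuffles` flag are all hypothetical names invented for this sketch, and the real rule works on Spark physical plans with far more detail. The sketch only shows the shape of the change: re-run the requirements check after the skew rewrite, and keep the rewritten plan even when that check had to add a shuffle.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Node:
    """A toy physical-plan node (hypothetical; not Spark's SparkPlan)."""
    name: str
    children: Tuple["Node", ...] = ()


def count_shuffles(plan: Node) -> int:
    """Count the nodes named "Shuffle" in the whole plan tree."""
    own = 1 if plan.name == "Shuffle" else 0
    return own + sum(count_shuffles(c) for c in plan.children)


def split_skewed(plan: Node) -> Node:
    """Toy skew rewrite: relabel every sort-merge join as skew-handled."""
    name = "SkewedSortMergeJoin" if plan.name == "SortMergeJoin" else plan.name
    return Node(name, tuple(split_skewed(c) for c in plan.children))


def toy_ensure_requirements(plan: Node) -> Node:
    """Toy stand-in for a requirements check: a skew-handled join no longer
    offers the partitioning its parent relied on, so put a shuffle above it."""
    fixed = tuple(toy_ensure_requirements(c) for c in plan.children)
    fixed = tuple(
        Node("Shuffle", (c,)) if c.name == "SkewedSortMergeJoin" else c
        for c in fixed
    )
    return Node(plan.name, fixed)


def optimize_skew_join(
    plan: Node,
    rewrite: Callable[[Node], Node],
    ensure_requirements: Callable[[Node], Node],
    allow_extra_shuffles: bool,
) -> Node:
    """Apply the rewrite, re-check requirements, then decide whether to keep it."""
    candidate = ensure_requirements(rewrite(plan))
    extra = count_shuffles(candidate) - count_shuffles(plan)
    if extra > 0 and not allow_extra_shuffles:
        # Behaviour described as the current one: give up on the optimization.
        return plan
    # Proposed behaviour: keep the optimized plan despite the extra shuffle.
    return candidate


# An aggregate on top of a join whose two inputs are already shuffled.
join = Node(
    "SortMergeJoin",
    (Node("Shuffle", (Node("ScanA"),)), Node("Shuffle", (Node("ScanB"),))),
)
plan = Node("HashAggregate", (join,))

kept = optimize_skew_join(plan, split_skewed, toy_ensure_requirements, False)
forced = optimize_skew_join(plan, split_skewed, toy_ensure_requirements, True)
```

In this toy run, `kept` is the untouched input plan, while `forced` contains the skew-handled join with one additional shuffle between it and the aggregate. That additional shuffle in the middle of what used to be a single stage is the situation the last paragraph is worried about.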
