cloud-fan commented on pull request #31083: URL: https://github.com/apache/spark/pull/31083#issuecomment-758639451
My major concern is how to specify the required distribution/ordering in the query plan. The current PR inserts the `RepartitionByExpression` operator in the optimizer phase, while all other operators in Spark use the `SparkPlan.requiredChildDistribution`/`requiredChildOrdering` properties to specify it and let the `EnsureRequirements` rule add the necessary shuffle/sort operators. I need more time to think about the tradeoffs between these two approaches, but it is probably better to follow the existing framework. We can add the `requiredChildDistribution`/`requiredChildOrdering` properties to the v2 write commands and set them in the `V2Writes` rule.
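To make the "follow the existing framework" option concrete, here is a rough sketch of the idea as a toy model. None of these types are Spark's real classes: `Plan`, `Scan`, `Shuffle`, `Sort`, `WriteCommand` and `ToyEnsureRequirements` are hypothetical stand-ins, and distributions and orderings are reduced to column-name lists. It only illustrates the shape of the approach: the write node declares what it needs from its child, and a separate rule inserts a shuffle/sort only where the child does not already satisfy it.

```scala
// Toy model only: simplified stand-ins, NOT Spark's real classes or signatures.
sealed trait Distribution
case object UnspecifiedDistribution extends Distribution
final case class ClusteredDistribution(keys: Seq[String]) extends Distribution

sealed trait Plan {
  def children: Seq[Plan]
  // By default a node requires nothing from its children.
  def requiredChildDistribution: Seq[Distribution] =
    children.map(_ => UnspecifiedDistribution)
  def requiredChildOrdering: Seq[Seq[String]] =
    children.map(_ => Seq.empty[String])
  // What the node itself produces.
  def outputDistribution: Distribution = UnspecifiedDistribution
  def outputOrdering: Seq[String] = Seq.empty
  def withChildren(newChildren: Seq[Plan]): Plan
}

final case class Scan(table: String) extends Plan {
  def children: Seq[Plan] = Seq.empty
  def withChildren(newChildren: Seq[Plan]): Plan = this
}

final case class Shuffle(keys: Seq[String], child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
  override def outputDistribution: Distribution = ClusteredDistribution(keys)
  def withChildren(newChildren: Seq[Plan]): Plan = copy(child = newChildren.head)
}

final case class Sort(order: Seq[String], child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
  override def outputDistribution: Distribution = child.outputDistribution
  override def outputOrdering: Seq[String] = order
  def withChildren(newChildren: Seq[Plan]): Plan = copy(child = newChildren.head)
}

// Hypothetical write node: it only *declares* its requirements.
final case class WriteCommand(
    child: Plan,
    distribution: Distribution,
    ordering: Seq[String]) extends Plan {
  def children: Seq[Plan] = Seq(child)
  override def requiredChildDistribution: Seq[Distribution] = Seq(distribution)
  override def requiredChildOrdering: Seq[Seq[String]] = Seq(ordering)
  def withChildren(newChildren: Seq[Plan]): Plan = copy(child = newChildren.head)
}

// Toy counterpart of an "ensure requirements" rule: walks the plan bottom-up
// and adds Shuffle/Sort nodes only where a requirement is not already met.
object ToyEnsureRequirements {
  def apply(plan: Plan): Plan = {
    val fixedChildren = plan.children.map(apply)
    val satisfied = fixedChildren
      .zip(plan.requiredChildDistribution)
      .zip(plan.requiredChildOrdering)
      .map { case ((child, dist), order) =>
        val distributed = dist match {
          case ClusteredDistribution(keys) if child.outputDistribution != dist =>
            Shuffle(keys, child)
          case _ => child
        }
        if (order.nonEmpty && !distributed.outputOrdering.startsWith(order))
          Sort(order, distributed)
        else distributed
      }
    plan.withChildren(satisfied)
  }
}
```

In this sketch, applying `ToyEnsureRequirements` to `WriteCommand(Scan("t"), ClusteredDistribution(Seq("region")), Seq("region", "ts"))` wraps the scan in a `Shuffle` and then a `Sort`, and applying it a second time changes nothing because the requirements are already satisfied. That skip-when-satisfied behavior is the part that an operator inserted unconditionally in the optimizer would not get for free in this toy setup.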
