[GitHub] [spark] cloud-fan commented on pull request #31083: [SPARK-34026][SQL] Inject repartition and sort nodes to satisfy required distribution and ordering

GitBox Tue, 12 Jan 2021 04:59:53 -0800


cloud-fan commented on pull request #31083:
URL: https://github.com/apache/spark/pull/31083#issuecomment-758639451



   My major concern is how to specify the required distribution/ordering in the 
query plan. The current PR inserts the `RepartitionByExpression` 
operator in the optimizer phase, while all other operators in Spark use the 
`SparkPlan.requiredChildDistribution/requiredChildOrdering` properties to specify 
it and let the `EnsureRequirements` rule add the necessary shuffle/sort 
operators.
   
   I need more time to think about the tradeoffs between these two approaches, 
but it's probably better to follow the existing framework. We can add the 
`requiredChildDistribution/requiredChildOrdering` properties to the v2 write 
commands and set them in the `V2Writes` rule.
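
   To make the second option concrete, here is a minimal, self-contained sketch of the pattern. It is a toy model and does not use Spark's real classes: `Plan`, `Scan`, `Shuffle`, `Sort`, `WriteToTable` and `EnsureRequirementsSketch` are hypothetical names chosen for illustration, and column references are plain strings. The only point is that the write node declares what it needs from its child, and a separate rule inserts the shuffle/sort.

```scala
// Toy model only: these types mimic the shape of physical planning,
// they are NOT Spark's actual classes or APIs.
sealed trait Distribution
case object UnspecifiedDistribution extends Distribution
final case class ClusteredDistribution(keys: Seq[String]) extends Distribution

sealed trait Plan {
  def children: Seq[Plan]
  def withChildren(newChildren: Seq[Plan]): Plan
  // What this node needs from each child; the default is "no requirement".
  def requiredChildDistribution: Seq[Distribution] =
    children.map(_ => UnspecifiedDistribution)
  def requiredChildOrdering: Seq[Seq[String]] =
    children.map(_ => Seq.empty[String])
  // What this node produces.
  def outputDistribution: Distribution = UnspecifiedDistribution
  def outputOrdering: Seq[String] = Seq.empty
}

final case class Scan(table: String) extends Plan {
  def children: Seq[Plan] = Nil
  def withChildren(newChildren: Seq[Plan]): Plan = this
}

final case class Shuffle(keys: Seq[String], child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
  def withChildren(newChildren: Seq[Plan]): Plan = copy(child = newChildren.head)
  // A shuffle clusters rows by its keys and does not preserve ordering.
  override def outputDistribution: Distribution = ClusteredDistribution(keys)
}

final case class Sort(order: Seq[String], child: Plan) extends Plan {
  def children: Seq[Plan] = Seq(child)
  def withChildren(newChildren: Seq[Plan]): Plan = copy(child = newChildren.head)
  // A per-partition sort keeps the child's distribution.
  override def outputDistribution: Distribution = child.outputDistribution
  override def outputOrdering: Seq[String] = order
}

// Hypothetical v2 write command: it only DECLARES its requirements.
final case class WriteToTable(clusterBy: Seq[String], sortBy: Seq[String], child: Plan)
    extends Plan {
  def children: Seq[Plan] = Seq(child)
  def withChildren(newChildren: Seq[Plan]): Plan = copy(child = newChildren.head)
  override def requiredChildDistribution: Seq[Distribution] =
    Seq(if (clusterBy.isEmpty) UnspecifiedDistribution else ClusteredDistribution(clusterBy))
  override def requiredChildOrdering: Seq[Seq[String]] = Seq(sortBy)
}

// Simplified stand-in for an EnsureRequirements-style rule: walk the plan
// bottom-up and add Shuffle/Sort only where a requirement is not yet met.
object EnsureRequirementsSketch {
  def apply(plan: Plan): Plan = {
    val fixedChildren = plan.children.map(apply)
    val requirements = plan.requiredChildDistribution.zip(plan.requiredChildOrdering)
    val satisfied = fixedChildren.zip(requirements).map {
      case (child, (dist, ordering)) =>
        val distributed = dist match {
          case ClusteredDistribution(keys) if child.outputDistribution != dist =>
            Shuffle(keys, child)
          case _ => child
        }
        if (ordering.nonEmpty && distributed.outputOrdering != ordering) {
          Sort(ordering, distributed)
        } else {
          distributed
        }
    }
    plan.withChildren(satisfied)
  }
}
```

   In this sketch, applying `EnsureRequirementsSketch` to `WriteToTable(Seq("region"), Seq("ts"), Scan("events"))` wraps the scan in a `Shuffle` and then a `Sort`, and applying it a second time changes nothing because the requirements are already satisfied. That skip-if-satisfied behavior is what the write path would get by following the existing framework instead of adding the repartition in the optimizer.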


