sarutak edited a comment on pull request #29677: URL: https://github.com/apache/spark/pull/29677#issuecomment-689860782
@c21 Thanks for the comment.

> Users can choose to remove these repartitionByRange/orderBy in query by themselves to save the shuffle/sort, as they are not necessary to add.

Yes, users can choose to do so, but that requires them to understand beforehand how Spark and Spark SQL work internally, along with some knowledge of distributed computing. Also, if a data processing logic or query is very complex, it is difficult for users to judge which repartition operations can safely be removed. Should Spark hide this complexity for users?

> E.g. we can have more complicated case if user don't do the right thing: spark.range(1, 100).repartitionByRange(10, $"id".desc).repartitionByRange(10, $"id").orderBy($"id"), should we also handle these cases?

Actually, this case is already handled by the `CollapseRepartition` rule.
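For illustration, here is a minimal, self-contained sketch (the object name, app name, and `local[*]` master are assumptions for running it standalone, not part of the PR) that reproduces the quoted query and prints its plans, so one can check that the adjacent `repartitionByRange` calls collapse into a single range exchange:

```scala
import org.apache.spark.sql.SparkSession

object CollapseRepartitionDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session just for inspecting the plan.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("CollapseRepartitionDemo")
      .getOrCreate()
    import spark.implicits._

    // Two consecutive range repartitions followed by a global sort,
    // as in the quoted example above.
    val df = spark.range(1, 100)
      .repartitionByRange(10, $"id".desc)
      .repartitionByRange(10, $"id")
      .orderBy($"id")

    // Print the parsed/analyzed/optimized/physical plans; the optimized
    // plan should show that the first repartitionByRange was collapsed away.
    df.explain(true)

    spark.stop()
  }
}
```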
