sarutak edited a comment on pull request #29677: URL: https://github.com/apache/spark/pull/29677#issuecomment-689860782
@c21 Thanks for the comment.

> Users can choose to remove these repartitionByRange/orderBy in query by themselves to save the shuffle/sort, as they are not necessary to add.

Yes, users can choose to do so, but that requires them to understand beforehand how Spark and Spark SQL work internally, along with some knowledge of distributed computing. Also, if a data processing logic or query is very complex, it is difficult for users to judge which repartition operations can safely be removed. Should Spark hide this complexity for users?

> E.g. we can have more complicated case if user don't do the right thing: spark.range(1, 100).repartitionByRange(10, $"id".desc).repartitionByRange(10, $"id").orderBy($"id"), should we also handle these cases?

Actually, this case is already handled by the `CollapseRepartition` rule.
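For illustration, here is a minimal, self-contained sketch (the object name, app name, and `local[*]` master are assumptions for running it standalone, not part of the PR) that reproduces the quoted query and prints its plans, so one can check that the adjacent `repartitionByRange` calls collapse into a single range exchange:

```scala
import org.apache.spark.sql.SparkSession

object CollapseRepartitionDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session just for inspecting the plan.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("CollapseRepartitionDemo")
      .getOrCreate()
    import spark.implicits._

    // Two consecutive range repartitions followed by a global sort,
    // as in the quoted example above.
    val df = spark.range(1, 100)
      .repartitionByRange(10, $"id".desc)
      .repartitionByRange(10, $"id")
      .orderBy($"id")

    // Print the parsed/analyzed/optimized/physical plans; the optimized
    // plan should show that the first repartitionByRange was collapsed away.
    df.explain(true)

    spark.stop()
  }
}
```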
