Github user henryr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21049#discussion_r180964957

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -307,6 +309,32 @@ object RemoveRedundantProject extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Remove [[Sort]] in subqueries that do not affect the set of rows produced, only their
+ * order. Subqueries produce unordered sets of rows so sorting their output is unnecessary.
+ */
+object RemoveSubquerySorts extends Rule[LogicalPlan] {
+
+  /**
+   * Removes all [[Sort]] operators from a plan that are accessible from the root operator via
+   * 0 or more [[Project]], [[Filter]] or [[View]] operators.
+   */
+  private def removeTopLevelSorts(plan: LogicalPlan): LogicalPlan = {
+    plan match {
+      case Sort(_, _, child) => removeTopLevelSorts(child)
+      case Project(fields, child) => Project(fields, removeTopLevelSorts(child))
+      case Filter(condition, child) => Filter(condition, removeTopLevelSorts(child))
+      case View(tbl, output, child) => View(tbl, output, removeTopLevelSorts(child))
+      case _ => plan
+    }
+  }
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case Subquery(child) => Subquery(removeTopLevelSorts(child))
+    case SubqueryAlias(name, child) => SubqueryAlias(name, removeTopLevelSorts(child))
--- End diff --

Thanks! I've been trying to understand the role of `Subquery` and `SubqueryAlias`. My confusion is that subqueries do seem to get planned as `SubqueryAlias` operators, e.g.:

    scala> spark.sql("SELECT count(*) from (SELECT id FROM dft ORDER BY id)").explain(true)
    == Parsed Logical Plan ==
    'Project [unresolvedalias('count(1), None)]
    +- 'SubqueryAlias __auto_generated_subquery_name
       +- 'Sort ['id ASC NULLS FIRST], true
          +- 'Project ['id]
             +- 'UnresolvedRelation `dft`

In the example you give I (personally) think it's still reasonable to drop the ordering, but understand that might surprise some users.
It wouldn't be hard to skip the root if it's a subquery - but what do you propose for detecting subqueries if my method isn't right?
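
For illustration, skipping the root could look roughly like the sketch below. It is not the PR's code and does not use Catalyst: `Plan`, `Relation`, the simplified `Sort`/`Project`/`Filter`/`SubqueryAlias` case classes, `RemoveSubquerySortsSketch`, and the `atRoot` flag are all hypothetical stand-ins chosen so the example runs on its own. The assumption is that "skip the root" means leaving the ordering alone when the outermost operator is itself a `SubqueryAlias`.

```scala
// Toy stand-ins for Catalyst operators; these are NOT the real Spark classes.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Sort(order: Seq[String], child: Plan) extends Plan
case class Project(fields: Seq[String], child: Plan) extends Plan
case class Filter(condition: String, child: Plan) extends Plan
case class SubqueryAlias(name: String, child: Plan) extends Plan

object RemoveSubquerySortsSketch {
  // Strip Sorts reachable from `plan` through zero or more Project/Filter nodes.
  def removeTopLevelSorts(plan: Plan): Plan = plan match {
    case Sort(_, child)    => removeTopLevelSorts(child)
    case Project(f, child) => Project(f, removeTopLevelSorts(child))
    case Filter(c, child)  => Filter(c, removeTopLevelSorts(child))
    case other             => other
  }

  // Walk the tree; remove top-level Sorts under every SubqueryAlias
  // except one that sits at the very root of the plan (assumed behaviour).
  def rewrite(plan: Plan, atRoot: Boolean = true): Plan = plan match {
    case SubqueryAlias(n, child) =>
      val newChild = rewrite(child, atRoot = false)
      SubqueryAlias(n, if (atRoot) newChild else removeTopLevelSorts(newChild))
    case Sort(o, child)    => Sort(o, rewrite(child, atRoot = false))
    case Project(f, child) => Project(f, rewrite(child, atRoot = false))
    case Filter(c, child)  => Filter(c, rewrite(child, atRoot = false))
    case r: Relation       => r
  }

  def main(args: Array[String]): Unit = {
    val inner = SubqueryAlias("t", Sort(Seq("id"), Project(Seq("id"), Relation("dft"))))
    // Alias nested under a Project: the Sort is dropped.
    println(rewrite(Project(Seq("count(1)"), inner)))
    // Alias at the root: the Sort is kept.
    println(rewrite(inner))
  }
}
```

With the real `plan transform` there is no built-in notion of "am I at the root", so an actual rule would presumably need to special-case the root before delegating to `transform`; the explicit flag above is just the simplest way to show the idea.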