Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21049#discussion_r180964957
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -307,6 +309,32 @@ object RemoveRedundantProject extends Rule[LogicalPlan] {
}
}
+/**
+ * Remove [[Sort]] in subqueries that do not affect the set of rows produced, only their
+ * order. Subqueries produce unordered sets of rows so sorting their output is unnecessary.
+ */
+object RemoveSubquerySorts extends Rule[LogicalPlan] {
+
+  /**
+   * Removes all [[Sort]] operators from a plan that are accessible from the root operator via
+   * 0 or more [[Project]], [[Filter]] or [[View]] operators.
+   */
+  private def removeTopLevelSorts(plan: LogicalPlan): LogicalPlan = {
+    plan match {
+      case Sort(_, _, child) => removeTopLevelSorts(child)
+      case Project(fields, child) => Project(fields, removeTopLevelSorts(child))
+      case Filter(condition, child) => Filter(condition, removeTopLevelSorts(child))
+      case View(tbl, output, child) => View(tbl, output, removeTopLevelSorts(child))
+      case _ => plan
+    }
+  }
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case Subquery(child) => Subquery(removeTopLevelSorts(child))
+    case SubqueryAlias(name, child) => SubqueryAlias(name, removeTopLevelSorts(child))
--- End diff --
Thanks! I've been trying to understand the role of `Subquery` and
`SubqueryAlias`. My confusion is that subqueries do seem to get planned as
`SubqueryAlias` operators, e.g.:

    scala> spark.sql("SELECT count(*) from (SELECT id FROM dft ORDER BY id)").explain(true)
    == Parsed Logical Plan ==
    'Project [unresolvedalias('count(1), None)]
    +- 'SubqueryAlias __auto_generated_subquery_name
       +- 'Sort ['id ASC NULLS FIRST], true
          +- 'Project ['id]
             +- 'UnresolvedRelation `dft`

In the example you give, I (personally) think it's still reasonable to drop
the ordering, but I understand that might surprise some users. It wouldn't be
hard to skip the root if it's a subquery, but what do you propose for
detecting subqueries if my method isn't right?
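
To make the "skip the root" idea concrete, here is a minimal sketch on a toy plan ADT. The `Plan`, `Relation`, `Sort`, `Project`, `Filter` and `SubqueryAlias` classes below are simplified stand-ins I made up for illustration, not Catalyst's classes, and `rewrite` with its `isRoot` flag is a hypothetical helper rather than anything in the PR. It only shows the shape of the change: sorts are dropped under a `SubqueryAlias` unless that alias is the root of the plan.

```scala
// Toy stand-ins for Catalyst operators. These are NOT Spark's classes;
// they exist only so the sketch is self-contained.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Sort(order: Seq[String], child: Plan) extends Plan
case class Project(fields: Seq[String], child: Plan) extends Plan
case class Filter(condition: String, child: Plan) extends Plan
case class SubqueryAlias(name: String, child: Plan) extends Plan

object RemoveSubquerySortsSketch {

  // Drop every Sort reachable from the root through Project/Filter only.
  def removeTopLevelSorts(plan: Plan): Plan = plan match {
    case Sort(_, child)         => removeTopLevelSorts(child)
    case Project(fields, child) => Project(fields, removeTopLevelSorts(child))
    case Filter(cond, child)    => Filter(cond, removeTopLevelSorts(child))
    case other                  => other
  }

  // Hypothetical variant of the rule: a SubqueryAlias at the root keeps its
  // sorts (the ordering may be user-visible); any SubqueryAlias below the
  // root has its top-level sorts removed.
  def rewrite(plan: Plan, isRoot: Boolean = true): Plan = plan match {
    case SubqueryAlias(name, child) if !isRoot =>
      SubqueryAlias(name, rewrite(removeTopLevelSorts(child), isRoot = false))
    case SubqueryAlias(name, child) =>
      SubqueryAlias(name, rewrite(child, isRoot = false))
    case Sort(order, child)     => Sort(order, rewrite(child, isRoot = false))
    case Project(fields, child) => Project(fields, rewrite(child, isRoot = false))
    case Filter(cond, child)    => Filter(cond, rewrite(child, isRoot = false))
    case r: Relation            => r
  }
}
```

With this model, a plan shaped like the `count(*)` query above loses the inner `Sort`, while a plan whose root is itself a `SubqueryAlias` over a `Sort` comes back unchanged. Whether an `isRoot`-style check is the right way to detect this in Catalyst is exactly the open question.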