[GitHub] spark pull request #21049: [SPARK-23957][SQL] Remove redundant sort operator...

henryr Mon, 16 Apr 2018 11:24:51 -0700

Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21049#discussion_r181839715
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -307,6 +309,32 @@ object RemoveRedundantProject extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Remove [[Sort]] in subqueries that do not affect the set of rows produced, only their
    + * order. Subqueries produce unordered sets of rows so sorting their output is unnecessary.
    + */
    +object RemoveSubquerySorts extends Rule[LogicalPlan] {
    +
    +  /**
    +   * Removes all [[Sort]] operators from a plan that are accessible from the root operator via
    +   * 0 or more [[Project]], [[Filter]] or [[View]] operators.
    +   */
    +  private def removeTopLevelSorts(plan: LogicalPlan): LogicalPlan = {
    +    plan match {
    +      case Sort(_, _, child) => removeTopLevelSorts(child)
    +      case Project(fields, child) => Project(fields, removeTopLevelSorts(child))
    +      case Filter(condition, child) => Filter(condition, removeTopLevelSorts(child))
    +      case View(tbl, output, child) => View(tbl, output, removeTopLevelSorts(child))
    +      case _ => plan
    +    }
    +  }
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case Subquery(child) => Subquery(removeTopLevelSorts(child))
    +    case SubqueryAlias(name, child) => SubqueryAlias(name, removeTopLevelSorts(child))
    --- End diff --
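As a rough illustration of what `removeTopLevelSorts` does, here is a self-contained Scala sketch over a hypothetical toy plan ADT. These classes only mimic the shape of Catalyst's `Sort`, `Project`, `Filter`, `View` and `SubqueryAlias`; `Limit` is added (an assumption for the example) to show where the descent stops, and `Subquery` is omitted. `apply` hand-rolls a top-down traversal in place of Catalyst's `transform`.

```scala
// Hypothetical, simplified plan nodes for illustration only; these are NOT
// Spark's Catalyst classes, just stand-ins with a similar shape.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Sort(order: Seq[String], global: Boolean, child: Plan) extends Plan
case class Project(fields: Seq[String], child: Plan) extends Plan
case class Filter(condition: String, child: Plan) extends Plan
case class View(name: String, child: Plan) extends Plan
case class Limit(n: Int, child: Plan) extends Plan
case class SubqueryAlias(alias: String, child: Plan) extends Plan

object RemoveSubquerySortsSketch {
  // Strip Sorts reachable from a subquery root through Project/Filter/View.
  // Any other node (e.g. Limit) stops the descent, so a Sort below a Limit
  // is kept: there the order decides which rows survive.
  def removeTopLevelSorts(plan: Plan): Plan = plan match {
    case Sort(_, _, child) => removeTopLevelSorts(child)
    case Project(f, child) => Project(f, removeTopLevelSorts(child))
    case Filter(c, child)  => Filter(c, removeTopLevelSorts(child))
    case View(n, child)    => View(n, removeTopLevelSorts(child))
    case other             => other
  }

  // Top-down traversal: rewrite at each SubqueryAlias boundary, then keep
  // recursing into the children of the rewritten node.
  def apply(plan: Plan): Plan = plan match {
    case SubqueryAlias(a, child) => SubqueryAlias(a, apply(removeTopLevelSorts(child)))
    case Sort(o, g, child)       => Sort(o, g, apply(child))
    case Project(f, child)       => Project(f, apply(child))
    case Filter(c, child)        => Filter(c, apply(child))
    case View(n, child)          => View(n, apply(child))
    case Limit(n, child)         => Limit(n, apply(child))
    case r: Relation             => r
  }
}
```

Note that a `Sort` at the very top of the query (outside any `SubqueryAlias`) is left alone, since its order is what the user asked to see.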
    
    Yep, that's why I added the new rule just before `EliminateSubqueryAliases` (which runs in the optimizer, as part of the 'finish analysis' batch). After `EliminateSubqueryAliases` there doesn't seem to be any way to detect subqueries.
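The ordering constraint can be shown with a small toy model. Here a hypothetical `eliminateAliases` stands in for `EliminateSubqueryAliases` and a hypothetical `removeSubquerySorts` stands in for the new rule; neither is Spark code, and the node classes are simplified stand-ins. Running the sort removal first strips the `Sort` inside the aliased subquery, while running alias elimination first leaves no boundary to detect, so the `Sort` survives.

```scala
// Hypothetical toy plan nodes, not Spark's Catalyst classes.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Sort(order: Seq[String], child: Plan) extends Plan
case class Project(fields: Seq[String], child: Plan) extends Plan
case class SubqueryAlias(alias: String, child: Plan) extends Plan

object RuleOrderSketch {
  // Stand-in for alias elimination: drop every SubqueryAlias wrapper.
  def eliminateAliases(plan: Plan): Plan = plan match {
    case SubqueryAlias(_, c) => eliminateAliases(c)
    case Sort(o, c)          => Sort(o, eliminateAliases(c))
    case Project(f, c)       => Project(f, eliminateAliases(c))
    case r: Relation         => r
  }

  // Remove Sorts reachable through Projects from a subquery root.
  private def stripSorts(plan: Plan): Plan = plan match {
    case Sort(_, c)    => stripSorts(c)
    case Project(f, c) => Project(f, stripSorts(c))
    case other         => other
  }

  // Stand-in for the sort-removal rule: it relies on SubqueryAlias to know
  // where a subquery starts.
  def removeSubquerySorts(plan: Plan): Plan = plan match {
    case SubqueryAlias(a, c) => SubqueryAlias(a, removeSubquerySorts(stripSorts(c)))
    case Sort(o, c)          => Sort(o, removeSubquerySorts(c))
    case Project(f, c)       => Project(f, removeSubquerySorts(c))
    case r: Relation         => r
  }
}
```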
    
    Another approach, I suppose, would be to handle this like `SparkPlan`'s `requiredChildOrdering`: if a parent doesn't require any ordering of its child and the child is a `Sort` node, the child `Sort` should be dropped. That seems like a more fundamental change, though.
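To make the `requiredChildOrdering` idea concrete, here is a hedged Scala sketch on a toy plan tree. `SparkPlan.requiredChildOrdering` belongs to physical plans; this sketch instead threads a hypothetical `orderingRequired` flag down a simplified logical tree, and the node classes and per-node requirements are assumptions for illustration only. In particular it treats `Aggregate` as order-insensitive, which would not hold for order-sensitive aggregate functions such as `first`.

```scala
// Hypothetical toy plan nodes, not Spark's Catalyst or physical plan classes.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Sort(order: Seq[String], child: Plan) extends Plan
case class Project(fields: Seq[String], child: Plan) extends Plan
case class Filter(condition: String, child: Plan) extends Plan
case class Limit(n: Int, child: Plan) extends Plan
case class Aggregate(groupBy: Seq[String], child: Plan) extends Plan

object OrderingDrivenSortRemoval {
  // `orderingRequired` says whether the consumer of `plan` cares about row
  // order. Order-preserving nodes (Project, Filter) forward the parent's
  // requirement; Limit always needs ordered input; Aggregate is assumed
  // (in this toy only) never to need it. A Sort re-sorts its input, so its
  // own child never needs to be ordered.
  def prune(plan: Plan, orderingRequired: Boolean): Plan = plan match {
    case Sort(o, c) =>
      if (orderingRequired) Sort(o, prune(c, orderingRequired = false))
      else prune(c, orderingRequired = false)
    case Project(f, c)   => Project(f, prune(c, orderingRequired))
    case Filter(p, c)    => Filter(p, prune(c, orderingRequired))
    case Limit(n, c)     => Limit(n, prune(c, orderingRequired = true))
    case Aggregate(g, c) => Aggregate(g, prune(c, orderingRequired = false))
    case r: Relation     => r
  }

  // The root's output order is visible to the user, so it is required.
  def apply(plan: Plan): Plan = prune(plan, orderingRequired = true)
}
```

Unlike the subquery-based rule, this version needs no subquery boundary, so it would not depend on running before `EliminateSubqueryAliases`; the cost is that every operator has to declare what it needs from its input.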


