[GitHub] spark pull request #21049: [SPARK-23957][SQL] Remove redundant sort operator...

gatorsmile Mon, 16 Apr 2018 09:29:40 -0700

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21049#discussion_r181803393
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -307,6 +309,32 @@ object RemoveRedundantProject extends Rule[LogicalPlan] {
       }
     }
     
    +/**
    + * Remove [[Sort]] in subqueries that do not affect the set of rows produced, only their
    + * order. Subqueries produce unordered sets of rows so sorting their output is unnecessary.
    + */
    +object RemoveSubquerySorts extends Rule[LogicalPlan] {
    +
    +  /**
    +   * Removes all [[Sort]] operators from a plan that are accessible from the root operator via
    +   * 0 or more [[Project]], [[Filter]] or [[View]] operators.
    +   */
    +  private def removeTopLevelSorts(plan: LogicalPlan): LogicalPlan = {
    +    plan match {
    +      case Sort(_, _, child) => removeTopLevelSorts(child)
    +      case Project(fields, child) => Project(fields, removeTopLevelSorts(child))
    +      case Filter(condition, child) => Filter(condition, removeTopLevelSorts(child))
    +      case View(tbl, output, child) => View(tbl, output, removeTopLevelSorts(child))
    +      case _ => plan
    +    }
    +  }
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +    case Subquery(child) => Subquery(removeTopLevelSorts(child))
    +    case SubqueryAlias(name, child) => SubqueryAlias(name, removeTopLevelSorts(child))
    --- End diff --
    
    ```Seq((1, 2, "1"), (3, 4, "3")).toDF("int", "int2", "str_sort").orderBy('int.asc).as('df1)```
    
    Before the plan enters the optimizer, the rule `EliminateSubqueryAliases`
    removes every `SubqueryAlias`. Basically, `SubqueryAlias` is a no-op after
    query analysis. The name is a little bit confusing, I have to admit.
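
To illustrate the point, here is a minimal, self-contained sketch in plain Scala. It does not use Spark: `Plan`, `Relation`, `ToyRules`, and `countAliases` are hypothetical stand-ins, and the toy `Sort`, `Project`, and `SubqueryAlias` case classes only mimic the shape of the Catalyst operators with simplified fields. Under those assumptions, it shows that once an `EliminateSubqueryAliases`-style pass has run, no alias node is left for a later rule to match.

```scala
// Toy model of a logical plan. These case classes are simplified stand-ins
// (assumptions for illustration), not Spark's real Catalyst classes.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Sort(order: String, child: Plan) extends Plan
case class Project(fields: Seq[String], child: Plan) extends Plan
case class SubqueryAlias(alias: String, child: Plan) extends Plan

object ToyRules {
  // Stand-in for EliminateSubqueryAliases: replace each alias node with its child.
  def eliminateSubqueryAliases(plan: Plan): Plan = plan match {
    case SubqueryAlias(_, child) => eliminateSubqueryAliases(child)
    case Sort(order, child)      => Sort(order, eliminateSubqueryAliases(child))
    case Project(fields, child)  => Project(fields, eliminateSubqueryAliases(child))
    case r: Relation             => r
  }

  // Counts the nodes that a `case SubqueryAlias(...)` pattern could match.
  def countAliases(plan: Plan): Int = plan match {
    case SubqueryAlias(_, child) => 1 + countAliases(child)
    case Sort(_, child)          => countAliases(child)
    case Project(_, child)       => countAliases(child)
    case _: Relation             => 0
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Roughly the shape of `...toDF(...).orderBy('int.asc).as('df1)`:
    // an alias sitting on top of a sort.
    val analyzed = SubqueryAlias("df1", Sort("int ASC", Relation("local")))
    val withoutAliases = ToyRules.eliminateSubqueryAliases(analyzed)

    // One alias exists after analysis; none is left after elimination.
    assert(ToyRules.countAliases(analyzed) == 1)
    assert(ToyRules.countAliases(withoutAliases) == 0)
    // The sort itself is untouched by alias elimination.
    assert(withoutAliases == Sort("int ASC", Relation("local")))
  }
}
```

In this toy model, a rule that runs after alias elimination and pattern-matches on `SubqueryAlias` has nothing to match. Whether the same holds for the rule in this diff depends on where it is placed relative to `EliminateSubqueryAliases` in the real optimizer batches.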


