[GitHub] spark pull request: [SPARK-9251][SQL] do not order by expressions ...

marmbrus Wed, 29 Jul 2015 12:49:43 -0700

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7593#discussion_r35803343
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
    @@ -947,6 +948,63 @@ class Analyzer(
             Project(p.output, newPlan.withNewChildren(newChild :: Nil))
         }
       }
    +
    +  /**
    +   * Removes all still-need-evaluate ordering expressions from sort and 
use an inner project to
    +   * materialize them, finally use a outer project to project them away to 
keep the result same.
    +   * Then we can make sure we only sort by [[AttributeReference]]s.
    +   *
    +   * As an example,
    +   * {{{
    +   *   Sort('a, 'b + 1,
    +   *     Relation('a, 'b))
    +   * }}}
    +   * will be turned into:
    +   * {{{
    +   *   Project('a, 'b,
    +   *     Sort('a, '_sortCondition,
    +   *       Project('a, 'b, ('b + 1).as("_sortCondition"),
    +   *         Relation('a, 'b))))
    +   * }}}
    +   */
    +  object RemoveEvaluationFromSort extends Rule[LogicalPlan] {
    +    private def hasAlias(expr: Expression) = {
    +      expr.find {
    +        case a: Alias => true
    +        case _ => false
    +      }.isDefined
    +    }
    +
    +    override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    +      // The ordering expressions have no effect to the output schema of 
`Sort`,
    +      // so `Alias`s in ordering expressions are unnecessary and we should 
remove them.
    +      case s @ Sort(ordering, _, _) if ordering.exists(hasAlias) =>
    +        val newOrdering = ordering.map(_.transformUp {
    +          case Alias(child, _) => child
    +        }.asInstanceOf[SortOrder])
    +        s.copy(order = newOrdering)
    +
    +      case s @ Sort(ordering, global, child)
    +        if s.expressions.forall(_.resolved) && s.childrenResolved && 
!s.hasNoEvaluation =>
    +
    +        val (ref, needEval) = 
ordering.partition(_.child.isInstanceOf[AttributeReference])
    +
    +        val namedExpr = needEval.map(_.child match {
    +          case n: NamedExpression => n
    +          case e => Alias(e, "_sortCondition")()
    +        })
    +
    +        val newOrdering = ref ++ needEval.zip(namedExpr).map { case 
(order, ne) =>
    --- End diff --
    
    I think there is a bug here.  We need to consider the original ordering of 
the expressions or we might sort differently.  Consider:
    
    ```scala
    Seq((1,2,3,4)).toDF("a", "b", "c", "d").registerTempTable("test")
    
    scala> sqlContext.sql("SELECT * FROM test ORDER BY a + 1, b").queryExecution
    == Parsed Logical Plan ==
    'Sort [('a + 1) ASC,'b ASC], true
     'Project [unresolvedalias(*)]
      'UnresolvedRelation [test], None
    
    == Optimized Logical Plan ==
    Project [a#4,b#5,c#6,d#7]
     Sort [b#5 ASC,_sortCondition#9 ASC], true
      LocalRelation [a#4,b#5,c#6,d#7,_sortCondition#9], [[1,2,3,4,2]]
    ```
    
    Notice how we are now sorting by `b` before `a + 1`.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request: [SPARK-9251][SQL] do not order by expressions ...

Reply via email to