Github user maryannxue commented on a diff in the pull request:
https://github.com/apache/spark/pull/19488#discussion_r144543377
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
---
@@ -205,14 +205,15 @@ object PhysicalAggregation {
    case logical.Aggregate(groupingExpressions, resultExpressions, child) =>
      // A single aggregate expression might appear multiple times in resultExpressions.
      // In order to avoid evaluating an individual aggregate function multiple times, we'll
-     // build a set of the distinct aggregate expressions and build a function which can
+     // build a map of the distinct aggregate expressions and build a function which can
      // be used to re-write expressions so that they reference the single copy of the
      // aggregate function which actually gets computed.
-     val aggregateExpressions = resultExpressions.flatMap { expr =>
+     val aggregateExpressionMap = resultExpressions.flatMap { expr =>
        expr.collect {
-         case agg: AggregateExpression => agg
+         case agg: AggregateExpression => (agg.canonicalized, agg.deterministic) -> agg
--- End diff ---
Thank you, @cloud-fan! You've made a very good point: we should handle
non-determinism correctly here. I have confirmed, though, that all aggregate
functions are deterministic (and fortunately Spark already guarantees this in
the analysis stage). So I think it is safe to key on "agg.canonicalized" alone,
without also including the "deterministic" field, which would only be
confusing. I have also added a code comment explaining this point in my latest
commit.
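To illustrate the technique the diff is discussing — collapsing semantically
equal aggregate expressions under a canonical key so each distinct aggregate is
computed only once — here is a minimal, self-contained Scala sketch. The `Expr`
class and its toy `canonicalized` method are hypothetical stand-ins, not
Catalyst's actual `Expression`/`AggregateExpression` classes; only the
dedup-by-canonical-key pattern mirrors the patch.

```scala
// Hypothetical simplified expression tree; "canonicalized" here just
// normalizes cosmetic differences (case) so semantically equal expressions
// produce the same key. Catalyst's real canonicalization is more involved.
case class Expr(name: String, children: Seq[Expr] = Nil) {
  def canonicalized: Expr =
    Expr(name.toLowerCase, children.map(_.canonicalized))
}

object DedupDemo {
  // Build a map keyed by the canonical form: duplicates collapse to a single
  // entry, so each distinct aggregate would be evaluated only once.
  def dedup(exprs: Seq[Expr]): Map[Expr, Expr] =
    exprs.map(e => e.canonicalized -> e).toMap

  def main(args: Array[String]): Unit = {
    val es = Seq(
      Expr("SUM", Seq(Expr("a"))),
      Expr("sum", Seq(Expr("A"))), // same aggregate, different spelling
      Expr("max", Seq(Expr("b")))
    )
    // The two sums share one canonical key, so only two entries remain.
    println(dedup(es).size)
  }
}
```

Rewritten result expressions would then reference the single surviving copy via
a lookup on their own canonical form, which is exactly what the re-write
function mentioned in the comment does.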
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]