Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/15032#discussion_r78265127
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala ---
@@ -181,11 +182,7 @@ abstract class AggregationIterator(
        // Process all expression-based aggregate functions.
        updateProjection.target(currentBuffer)(joinedRow(currentBuffer, row))
        // Process all imperative aggregate functions.
-       var i = 0
-       while (i < updateFunctions.length) {
-         updateFunctions(i)(currentBuffer, row)
-         i += 1
-       }
+       updateFunctions.foreach(updateFunction => updateFunction(currentBuffer, row))
--- End diff --
I just wonder whether it would make sense to cache the length and keep a plain
`while` loop, as is done above. The `foreach` style seems discouraged for
performance-sensitive code [1]. I believe it is fine off the critical path, but
IIUC it will, at least, introduce a virtual function call per row here. I found
a blog post, just to be thorough and double-check myself [2]. I am also a bit
worried because this change could be taken as an example to follow in the
future. Maybe we need a clear reason for this change (a rough sketch of the two
styles is below the links).
[1]
https://github.com/databricks/scala-style-guide#traversal-and-zipwithindex
[2] http://www.scalablescala.com/roller/scala/entry/why_is_using_for_foreach
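For reference, a minimal self-contained sketch of the two styles I mean. The
names mirror the diff, but the types here are simplified stand-ins rather than
the real Spark classes, and it runs both styles only for illustration:

    // Sketch only: `Row` and the update functions below are simplified
    // stand-ins for the Spark types, not the actual AggregationIterator code.
    object LoopStyles {
      type Row = Array[Any]

      def main(args: Array[String]): Unit = {
        val currentBuffer: Row = Array.fill[Any](2)(0L)
        val row: Row = Array[Any](1L, 2L)

        // Stand-ins for imperative aggregate update functions.
        val updateFunctions: Array[(Row, Row) => Unit] = Array(
          (buf, r) => buf(0) = buf(0).asInstanceOf[Long] + r(0).asInstanceOf[Long],
          (buf, r) => buf(1) = buf(1).asInstanceOf[Long] + r(1).asInstanceOf[Long]
        )

        // Style kept in the diff: foreach takes a closure and calls it once
        // per element, which is the per-row overhead mentioned above.
        updateFunctions.foreach(updateFunction => updateFunction(currentBuffer, row))

        // Style suggested in the comment: cache the length and use a plain
        // while loop over the array indices.
        var i = 0
        val length = updateFunctions.length
        while (i < length) {
          updateFunctions(i)(currentBuffer, row)
          i += 1
        }

        println(currentBuffer.mkString(", "))
      }
    }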