Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/15032#discussion_r78265127
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala ---
@@ -181,11 +182,7 @@ abstract class AggregationIterator(
        // Process all expression-based aggregate functions.
        updateProjection.target(currentBuffer)(joinedRow(currentBuffer, row))
        // Process all imperative aggregate functions.
-       var i = 0
-       while (i < updateFunctions.length) {
-         updateFunctions(i)(currentBuffer, row)
-         i += 1
-       }
+       updateFunctions.foreach(updateFunction => updateFunction(currentBuffer, row))
--- End diff --
I just wonder whether it would make sense to cache the length and keep a plain
`while` loop, as is done above. The `foreach` style seems discouraged for
performance-sensitive code [1]. I believe it is fine off the critical path, but
IIUC it will, at least, introduce a virtual function call per row here. I found
a blog post, just to be thorough and double-check myself [2]. I am also a bit
worried because this change could be taken as an example to follow in the
future. Maybe we need a clear reason for this change (a rough sketch of the two
styles is below the links).
[1]
https://github.com/databricks/scala-style-guide#traversal-and-zipwithindex
[2] http://www.scalablescala.com/roller/scala/entry/why_is_using_for_foreach
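For reference, a minimal self-contained sketch of the two styles I mean. The
names mirror the diff, but the types here are simplified stand-ins rather than
the real Spark classes, and it runs both styles only for illustration:

    // Sketch only: `Row` and the update functions below are simplified
    // stand-ins for the Spark types, not the actual AggregationIterator code.
    object LoopStyles {
      type Row = Array[Any]

      def main(args: Array[String]): Unit = {
        val currentBuffer: Row = Array.fill[Any](2)(0L)
        val row: Row = Array[Any](1L, 2L)

        // Stand-ins for imperative aggregate update functions.
        val updateFunctions: Array[(Row, Row) => Unit] = Array(
          (buf, r) => buf(0) = buf(0).asInstanceOf[Long] + r(0).asInstanceOf[Long],
          (buf, r) => buf(1) = buf(1).asInstanceOf[Long] + r(1).asInstanceOf[Long]
        )

        // Style kept in the diff: foreach takes a closure and calls it once
        // per element, which is the per-row overhead mentioned above.
        updateFunctions.foreach(updateFunction => updateFunction(currentBuffer, row))

        // Style suggested in the comment: cache the length and use a plain
        // while loop over the array indices.
        var i = 0
        val length = updateFunctions.length
        while (i < length) {
          updateFunctions(i)(currentBuffer, row)
          i += 1
        }

        println(currentBuffer.mkString(", "))
      }
    }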