monkeyboy123 commented on code in PR #46705:
URL: https://github.com/apache/spark/pull/46705#discussion_r1614941974


##########
core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala:
##########
@@ -328,16 +328,15 @@ private[spark] object TaskMetrics extends Logging {
    */
   def fromAccumulators(accums: Seq[AccumulatorV2[_, _]]): TaskMetrics = {
     val tm = new TaskMetrics
-    for (acc <- accums) {
+    val (innerAccums, externalAccums) = accums.
+      partition(t => t.name.isDefined && tm.nameToAccums.contains(t.name.get))

Review Comment:
   > Do we do a lot of intermediate collections allocations in this 
`.partition` operation itself? e.g. under the hood, would we be resizing an 
array builder or doing a bunch of link
   
   Good suggestion. I found that `fromAccumulators` is time-consuming in some scenarios, as the flame graph below shows.
   
![image](https://github.com/apache/spark/assets/9074114/4dbb2d15-7d3e-47ad-bb05-2f83e8e7771b)
   
   We also ran an offline test comparing `tm._externalAccums.addAll` with `tm._externalAccums.add`; the former (one bulk call) speeds up the computation.
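   For reference, a minimal standalone sketch of the single-pass split the diff performs. `Accum` and `knownNames` are simplified stand-ins for `AccumulatorV2` and `TaskMetrics.nameToAccums` (illustrative only, not the real Spark types):

```scala
// Sketch only: Accum and knownNames are simplified stand-ins for
// AccumulatorV2 and TaskMetrics.nameToAccums in the real code.
object PartitionSketch {
  final case class Accum(name: Option[String])

  // Names that TaskMetrics tracks internally (illustrative value only).
  val knownNames: Set[String] = Set("internal.metrics.executorRunTime")

  // Mirrors the diff: one pass splits named internal metrics from the rest,
  // so the external side can then be appended with a single bulk call
  // (the addAll-style path) instead of one add call per element.
  def split(accums: Seq[Accum]): (Seq[Accum], Seq[Accum]) =
    accums.partition(a => a.name.isDefined && knownNames.contains(a.name.get))

  def main(args: Array[String]): Unit = {
    val accums = Seq(
      Accum(Some("internal.metrics.executorRunTime")), // internal metric
      Accum(Some("user.counter")),                     // user accumulator
      Accum(None)                                      // unnamed accumulator
    )
    val (inner, external) = split(accums)
    println(s"inner=${inner.size}, external=${external.size}") // inner=1, external=2
  }
}
```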



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

