[GitHub] [spark] mridulm commented on a diff in pull request #36162: [SPARK-32170][CORE] Improve the speculation through the stage task metrics.

GitBox Thu, 05 May 2022 10:44:37 -0700


mridulm commented on code in PR #36162:
URL: https://github.com/apache/spark/pull/36162#discussion_r866163541



##########
core/src/main/scala/org/apache/spark/SparkStatusTracker.scala:
##########
@@ -120,4 +120,8 @@ class SparkStatusTracker private[spark] (sc: SparkContext, store: AppStatusStore
         exec.memoryMetrics.map(_.totalOnHeapStorageMemory).getOrElse(0L))
     }.toArray
   }
+
+  def getAppStatusStore: AppStatusStore = {
+    store
+  }

Review Comment:
   > I think this essentially means we'll have intermediate accumulables for TaskInfo rather than only final accumulables for the completed tasks as what we have today
   
   Materializing the subset of required values was an optimization of this: since `_accumulables` is a `Seq`, the scan over it would be repeated on every check. We only need a small subset of input/shuffle-related metrics to determine progress, while the full set can be fairly large.
   
   > And we'll have to track all tasks since the completed tasks were inprogress tasks ever.
   
   For completed tasks, we are already tracking this.
   For in-progress tasks, we are not, so this will need to be added.
   For tasks that have yet to start, this would be empty.
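   The three cases above can be sketched as a small per-task map. This is a hypothetical illustration, not the PR's code: `TaskProgress`, `TaskProgressTracker`, `onRunningUpdate` and `onTaskEnd` are invented names, and the source of intermediate updates (for example, executor heartbeats) is an assumption. Completed tasks keep their final values, running tasks get refreshed with intermediate values, and tasks that have not started simply have no entry.

```scala
import scala.collection.mutable

// Hypothetical per-task snapshot of a progress-related metric.
final case class TaskProgress(recordsRead: Long, finished: Boolean)

// Hypothetical tracker covering completed, in-progress and not-yet-started tasks.
final class TaskProgressTracker {
  private val progress = mutable.HashMap.empty[Long, TaskProgress]

  // Intermediate value for a running task (the part that would need adding).
  def onRunningUpdate(taskId: Long, recordsRead: Long): Unit =
    progress.get(taskId) match {
      case Some(p) if p.finished => () // ignore late updates after completion
      case _ => progress(taskId) = TaskProgress(recordsRead, finished = false)
    }

  // Final value when the task completes (the part already tracked today).
  def onTaskEnd(taskId: Long, recordsRead: Long): Unit =
    progress(taskId) = TaskProgress(recordsRead, finished = true)

  // None for tasks that have not started yet.
  def get(taskId: Long): Option[TaskProgress] = progress.get(taskId)
}
```

   Guarding against late running-task updates keeps a stale intermediate value from overwriting a task's final metrics.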
   


