shahidki31 commented on a change in pull request #26508: [SPARK-26260][Core]For
disk store tasks summary table should show only successful tasks summary
URL: https://github.com/apache/spark/pull/26508#discussion_r346430709
##########
File path: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
##########
@@ -136,12 +136,6 @@ private[spark] class AppStatusStore(
store.read(classOf[StageDataWrapper], Array(stageId, stageAttemptId)).locality
}
- // SPARK-26119: we only want to consider successful tasks when calculating the metrics summary,
Review comment:
I also ran a quick performance check with 100,000 (1 lakh) tasks.
Steps to reproduce:
1. Start bin/spark-shell
2. Run `sc.parallelize(1 to 100000, 100000).count()`
Time taken to load the stage page:
1. InMemory store (both live and History): ~8-9 sec (both before and after the PR)
2. DiskStore:
   - ~9 sec without the PR (but the results are incorrect when there are non-successful tasks)
   - ~14 sec with the PR (with correct results)
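
To illustrate why the "incorrect results" case matters, here is a minimal, self-contained sketch of how including non-successful tasks can skew a task summary. It does not use Spark's actual classes: `TaskRecord`, `TaskSummarySketch`, `quantile`, and `summary` are hypothetical names introduced only for this example, and the quantile indexing is a simple nearest-rank style choice assumed for illustration.

```scala
// Hypothetical, simplified stand-in for a stored task record.
// This is NOT Spark's TaskDataWrapper; it exists only for this sketch.
final case class TaskRecord(status: String, durationMs: Long)

object TaskSummarySketch {
  // Pick a quantile from an already sorted sequence (assumed nearest-rank style).
  def quantile(sorted: IndexedSeq[Long], q: Double): Long = {
    require(sorted.nonEmpty, "need at least one value")
    val idx = math.min((q * sorted.size).toInt, sorted.size - 1)
    sorted(idx)
  }

  // Min, median, and max duration, optionally restricted to successful tasks.
  def summary(tasks: Seq[TaskRecord], successOnly: Boolean): IndexedSeq[Long] = {
    val kept = if (successOnly) tasks.filter(_.status == "SUCCESS") else tasks
    val sorted = kept.map(_.durationMs).sorted.toIndexedSeq
    Seq(0.0, 0.5, 1.0).map(q => quantile(sorted, q)).toIndexedSeq
  }
}
```

With three successful tasks of 10, 20, and 30 ms plus one failed task of 5000 ms, `summary(tasks, successOnly = true)` yields 10, 20, 30, while `summary(tasks, successOnly = false)` yields 10, 30, 5000. The extra filtering step is one plausible reason the disk store path costs more time with the PR, though the sketch says nothing about where the real overhead comes from.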