vanzin commented on a change in pull request #25943:
[WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for
(SQL)AppStatusListener
URL: https://github.com/apache/spark/pull/25943#discussion_r335733398
##########
File path: core/src/main/scala/org/apache/spark/status/storeTypes.scala
##########
@@ -76,6 +109,29 @@ private[spark] class JobDataWrapper(
@JsonIgnore @KVIndex("completionTime")
private def completionTime: Long =
info.completionTime.map(_.getTime).getOrElse(-1L)
+
+ def toLiveJob: LiveJob = {
Review comment:
I haven't fully followed the discussion here, nor have I thought thoroughly
about all the implications of the snapshotting approaches being discussed. But
I just wanted to point out that the problems being talked about here
(snapshotting a potentially large amount of data) are ridiculously worse on the
SQL side.
Because of the way SQL metrics are calculated, you need to keep all the data
points, meaning that a snapshot taken in the middle of a stage of a SQL query
can contain a humongous amount of data.
The 100k-task stage here could be encoded in 25k characters if you encode the
bit set as hex (4 bits per char). That's not so bad. But on the SQL side, if,
say, half of the tasks have finished, you have to encode 50k longs times the
number of metrics tracked by the stage, which is far more data to encode as
JSON.
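A rough back-of-the-envelope sketch of these sizes. This is illustrative only:
`SnapshotSizeEstimate`, its methods, and the metric count of 10 are all
assumptions for the sake of the arithmetic (the real number of metrics varies
per query plan):

```scala
// Hypothetical sketch of the snapshot-size arithmetic discussed above.
object SnapshotSizeEstimate {

  // Encoding a task-completion bit set as hex packs 4 bits per character,
  // so a 100k-task stage needs 100000 / 4 = 25000 characters.
  def hexBitSetChars(numTasks: Int): Int = numTasks / 4

  // On the SQL side every raw data point must be kept: one long per
  // tracked metric per finished task.
  def sqlLongsToEncode(finishedTasks: Int, numMetrics: Int): Long =
    finishedTasks.toLong * numMetrics

  def main(args: Array[String]): Unit = {
    // 25000 characters for the 100k-task bit set
    println(hexBitSetChars(100000))
    // 500000 longs with half the tasks done, assuming 10 metrics per task
    println(sqlLongsToEncode(50000, 10))
  }
}
```

Each long serialized as a JSON number can take up to 20 characters, so even
with the assumed 10 metrics the SQL snapshot is orders of magnitude larger
than the 25k-character bit set.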
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]