vanzin commented on a change in pull request #25943: 
[WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for 
(SQL)AppStatusListener
URL: https://github.com/apache/spark/pull/25943#discussion_r335733398
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/status/storeTypes.scala
 ##########
 @@ -76,6 +109,29 @@ private[spark] class JobDataWrapper(
 
   @JsonIgnore @KVIndex("completionTime")
   private def completionTime: Long = 
info.completionTime.map(_.getTime).getOrElse(-1L)
+
+  def toLiveJob: LiveJob = {
 
 Review comment:
   I haven't fully followed the discussion here, nor have I thought thoroughly 
through all the implications of the snapshotting approaches being discussed. But 
I just wanted to point out that the problems being talked about here 
(snapshotting a potentially large amount of data) are ridiculously worse on the 
SQL side.
   
   Because of the way SQL metrics are calculated, you need to keep all the 
per-task data points, meaning that a snapshot taken in the middle of a stage of 
a SQL query can contain a humongous amount of data.
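   
   A minimal illustration of the point above (not Spark's actual code; the 
aggregation functions are hypothetical stand-ins): an incrementally maintained 
sum can be snapshotted as a single long, but the min/median/max style 
aggregations shown for SQL metrics need every per-task value to still be 
around.
   
   ```java
   import java.util.Arrays;
   
   public class MetricAggregation {
       // A sum can be maintained incrementally: its snapshot is one long.
       static long sum(long[] taskValues) {
           long s = 0;
           for (long v : taskValues) s += v;
           return s;
       }
   
       // A median needs all per-task values present at snapshot time.
       static long median(long[] taskValues) {
           long[] sorted = taskValues.clone();
           Arrays.sort(sorted);
           return sorted[sorted.length / 2];
       }
   
       public static void main(String[] args) {
           long[] values = {5, 1, 9, 3, 7};
           System.out.println(sum(values));    // 25
           System.out.println(median(values)); // 5
       }
   }
   ```
   
   So a snapshot of a sum is O(1), while a snapshot of a median-style metric is 
O(tasks finished so far), per metric.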
   
   The 100k-task stage here could be encoded in 25k characters if you encode 
the bit set as hex characters (4 bits per char). That's not so bad. But on the 
SQL side, if, say, half of the tasks have finished, you have to encode 50k longs 
* the number of metrics being tracked by the stage, which is a lot more data to 
encode in JSON.
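   
   The size arithmetic above can be sketched as follows (a hypothetical `toHex` 
helper; the metric count per stage is an assumed number, not taken from Spark):
   
   ```java
   import java.util.BitSet;
   
   public class SnapshotSizeSketch {
       // Hex-encode a bitset at 4 bits per character.
       static String toHex(BitSet bits, int numBits) {
           StringBuilder sb = new StringBuilder();
           for (int i = 0; i < numBits; i += 4) {
               int nibble = 0;
               for (int j = 0; j < 4; j++) {
                   if (bits.get(i + j)) nibble |= 1 << j;
               }
               sb.append(Character.forDigit(nibble, 16));
           }
           return sb.toString();
       }
   
       public static void main(String[] args) {
           int numTasks = 100_000;
           BitSet completed = new BitSet(numTasks);
           completed.set(0, numTasks / 2); // pretend half the tasks finished
   
           // Core side: one bit per task, 4 bits per hex char.
           System.out.println(toHex(completed, numTasks).length()); // 25000
   
           // SQL side: one long per finished task, per tracked metric.
           int numMetrics = 10; // assumed metric count for illustration
           long longsToEncode = (long) numTasks / 2 * numMetrics;
           System.out.println(longsToEncode); // 500000
       }
   }
   ```
   
   Even with a modest assumed metric count, the SQL side has to serialize 
hundreds of thousands of longs versus 25k characters for the task bitset.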

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
