Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2696#discussion_r18864057

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---

```diff
@@ -853,6 +873,12 @@ class SparkContext(config: SparkConf) extends Logging {
   /** The version of Spark on which this application is running. */
   def version = SPARK_VERSION

+  def getJobsIdsForGroup(jobGroup: String): Array[Int] = statusApi.jobIdsForGroup(jobGroup)
+
+  def getJobInfo(jobId: Int): Option[SparkJobInfo] = statusApi.newJobInfo(jobId)
```

--- End diff --

The garbage collection / data retention semantics are the same as those of the Spark web UI, since this API is built on top of the same listeners. While a job is active, we keep information on it; after the job completes or fails, a configurable maximum number of jobs and stages is retained. I'll be sure to document this clearly.

Regarding snapshots / consistency, I added a note about this in one of my commit messages, reproduced here:

```
- The "consistent snapshot of the entire job -> stage -> task mapping"
  semantics might be very expensive to implement for large jobs, so I've
  decided to remove chaining between the SparkJobInfo and SparkStageInfo
  interfaces. Concretely, this means that you can't write something like
  job.stages()(0).name to get the name of the first stage in a job.
  Instead, you have to explicitly get the stage's ID from the job and then
  look up that stage using sc.getStageInfo(). This isn't to say that we
  can't implement methods like "getNumActiveStages" that reflect consistent
  state; the goal is mainly to avoid spending lots of time / memory to
  construct huge object graphs.
```

My concern was that it may be expensive to snapshot large jobs with many stages and tasks.
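To make the design choice concrete, here is a self-contained sketch (no Spark dependency) of the lookup pattern the comment describes: a job exposes only stage IDs, and stages are resolved explicitly through the tracker rather than chained off the job object, so no large job -> stage object graph is ever materialized. All names here (`StatusTracker`, `JobInfo`, `StageInfo`) are illustrative stand-ins, not Spark's actual API.

```scala
// Illustrative model of ID-based lookup instead of object-graph chaining.
// JobInfo holds stage IDs only; StageInfo objects live in a separate map
// and are fetched on demand via the tracker.
case class StageInfo(stageId: Int, name: String)
case class JobInfo(jobId: Int, stageIds: Seq[Int])

class StatusTracker(jobs: Map[Int, JobInfo], stages: Map[Int, StageInfo]) {
  // No job.stages()(0).name chaining: callers resolve each stage ID
  // explicitly, mirroring the sc.getStageInfo() style from the comment.
  def getJobInfo(jobId: Int): Option[JobInfo] = jobs.get(jobId)
  def getStageInfo(stageId: Int): Option[StageInfo] = stages.get(stageId)
}

object Demo {
  val tracker = new StatusTracker(
    jobs = Map(0 -> JobInfo(0, Seq(0, 1))),
    stages = Map(0 -> StageInfo(0, "map"), 1 -> StageInfo(1, "collect"))
  )

  // Two-step lookup: stage IDs from the job, then each stage by ID.
  def stageNames(jobId: Int): Seq[String] =
    for {
      job     <- tracker.getJobInfo(jobId).toSeq
      stageId <- job.stageIds
      stage   <- tracker.getStageInfo(stageId)
    } yield stage.name

  def main(args: Array[String]): Unit =
    println(stageNames(0).mkString(","))  // map,collect
}
```

The trade-off is the one the commit message names: callers do a little more work per lookup, but the tracker never has to build (or snapshot) a huge consistent object graph for jobs with many stages.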