Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/8180#discussion_r38916643
--- Diff: core/src/main/scala/org/apache/spark/scheduler/ActiveJob.scala ---
@@ -23,18 +23,42 @@ import org.apache.spark.TaskContext
import org.apache.spark.util.CallSite
/**
- * Tracks information about an active job in the DAGScheduler.
+ * A running job in the DAGScheduler. Jobs can be of two types: a result job, which computes a
+ * ResultStage to execute an action, or a map-stage job, which computes the map outputs for a
+ * ShuffleMapStage before any downstream stages are submitted. The latter is used for adaptive
+ * query planning, to look at map output statistics before submitting later stages. We distinguish
+ * between these two types of jobs using the finalStage field of this class.
+ *
+ * Jobs are only tracked for "leaf" stages that clients directly submitted, through DAGScheduler's
+ * submitJob or submitMapStage methods. However, either type of job may cause the execution of
+ * may other earlier stages (for RDDs in the DAG it depends on), and multiple jobs may share some
--- End diff --
nit: the `may` at the start of this line ("may other earlier stages") is redundant