Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/19194#discussion_r140296863
--- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -619,6 +625,47 @@ private[spark] class ExecutorAllocationManager(
// place the executors.
private val stageIdToExecutorPlacementHints = new mutable.HashMap[Int, (Int, Map[String, Int])]
+  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
+    jobStart.stageInfos.foreach(stageInfo => stageIdToJobId(stageInfo.stageId) = jobStart.jobId)
+
+    var jobGroupId = if (jobStart.properties != null) {
+      jobStart.properties.getProperty(SparkContext.SPARK_JOB_GROUP_ID)
+    } else {
+      null
+    }
+
+    val maxConTasks = if (jobGroupId != null &&
+      conf.contains(s"spark.job.$jobGroupId.maxConcurrentTasks")) {
+      conf.get(s"spark.job.$jobGroupId.maxConcurrentTasks").toInt
+    } else {
+      Int.MaxValue
+    }
+
+    if (maxConTasks <= 0) {
+      throw new IllegalArgumentException(
+        "Maximum Concurrent Tasks should be set greater than 0 for the job to progress.")
+    }
+
+    if (jobGroupId == null || !conf.contains(s"spark.job.$jobGroupId.maxConcurrentTasks")) {
+      jobGroupId = DEFAULT_JOB_GROUP
+    }
+
+    jobIdToJobGroup(jobStart.jobId) = jobGroupId
+    if (!jobGroupToMaxConTasks.contains(jobGroupId)) {
--- End diff --
You could submit jobs concurrently from two different threads. I didn't
describe it very well -- say both threads are in the same job group. Each
thread can only run one job at a time, but between the two threads there may
always be some active job for the job group the entire time.
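
Purely as an illustration (a minimal sketch, not part of this PR; it assumes a
local-mode SparkContext, and the group id, loop counts, and object name are all
made up), two threads can set the same job group and submit jobs back to back,
so the group as a whole has an active job the whole time even though each
thread runs only one job at a time:

    import org.apache.spark.{SparkConf, SparkContext}

    object ConcurrentJobGroupSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("sketch").setMaster("local[4]"))

        // setJobGroup is per-thread, so each submitting thread sets the
        // same (hypothetical) group id before running its jobs.
        def submitLoop(): Unit = {
          sc.setJobGroup("shared-group", "jobs from one of two threads")
          for (_ <- 1 to 100) {
            sc.parallelize(1 to 1000).count() // each action runs one job
          }
        }

        val t1 = new Thread(new Runnable { def run(): Unit = submitLoop() })
        val t2 = new Thread(new Runnable { def run(): Unit = submitLoop() })
        t1.start(); t2.start()
        t1.join(); t2.join()
        sc.stop()
      }
    }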
There are real use cases like this -- in "job-server" style deployments,
there is a long-running Spark context (most likely with cached RDDs) accepting
requests from multiple users (perhaps sitting behind an HTTP server). Maybe
some group of users is always put into one job group (e.g., there is a
low-priority group and a high-priority group of users). You might process more
than one job for each group at a time, and there is a never-ending stream of
jobs.
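
With the per-group property introduced in this diff, such a deployment could
cap the low-priority group while leaving the high-priority group unlimited.
A minimal sketch (the group id and the value 16 are made up; only the
spark.job.<groupId>.maxConcurrentTasks key pattern comes from the diff above):

    import org.apache.spark.SparkConf

    // Cap the hypothetical "low-priority" job group at 16 concurrent tasks.
    // Any group without an explicit limit falls back to Int.MaxValue,
    // i.e. effectively unlimited, per the onJobStart logic above.
    val conf = new SparkConf()
      .setAppName("job-server")
      .set("spark.job.low-priority.maxConcurrentTasks", "16")

Jobs are then tagged into a group per thread via
sc.setJobGroup("low-priority", ...), which populates the
SparkContext.SPARK_JOB_GROUP_ID property that onJobStart reads.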
(Again, it's not the most common scenario, but we might as well have it behave
correctly.)