Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19194#discussion_r140332294
  
    --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
    @@ -619,6 +625,47 @@ private[spark] class ExecutorAllocationManager(
         // place the executors.
         private val stageIdToExecutorPlacementHints = new mutable.HashMap[Int, (Int, Map[String, Int])]
     
    +    override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    +      jobStart.stageInfos.foreach(stageInfo => stageIdToJobId(stageInfo.stageId) = jobStart.jobId)
    +
    +      var jobGroupId = if (jobStart.properties != null) {
    +        jobStart.properties.getProperty(SparkContext.SPARK_JOB_GROUP_ID)
    +      } else {
    +        null
    +      }
    +
    +      val maxConTasks = if (jobGroupId != null &&
    +        conf.contains(s"spark.job.$jobGroupId.maxConcurrentTasks")) {
    +        conf.get(s"spark.job.$jobGroupId.maxConcurrentTasks").toInt
    +      } else {
    +        Int.MaxValue
    +      }
    +
    +      if (maxConTasks <= 0) {
    +        throw new IllegalArgumentException(
    +          "Maximum Concurrent Tasks should be set greater than 0 for the job to progress.")
    +      }
    +
    +      if (jobGroupId == null || !conf.contains(s"spark.job.$jobGroupId.maxConcurrentTasks")) {
    +        jobGroupId = DEFAULT_JOB_GROUP
    +      }
    +
    +      jobIdToJobGroup(jobStart.jobId) = jobGroupId
    +      if (!jobGroupToMaxConTasks.contains(jobGroupId)) {
    --- End diff --
    
    If we are talking about jobs within the same job group, it seems like this would be very timing dependent as to what number you would get if you start allowing it to be changed at run time.  Let's say you have 1 thread and set the job group.  If all the jobs within that group are launched serially then everything is easy, and allowing the value to be changed can make sense. But if from that thread you spawn other threads to launch jobs in parallel (which would still be in that same job group) and each of those sets it differently, how do you know each of those jobs will get the right number?  The 2 threads could race to set the conf, and if both set it right before launching you are going to get one of the settings for both launches, whereas one of them might have expected a different setting.
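    
    To make the timing concern concrete, here is a minimal, Spark-free sketch of that race. This is just an illustration under my own naming (groupLimit, launchJob), not the actual code path in this PR:
    
    ```scala
    import java.util.concurrent.ConcurrentHashMap

    object JobGroupConfRace {
      // Shared per-group setting, analogous to spark.job.<group>.maxConcurrentTasks.
      private val groupLimit = new ConcurrentHashMap[String, Int]()

      // Stand-in for the snapshot something like onJobStart() would take for a job.
      private def launchJob(group: String, caller: String): Unit = {
        val seen = groupLimit.get(group)
        println(s"$caller launched a job in group '$group' and got limit $seen")
      }

      def main(args: Array[String]): Unit = {
        val group = "etl"
        // Two threads in the same job group, each wanting a different limit.
        val threads = Seq(("thread-1", 2), ("thread-2", 8)).map { case (name, limit) =>
          new Thread(new Runnable {
            override def run(): Unit = {
              groupLimit.put(group, limit) // both threads race on the same key...
              launchJob(group, name)       // ...so either thread may observe either value
            }
          })
        }
        threads.foreach(_.start())
        threads.foreach(_.join())
      }
    }
    ```
    
    Depending on how the two threads interleave, both launches can end up with 2 or both with 8, which is the surprise I'm worried about.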
    
    @squito does this cover the scenario you are referring to?
    
    While both of those cases might be rare, I would lean towards making it more predictable and only setting it once, rather than having the user get something they don't expect.  But either behavior could probably be documented away if we see the serial scenario being more beneficial.
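    
    For what it's worth, the "only set it once" behavior is easy to keep predictable. A hedged sketch of the first-writer-wins idea, using putIfAbsent purely for illustration (the PR uses a contains check on jobGroupToMaxConTasks, which amounts to the same thing):
    
    ```scala
    import java.util.concurrent.ConcurrentHashMap

    object SetOncePerJobGroup {
      private val jobGroupToMaxConTasks = new ConcurrentHashMap[String, Int]()

      // Called from something like onJobStart(); only the first call per group takes effect.
      def recordLimit(jobGroupId: String, maxConTasks: Int): Int = {
        jobGroupToMaxConTasks.putIfAbsent(jobGroupId, maxConTasks)
        jobGroupToMaxConTasks.get(jobGroupId)
      }

      def main(args: Array[String]): Unit = {
        println(recordLimit("etl", 2)) // first job in group 'etl' fixes the limit at 2
        println(recordLimit("etl", 8)) // a later job asking for 8 still gets 2
      }
    }
    ```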
    
    Ideally it would be nice to set this at the stage level, but that is a lot more difficult.


---
