[GitHub] spark pull request #18950: [SPARK-20589][Core][Scheduler] Allow limiting tas...

tgravescs Tue, 22 Aug 2017 13:54:54 -0700

Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18950#discussion_r134599045
  
    --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
    @@ -598,13 +600,58 @@ private[spark] class ExecutorAllocationManager(
         private val executorIdToTaskIds = new mutable.HashMap[String, mutable.HashSet[Long]]
         // Number of tasks currently running on the cluster.  Should be 0 when no stages are active.
         private var numRunningTasks: Int = _
    +    private val jobGroupToMaxConTasks = new mutable.HashMap[String, Int]
    +    private val jobIdToJobGroup = new mutable.HashMap[Int, String]
    +    private val stageIdToJobId = new mutable.HashMap[Int, Int]
    +    private val stageIdToCompleteTaskCount = new mutable.HashMap[Int, Int]
     
         // stageId to tuple (the number of task with locality preferences, a map where each pair is a
         // node and the number of tasks that would like to be scheduled on that node) map,
         // maintain the executor placement hints for each stage Id used by resource framework to better
         // place the executors.
         private val stageIdToExecutorPlacementHints = new mutable.HashMap[Int, (Int, Map[String, Int])]
     
    +    override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    +      jobStart.stageInfos.foreach(stageInfo => stageIdToJobId(stageInfo.stageId) = jobStart.jobId)
    +
    +      var jobGroupId = if (jobStart.properties != null) {
    +        jobStart.properties.getProperty(SparkContext.SPARK_JOB_GROUP_ID)
    +      } else {
    +        null
    +      }
    +
    +      val maxConTasks = if (jobGroupId != null &&
    +        conf.contains(s"spark.job.$jobGroupId.maxConcurrentTasks")) {
    +        conf.get(s"spark.job.$jobGroupId.maxConcurrentTasks").toInt
    +      } else {
    +        Int.MaxValue
    +      }
    +
    +      if (maxConTasks <= 0) {
    +        throw new IllegalArgumentException(
    +          "Maximum Concurrent Tasks should be set greater than 0 for the job to progress.")
    +      }
    +
    +      if (jobGroupId == null || !conf.contains(s"spark.job.$jobGroupId.maxConcurrentTasks")) {
    +        jobGroupId = "default-group-" + jobStart.jobId.hashCode
    --- End diff --
    
    Actually, I think we could just use one job group ID for all the jobs that
    don't have a group specified.
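The suggestion can be sketched roughly as follows. This is a hypothetical, simplified version of the group-resolution step in `onJobStart`, not code from the PR: a plain `Map[String, String]` stands in for `SparkConf`, and the names `JobGroupResolution`, `resolve`, and `DefaultJobGroup` are invented for illustration. Jobs whose group has no `spark.job.<group>.maxConcurrentTasks` setting (or that have no group at all) all map to one shared ID instead of a per-job `"default-group-" + jobId` ID.

```scala
// Hypothetical sketch of the review suggestion; not the PR's implementation.
// A Map[String, String] stands in for SparkConf (assumption).
object JobGroupResolution {
  // Hypothetical shared group ID; chosen to be unlikely to clash with a real group name.
  val DefaultJobGroup: String = "__default_job_group__"

  // Same per-group config key pattern the diff reads from SparkConf.
  private def maxTasksKey(group: String): String =
    s"spark.job.$group.maxConcurrentTasks"

  /** Returns (effective job group ID, max concurrent tasks) for one job. */
  def resolve(conf: Map[String, String], jobGroupId: Option[String]): (String, Int) = {
    // Only a group with an explicit limit keeps its own ID.
    val configured: Option[(String, Int)] =
      jobGroupId.flatMap(group => conf.get(maxTasksKey(group)).map(v => (group, v.toInt)))

    configured match {
      case Some((group, maxTasks)) =>
        // Mirrors the diff: a configured limit must be positive.
        if (maxTasks <= 0) {
          throw new IllegalArgumentException(
            "Maximum Concurrent Tasks should be set greater than 0 for the job to progress.")
        }
        (group, maxTasks)
      case None =>
        // No group, or no limit configured for it: every such job shares one unlimited group.
        (DefaultJobGroup, Int.MaxValue)
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = Map("spark.job.etl.maxConcurrentTasks" -> "10")
    println(resolve(conf, Some("etl")))   // (etl,10)
    println(resolve(conf, Some("adhoc"))) // (__default_job_group__,2147483647)
    println(resolve(conf, None))          // (__default_job_group__,2147483647)
  }
}
```

Since the shared group's limit is `Int.MaxValue`, folding ungrouped jobs into it presumably does not throttle them; it mainly avoids one extra map entry per ungrouped job.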


