[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user ilganeli closed the pull request at: https://github.com/apache/spark/pull/4703 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-85319636 Hey @ilganeli would you mind closing this one since reviews are going on in the newer one? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-77930465 Hi @srowen and @markhamstra , with the exception of the refactoring for newResultStage that Mark proposed, I've added the recommended changes. Is there anything else that this needs? Would appreciate the feedback. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-77263660 Hi all - I had to revert to my initial implementation since Mark's suggested refactoring introduced a test failure. Is this good to go? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-77267879 Interesting. Looks like the failed test results are no longer available. Do you recall what the problem was? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-77272127 It was a pretty obscure error. I could revert and give you the stack trace but I played around with it a bit and wasn't able to trace it down. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75829212 Hi Sean - I'll give the retest a shot. If there's a way to get whitelisted that would be amazing. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75838380 [Test build #27900 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27900/consoleFull) for PR 4708 at commit [`c9288e2`](https://github.com/apache/spark/commit/c9288e2d91ded0efcb6280b18fffae84a9abfba6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75828243 [Test build #27900 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27900/consoleFull) for PR 4708 at commit [`c9288e2`](https://github.com/apache/spark/commit/c9288e2d91ded0efcb6280b18fffae84a9abfba6). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75838391 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27900/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75824748 Retest please :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75827924 Looks like a transient git failure yes. I wonder if you have the power to tell Jenkins to retest? retest this please seems to work for me. If not maybe we can get you whitelisted. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75855424 [Test build #27911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27911/consoleFull) for PR 4708 at commit [`c9288e2`](https://github.com/apache/spark/commit/c9288e2d91ded0efcb6280b18fffae84a9abfba6). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75865162 [Test build #27912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27912/consoleFull) for PR 4708 at commit [`1b3471b`](https://github.com/apache/spark/commit/1b3471b110d5c8b499c49185560f0f8788850996). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75856293 [Test build #27912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27912/consoleFull) for PR 4708 at commit [`1b3471b`](https://github.com/apache/spark/commit/1b3471b110d5c8b499c49185560f0f8788850996). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75865176 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27912/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75881056 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27917/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75881048 [Test build #27917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27917/consoleFull) for PR 4708 at commit [`6b43d7b`](https://github.com/apache/spark/commit/6b43d7b4e94025fdba6263d2627a18cb47414e7d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75864145 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27911/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75864128 [Test build #27911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27911/consoleFull) for PR 4708 at commit [`c9288e2`](https://github.com/apache/spark/commit/c9288e2d91ded0efcb6280b18fffae84a9abfba6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75855242 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75868482 [Test build #27917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27917/consoleFull) for PR 4708 at commit [`6b43d7b`](https://github.com/apache/spark/commit/6b43d7b4e94025fdba6263d2627a18cb47414e7d). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75593300 [Test build #27854 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27854/consoleFull) for PR 4708 at commit [`b85c5fe`](https://github.com/apache/spark/commit/b85c5fe14fdece4769fc98bbedcba80252b325bf). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user ilganeli commented on a diff in the pull request: https://github.com/apache/spark/pull/4708#discussion_r25183148 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -830,39 +836,39 @@ class DAGScheduler( try { // For ShuffleMapTask, serialize and broadcast (rdd, shuffleDep). // For ResultTask, serialize and broadcast (rdd, func). - val taskBinaryBytes: Array[Byte] = -if (stage.isShuffleMap) { - closureSerializer.serialize((stage.rdd, stage.shuffleDep.get) : AnyRef).array() -} else { - closureSerializer.serialize((stage.rdd, stage.resultOfJob.get.func) : AnyRef).array() -} + val taskBinaryBytes: Array[Byte] = stage match { +case a: ShuffleMapStage = + closureSerializer.serialize((a.rdd, a.shuffleDep): AnyRef).array() +case b: ResultStage = + closureSerializer.serialize((b.rdd, b.resultOfJob.get.func): AnyRef).array() + } + taskBinary = sc.broadcast(taskBinaryBytes) } catch { // In the case of a failure during serialization, abort the stage. case e: NotSerializableException = abortStage(stage, Task not serializable: + e.toString) runningStages -= stage -return --- End diff -- This was a mistake introduced when I was doing the second round of refactoring (copying code back from when I pulled this all out to its own method). When this code is within its own method then we can just look at the return value of the method and the weird return breaks become unnecessary. I'll add a comment for these in the meantime. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75637201 Look pretty good to me, but left a few more comments. Also, please take a look at the various logging strings to see whether some of them can be expressed more readably using string interpolation. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/4708#discussion_r25202931 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -210,40 +210,58 @@ class DAGScheduler( * The jobId value passed in will be used if the stage doesn't already exist with * a lower jobId (jobId always increases across jobs.) */ - private def getShuffleMapStage(shuffleDep: ShuffleDependency[_, _, _], jobId: Int): Stage = { + private def getShuffleMapStage( + shuffleDep: ShuffleDependency[_, _, _], + jobId: Int): ShuffleMapStage = { shuffleToMapStage.get(shuffleDep.shuffleId) match { case Some(stage) = stage case None = // We are going to register ancestor shuffle dependencies registerShuffleDependencies(shuffleDep, jobId) // Then register current shuffleDep -val stage = - newOrUsedStage( -shuffleDep.rdd, shuffleDep.rdd.partitions.size, shuffleDep, jobId, -shuffleDep.rdd.creationSite) +val stage = newOrUsedShuffleStage(shuffleDep, jobId) shuffleToMapStage(shuffleDep.shuffleId) = stage - + stage } } /** - * Create a Stage -- either directly for use as a result stage, or as part of the (re)-creation - * of a shuffle map stage in newOrUsedStage. The stage will be associated with the provided - * jobId. Production of shuffle map stages should always use newOrUsedStage, not newStage - * directly. + * Create a ShuffleMapStage as part of the (re)-creation of a shuffle map stage in + * newOrUsedShuffleStage. The stage will be associated with the provide jobId. + * Production of shuffle map stages should always use newOrUsedShuffleStage,not + * newShuffleMapStage directly. */ - private def newStage( + private def newShuffleMapStage( rdd: RDD[_], numTasks: Int, - shuffleDep: Option[ShuffleDependency[_, _, _]], + shuffleDep: ShuffleDependency[_, _, _], jobId: Int, - callSite: CallSite) -: Stage = - { + callSite: CallSite): ShuffleMapStage = { val parentStages = getParentStages(rdd, jobId) val id = nextStageId.getAndIncrement() -val stage = new Stage(id, rdd, numTasks, shuffleDep, parentStages, jobId, callSite) +val stage: ShuffleMapStage = new ShuffleMapStage(id, rdd, numTasks, parentStages, + jobId, callSite, shuffleDep) + +stageIdToStage(id) = stage +updateJobIdStageIdMaps(jobId, stage) +stage + } + + /** + * Create a ResultStage -- either directly for use as a result stage, or as part of the + * (re)-creation of a shuffle map stage in newOrUsedShuffleStage. The stage will be associated + * with the provided jobId. + */ + private def newResultStage( + rdd: RDD[_], + numTasks: Int, + jobId: Int, + callSite: CallSite): ResultStage = { +val parentStages = getParentStages(rdd, jobId) +val id = nextStageId.getAndIncrement() +val stage: ResultStage = new ResultStage(id, rdd, numTasks, parentStages, jobId, callSite) + --- End diff -- I'd rather avoid the code duplication in newShuffleMapStage and newResultStage. This can be done in generic fashion via runtime reflection: ```scala import scala.reflect.runtime.{universe = ru} ... private def newStage[T : Stage: ru.TypeTag]( rdd: RDD[_], numTasks: Int, shuffleDep: Option[ShuffleDependency[_, _, _]], jobId: Int, callSite: CallSite): T = { val m = ru.runtimeMirror(getClass.getClassLoader) val classT = ru.typeOf[T].typeSymbol.asClass val cm = m.reflectClass(classT) val ctor = ru.typeOf[T].declaration(ru.nme.CONSTRUCTOR).asMethod val ctorm = cm.reflectConstructor(ctor) val parentStages = getParentStages(rdd, jobId) val id = nextStageId.getAndIncrement() val stage = shuffleDep.map { shufDep = ctorm(id, rdd, numTasks, parentStages, jobId, callSite, shufDep) }.getOrElse(ctorm(id, rdd, numTasks, parentStages, jobId, callSite)).asInstanceOf[T] stageIdToStage(id) = stage updateJobIdStageIdMaps(jobId, stage) stage } ... val stage = newStage[ShuffleMapStage](rdd, numTasks, Some(shuffleDep), jobId, rdd.creationSite) ... finalStage = newStage[ResultStage](finalRDD, partitions.size, None, jobId, callSite) ``` ...but I'd want to see the performance numbers on that before deciding not to go with a less flexible approach that avoids reflection: ```scala private def newStage[T : Stage](
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/4708#discussion_r25186634 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Stage.scala --- @@ -77,52 +71,9 @@ private[spark] class Stage( /** Pointer to the latest [StageInfo] object, set by DAGScheduler. */ var latestInfo: StageInfo = StageInfo.fromStage(this) - def isAvailable: Boolean = { -if (!isShuffleMap) { - true -} else { - numAvailableOutputs == numPartitions -} - } - - def addOutputLoc(partition: Int, status: MapStatus) { -val prevList = outputLocs(partition) -outputLocs(partition) = status :: prevList -if (prevList == Nil) { - numAvailableOutputs += 1 -} - } - - def removeOutputLoc(partition: Int, bmAddress: BlockManagerId) { -val prevList = outputLocs(partition) -val newList = prevList.filterNot(_.location == bmAddress) -outputLocs(partition) = newList -if (prevList != Nil newList == Nil) { - numAvailableOutputs -= 1 -} - } + var numAvailableOutputs = 0 --- End diff -- ...for nextAttemptId, too. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75604216 [Test build #27858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27858/consoleFull) for PR 4708 at commit [`d548caf`](https://github.com/apache/spark/commit/d548cafab4b6f36ee7e9bed696419567f0bc3d94). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75609741 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27854/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75610748 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27856/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/4708#discussion_r25186416 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Stage.scala --- @@ -47,26 +47,20 @@ import org.apache.spark.util.CallSite * be updated for each attempt. --- End diff -- Remove unused BlockManagerId from imports --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75594276 [Test build #27856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27856/consoleFull) for PR 4708 at commit [`6da3a71`](https://github.com/apache/spark/commit/6da3a7101c3c8087a9a924b998889eb6e1b3446f). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/4708#discussion_r25186558 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Stage.scala --- @@ -77,52 +71,9 @@ private[spark] class Stage( /** Pointer to the latest [StageInfo] object, set by DAGScheduler. */ var latestInfo: StageInfo = StageInfo.fromStage(this) - def isAvailable: Boolean = { -if (!isShuffleMap) { - true -} else { - numAvailableOutputs == numPartitions -} - } - - def addOutputLoc(partition: Int, status: MapStatus) { -val prevList = outputLocs(partition) -outputLocs(partition) = status :: prevList -if (prevList == Nil) { - numAvailableOutputs += 1 -} - } - - def removeOutputLoc(partition: Int, bmAddress: BlockManagerId) { -val prevList = outputLocs(partition) -val newList = prevList.filterNot(_.location == bmAddress) -outputLocs(partition) = newList -if (prevList != Nil newList == Nil) { - numAvailableOutputs -= 1 -} - } + var numAvailableOutputs = 0 --- End diff -- Add explicit type declaration --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/4708#discussion_r25188257 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -228,22 +227,41 @@ class DAGScheduler( } /** - * Create a Stage -- either directly for use as a result stage, or as part of the (re)-creation - * of a shuffle map stage in newOrUsedStage. The stage will be associated with the provided - * jobId. Production of shuffle map stages should always use newOrUsedStage, not newStage - * directly. + * Create a ShuffleMapStage as part of the (re)-creation of a shuffle map stage in + * newOrUsedShuffleStage. The stage will be associated with the provided + * jobId. Production of shuffle map stages should always use newOrUsedShuffleStage, + * not newShuffleMapStage directly. --- End diff -- nit: reformat a little... ```scala /** * Create a ShuffleMapStage as part of the (re)-creation of a shuffle map stage in * newOrUsedShuffleStage. The stage will be associated with the provided jobId. * Production of shuffle map stages should always use newOrUsedShuffleStage, not * newShuffleMapStage directly. */ ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75609726 [Test build #27854 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27854/consoleFull) for PR 4708 at commit [`b85c5fe`](https://github.com/apache/spark/commit/b85c5fe14fdece4769fc98bbedcba80252b325bf). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75610731 [Test build #27856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27856/consoleFull) for PR 4708 at commit [`6da3a71`](https://github.com/apache/spark/commit/6da3a7101c3c8087a9a924b998889eb6e1b3446f). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75602215 @JoshRosen I'm happy to take a look at this but won't be able to get to it until Friday. Feel free to merge it sooner than that if you're eager to get it in; otherwise I'll take a look Friday! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75620128 [Test build #27858 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27858/consoleFull) for PR 4708 at commit [`d548caf`](https://github.com/apache/spark/commit/d548cafab4b6f36ee7e9bed696419567f0bc3d94). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75620137 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27858/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75653958 nits: extra spaces in type declarations. E.g,: ```scala var numAvailableOutputs: Long = 0 ``` not ```scala var numAvailableOutputs : Long = 0 ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75681861 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27871/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75691936 What happened with this test? Seems like it failed to fetch from git? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75492348 Hmm, it looks like MiMa is complaining about the `Stage` class becoming abstract even though it's `private[spark]`: ``` [info] spark-core: found 1 potential binary incompatibilities (filtered 352) [error] * class org.apache.spark.scheduler.Stage was concrete; is declared abstract in new version [error]filter with: ProblemFilters.exclude[AbstractClassProblem](org.apache.spark.scheduler.Stage) ``` We can fix this by adding an exclude. When you do this, make sure to add a comment referencing this JIRA so we know why it was introduced. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75493744 I've added an exclude for the Spark 1.3 build - hopefully that resolves this issue. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/4708#discussion_r25145518 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -830,39 +836,39 @@ class DAGScheduler( try { // For ShuffleMapTask, serialize and broadcast (rdd, shuffleDep). // For ResultTask, serialize and broadcast (rdd, func). - val taskBinaryBytes: Array[Byte] = -if (stage.isShuffleMap) { - closureSerializer.serialize((stage.rdd, stage.shuffleDep.get) : AnyRef).array() -} else { - closureSerializer.serialize((stage.rdd, stage.resultOfJob.get.func) : AnyRef).array() -} + val taskBinaryBytes: Array[Byte] = stage match { +case a: ShuffleMapStage = + closureSerializer.serialize((a.rdd, a.shuffleDep): AnyRef).array() +case b: ResultStage = + closureSerializer.serialize((b.rdd, b.resultOfJob.get.func): AnyRef).array() + } + taskBinary = sc.broadcast(taskBinaryBytes) } catch { // In the case of a failure during serialization, abort the stage. case e: NotSerializableException = abortStage(stage, Task not serializable: + e.toString) runningStages -= stage -return case NonFatal(e) = abortStage(stage, sTask serialization failed: $e\n${e.getStackTraceString}) runningStages -= stage -return } -val tasks: Seq[Task[_]] = if (stage.isShuffleMap) { - partitionsToCompute.map { id = -val locs = getPreferredLocs(stage.rdd, id) -val part = stage.rdd.partitions(id) -new ShuffleMapTask(stage.id, taskBinary, part, locs) - } -} else { - val job = stage.resultOfJob.get - partitionsToCompute.map { id = -val p: Int = job.partitions(id) -val part = stage.rdd.partitions(p) -val locs = getPreferredLocs(stage.rdd, p) -new ResultTask(stage.id, taskBinary, part, locs, id) - } +val tasks: Seq[Task[_]] = stage match { + case a: ShuffleMapStage = --- End diff -- It's legal to have a case statement where multiple branches use the same variable name (because there's no fall-through): ```scala scala def foo(x: Any) = x match { | case y: String = y | case y: Int = y * 2 | } foo: (x: Any)Any scala foo(1) res1: Any = 2 scala foo(hello) res2: Any = hello ``` Therefore, I think it would be clearer to replace `a` and `b` with `stage` (this will also reduce the size of the diff). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/4708#discussion_r25145581 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -996,50 +1006,52 @@ class DAGScheduler( if (failedEpoch.contains(execId) smt.epoch = failedEpoch(execId)) { logInfo(Ignoring possibly bogus ShuffleMapTask completion from + execId) } else { - stage.addOutputLoc(smt.partitionId, status) + shuffleStage.addOutputLoc(smt.partitionId, status) } -if (runningStages.contains(stage) stage.pendingTasks.isEmpty) { - markStageAsFinished(stage) - logInfo(looking for newly runnable stages) +if (runningStages.contains(shuffleStage) shuffleStage.pendingTasks.isEmpty) { + markStageAsFinished(shuffleStage) + logInfo(looking for newly runnable shuffleStages) --- End diff -- I think that this log message should remain unchanged, since technically I think we'll consider scheduling of all runnable stages, which might also include result stages --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/4708#discussion_r25145702 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultStage.scala --- @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler + +import org.apache.spark.rdd.RDD +import org.apache.spark.util.CallSite + +/** + * Define a class that represents the ResultStage to help clean up the DAGScheduler class --- End diff -- Same here; I don't think this comment is necessary. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/4708#discussion_r25145735 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultStage.scala --- @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler + +import org.apache.spark.rdd.RDD +import org.apache.spark.util.CallSite + +/** + * Define a class that represents the ResultStage to help clean up the DAGScheduler class + */ +private[spark] class ResultStage( +override val id: Int, +override val rdd: RDD[_], +override val numTasks: Int, +override val parents: List[Stage], +override val jobId: Int, +override val callSite: CallSite) + extends Stage(id, rdd, numTasks, parents, jobId, callSite) { + + /** For stages that are the final (consists of only ResultTasks), link to the ActiveJob. */ --- End diff -- Now that this field is only present in the new ResultStage class, maybe we could update this comment to say something like Links to the active job for this result stage. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75493784 [Test build #27850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27850/consoleFull) for PR 4708 at commit [`a57dfcd`](https://github.com/apache/spark/commit/a57dfcd919342e9b06cddc0d7ac7e576cdcdafcd). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/4708#discussion_r25145685 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.scheduler + +import org.apache.spark.rdd.RDD +import org.apache.spark.ShuffleDependency +import org.apache.spark.storage.BlockManagerId +import org.apache.spark.util.CallSite + +/** + * Define a class that represents the ShuffleMapStage to help clean up the DAGScheduler class --- End diff -- I don't think this description will make much sense after this patch is merged, since by that point the cleanup will have already been performed! I think we can just omit this comment. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/4708#discussion_r25145446 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -830,39 +836,39 @@ class DAGScheduler( try { // For ShuffleMapTask, serialize and broadcast (rdd, shuffleDep). // For ResultTask, serialize and broadcast (rdd, func). - val taskBinaryBytes: Array[Byte] = -if (stage.isShuffleMap) { - closureSerializer.serialize((stage.rdd, stage.shuffleDep.get) : AnyRef).array() -} else { - closureSerializer.serialize((stage.rdd, stage.resultOfJob.get.func) : AnyRef).array() -} + val taskBinaryBytes: Array[Byte] = stage match { +case a: ShuffleMapStage = + closureSerializer.serialize((a.rdd, a.shuffleDep): AnyRef).array() +case b: ResultStage = + closureSerializer.serialize((b.rdd, b.resultOfJob.get.func): AnyRef).array() + } + taskBinary = sc.broadcast(taskBinaryBytes) } catch { // In the case of a failure during serialization, abort the stage. case e: NotSerializableException = abortStage(stage, Task not serializable: + e.toString) runningStages -= stage -return --- End diff -- I don't think that we can remove this `return` (or the one line 849). If we remove this, then we'll hit errors later on because `taskBinary` will still be null due to the serialization error. I think we should add these return statements back and possibly add a comment to each occurrence explaining why it's needed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75494807 I took a quick pass, but overall this looks pretty good to me since it's inline with the refactoring that I had originally planned to do myself. It's nice to see a lot of the weird `isShuffleMap` stuff removed. I also like how this refactoring clarifies that ShuffleMapTasks always belong to ShuffleMapStages, etc. Maybe @markhamstra or @kayousterhout would like to take a look at this? This looks like a pretty straightforward, low-risk refactoring to me, but it would be good to get more eyes on it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75499003 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27850/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75499000 [Test build #27850 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27850/consoleFull) for PR 4708 at commit [`a57dfcd`](https://github.com/apache/spark/commit/a57dfcd919342e9b06cddc0d7ac7e576cdcdafcd). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/4703#discussion_r25058069 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -484,7 +505,7 @@ class DAGScheduler( Total number of partitions: + maxPartitions) } -val jobId = nextJobId.getAndIncrement() +val jobId = nextJobId.getAndIncrement --- End diff -- mutator should include the `()` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/4703#discussion_r25055713 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Stage.scala --- @@ -47,26 +47,19 @@ import org.apache.spark.util.CallSite * be updated for each attempt. * */ -private[spark] class Stage( -val id: Int, -val rdd: RDD[_], -val numTasks: Int, -val shuffleDep: Option[ShuffleDependency[_, _, _]], // Output shuffle if stage is a map stage -val parents: List[Stage], -val jobId: Int, -val callSite: CallSite) +private[spark] abstract class Stage(val id: Int, +val rdd: RDD[_], +val numTasks: Int, +val parents: List[Stage], +val jobId: Int, +val callSite: CallSite) --- End diff -- Follow the prior indentation style, please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/4703#discussion_r25056356 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Stage.scala --- @@ -132,9 +88,7 @@ private[spark] class Stage( } def attemptId: Int = nextAttemptId - - override def toString = Stage + id - + override def hashCode(): Int = id override def equals(other: Any): Boolean = other match { --- End diff -- `hashCode` and `equals` should probably be `final`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/4703#discussion_r25057714 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -229,41 +227,56 @@ class DAGScheduler( /** * Create a Stage -- either directly for use as a result stage, or as part of the (re)-creation - * of a shuffle map stage in newOrUsedStage. The stage will be associated with the provided - * jobId. Production of shuffle map stages should always use newOrUsedStage, not newStage - * directly. + * of a shuffle map stage in newOrUsedShuffleStage. The stage will be associated with the provided + * jobId. Production of shuffle map stages should always use newOrUsedShuffleStage, + * not getAndRegNewShuffleMap directly. */ - private def newStage( - rdd: RDD[_], - numTasks: Int, - shuffleDep: Option[ShuffleDependency[_, _, _]], - jobId: Int, - callSite: CallSite) -: Stage = - { + private def getAndRegNewShuffleMap(rdd: RDD[_], + numTasks: Int, + shuffleDep: ShuffleDependency[_, _, _], + jobId: Int, + callSite: CallSite): ShuffleMapStage = { val parentStages = getParentStages(rdd, jobId) -val id = nextStageId.getAndIncrement() -val stage = new Stage(id, rdd, numTasks, shuffleDep, parentStages, jobId, callSite) +val id = nextStageId.getAndIncrement +val stage: ShuffleMapStage = new ShuffleMapStage(id, rdd, numTasks, parentStages, + jobId, callSite, shuffleDep) + stageIdToStage(id) = stage updateJobIdStageIdMaps(jobId, stage) stage } /** + * Create a Stage -- either directly for use as a result stage, or as part of the (re)-creation + * of a shuffle map stage in newOrUsedShuffleStage. The stage will be associated with the provided + * jobId. Production of shuffle map stages should always use newOrUsedShuffleStage, + * not getAndRegNewResults directly. + */ + private def getAndRegNewResults(rdd: RDD[_], --- End diff -- I'm not a fan of the `getAndReg` addition since `get` falsely implies some kind of look up, and `reg` is more-or-less an implementation detail that across future refactorings won't necessarily be a fixed part of the creation of new stages. The new, longer naming just doesn't add anything positive, imo. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/4703#discussion_r25057992 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -306,26 +319,31 @@ class DAGScheduler( } } waitingForVisit.push(rdd) -while (!waitingForVisit.isEmpty) { +while (waitingForVisit.nonEmpty) { visit(waitingForVisit.pop()) } parents.toList } - - // Find ancestor missing shuffle dependencies and register into shuffleToMapStage + + /** + * Find ancestor missing shuffle dependencies and register into shuffleToMapStage + * @param shuffleDep + * @param jobId --- End diff -- Placeholder doc annotations aren't helpful for `private` methods. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/4703#discussion_r25056104 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Stage.scala --- @@ -77,53 +70,16 @@ private[spark] class Stage( /** Pointer to the latest [StageInfo] object, set by DAGScheduler. */ var latestInfo: StageInfo = StageInfo.fromStage(this) + var numAvailableOutputs = 0 + def isAvailable: Boolean = { -if (!isShuffleMap) { +if (!this.isInstanceOf[ShuffleMapStage]) { true } else { numAvailableOutputs == numPartitions } } --- End diff -- Why not leave this abstract with just one branch of the if in each override? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/4703#discussion_r25056762 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -229,41 +227,56 @@ class DAGScheduler( /** * Create a Stage -- either directly for use as a result stage, or as part of the (re)-creation - * of a shuffle map stage in newOrUsedStage. The stage will be associated with the provided - * jobId. Production of shuffle map stages should always use newOrUsedStage, not newStage - * directly. + * of a shuffle map stage in newOrUsedShuffleStage. The stage will be associated with the provided + * jobId. Production of shuffle map stages should always use newOrUsedShuffleStage, + * not getAndRegNewShuffleMap directly. */ - private def newStage( - rdd: RDD[_], - numTasks: Int, - shuffleDep: Option[ShuffleDependency[_, _, _]], - jobId: Int, - callSite: CallSite) -: Stage = - { + private def getAndRegNewShuffleMap(rdd: RDD[_], + numTasks: Int, + shuffleDep: ShuffleDependency[_, _, _], + jobId: Int, + callSite: CallSite): ShuffleMapStage = { val parentStages = getParentStages(rdd, jobId) -val id = nextStageId.getAndIncrement() -val stage = new Stage(id, rdd, numTasks, shuffleDep, parentStages, jobId, callSite) +val id = nextStageId.getAndIncrement +val stage: ShuffleMapStage = new ShuffleMapStage(id, rdd, numTasks, parentStages, + jobId, callSite, shuffleDep) + stageIdToStage(id) = stage updateJobIdStageIdMaps(jobId, stage) stage } /** + * Create a Stage -- either directly for use as a result stage, or as part of the (re)-creation + * of a shuffle map stage in newOrUsedShuffleStage. The stage will be associated with the provided + * jobId. Production of shuffle map stages should always use newOrUsedShuffleStage, + * not getAndRegNewResults directly. + */ + private def getAndRegNewResults(rdd: RDD[_], + numTasks: Int, + jobId: Int, + callSite: CallSite): ResultStage = { --- End diff -- Indentation of this one's even worse than the others. Please fix all your `def`s to adhere to the Style Guide. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/4703#discussion_r25058302 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -912,6 +959,196 @@ class DAGScheduler( } /** + * Helper function to handle stage completion + * @param stage + * @param errorMessage + * @return + */ + private def markStageAsFinished(stage: Stage, errorMessage: Option[String] = None) = { +val serviceTime = stage.latestInfo.submissionTime match { + case Some(t) = %.03f.format((clock.getTime() - t) / 1000.0) + case _ = Unknown +} +if (errorMessage.isEmpty) { + logInfo(%s (%s) finished in %s s.format(stage, stage.name, serviceTime)) + stage.latestInfo.completionTime = Some(clock.getTime()) +} else { + stage.latestInfo.stageFailed(errorMessage.get) + logInfo(%s (%s) failed in %s s.format(stage, stage.name, serviceTime)) +} +listenerBus.post(SparkListenerStageCompleted(stage.latestInfo)) +runningStages -= stage + } + + /** + * Helper functino to process a ResultTask completion --- End diff -- typo --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75332520 [Test build #27793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27793/consoleFull) for PR 4708 at commit [`96dd161`](https://github.com/apache/spark/commit/96dd161b604dd999448172047519e201e25ed28b). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-7531 [Test build #27788 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27788/consoleFull) for PR 4703 at commit [`8ed51d8`](https://github.com/apache/spark/commit/8ed51d81513b36289bbd76164b6a705ef1889ba3). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75320629 Josh, what I'll do is create a new branch and port the changes individually and leave this branch intact as a reference. I'll submit a separate PR for that work. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75318899 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27788/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
GitHub user ilganeli opened a pull request: https://github.com/apache/spark/pull/4708 [SPARK-4655] Split Stage into ShuffleMapStage and ResultStage subclasses Hi all - this patch includes splitting up Stage into ShuffleMapStage and ResultStage and updating their usage within DAGScheduler. I wanted to confirm that it's appropriate to move the outputLocs variable and its associated methods into ShuffleMapStage. Within the shuffleMapTask handler (line 1004) I wanted to confirm that we are guaranteed for that stage to be a ShuffleMapStage I believe I've split up the functionality of ResultStage and ShuffleMapStage appropriately but I wanted to make sure I'm not missing something. Lastly, I suggest opening a JIRA to further improve readability of two functions within the DAGScheduler: 1) The monolithic handleTaskCompletion 2) The cleanupStateForJobAndIndependentStages function This last item is already code-complete in a separate PR but it should really be done within its own issue. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ilganeli/spark SPARK-4655 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4708.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4708 commit 83494e94ff45e0f275966d405d777d22cb167800 Author: Ilya Ganelin ilya.gane...@capitalone.com Date: 2015-02-20T21:22:18Z Added new Stage classes commit cfd6f10d4f30d1940c7dbc61b58af22cfd647980 Author: Ilya Ganelin ilya.gane...@capitalone.com Date: 2015-02-20T21:54:29Z Updated DAGScheduler to use new ResultStage and ShuffleMapStage classes --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75327500 [Test build #27792 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27792/consoleFull) for PR 4708 at commit [`cfd6f10`](https://github.com/apache/spark/commit/cfd6f10d4f30d1940c7dbc61b58af22cfd647980). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75327654 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27792/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75341191 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27793/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75353758 [Test build #27805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27805/consoleFull) for PR 4708 at commit [`83ed849`](https://github.com/apache/spark/commit/83ed849dbc063d8a6ec3b763932792b092a5a2d3). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75343825 Is this another of the spurious test failure? Seems to be some bizarre thing within ML Lib. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75355985 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27805/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75355984 [Test build #27805 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27805/consoleFull) for PR 4708 at commit [`83ed849`](https://github.com/apache/spark/commit/83ed849dbc063d8a6ec3b763932792b092a5a2d3). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75341185 [Test build #27793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27793/consoleFull) for PR 4708 at commit [`96dd161`](https://github.com/apache/spark/commit/96dd161b604dd999448172047519e201e25ed28b). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75284304 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27780/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75284266 [Test build #27780 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27780/consoleFull) for PR 4703 at commit [`09a3437`](https://github.com/apache/spark/commit/09a343746602ec8cb6630de9d3308513ff73eea0). * This patch **does not merge cleanly**. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75284298 [Test build #27780 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27780/consoleFull) for PR 4703 at commit [`09a3437`](https://github.com/apache/spark/commit/09a343746602ec8cb6630de9d3308513ff73eea0). * This patch **fails RAT tests**. * This patch **does not merge cleanly**. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75327639 I've submitted a new PR here: https://github.com/apache/spark/pull/4708 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4708#issuecomment-75327652 [Test build #27792 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27792/consoleFull) for PR 4708 at commit [`cfd6f10`](https://github.com/apache/spark/commit/cfd6f10d4f30d1940c7dbc61b58af22cfd647980). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75286796 [Test build #27782 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27782/consoleFull) for PR 4703 at commit [`6266165`](https://github.com/apache/spark/commit/6266165ee44587621cde059e21648b0c4a9f2dfd). * This patch **does not merge cleanly**. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75286815 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27782/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75288560 [Test build #27783 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27783/consoleFull) for PR 4703 at commit [`0a4109e`](https://github.com/apache/spark/commit/0a4109ecb1982130e7f561e8f72d47c6fd0cd9e0). * This patch **fails RAT tests**. * This patch **does not merge cleanly**. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75288542 [Test build #27783 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27783/consoleFull) for PR 4703 at commit [`0a4109e`](https://github.com/apache/spark/commit/0a4109ecb1982130e7f561e8f72d47c6fd0cd9e0). * This patch **does not merge cleanly**. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75286811 [Test build #27782 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27782/consoleFull) for PR 4703 at commit [`6266165`](https://github.com/apache/spark/commit/6266165ee44587621cde059e21648b0c4a9f2dfd). * This patch **fails RAT tests**. * This patch **does not merge cleanly**. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75288566 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27783/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75304156 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27787/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75304153 [Test build #27787 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27787/consoleFull) for PR 4703 at commit [`fe4e137`](https://github.com/apache/spark/commit/fe4e1378f3a56b2d633a5beabe324eede24c43dd). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75304143 [Test build #27787 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27787/consoleFull) for PR 4703 at commit [`fe4e137`](https://github.com/apache/spark/commit/fe4e1378f3a56b2d633a5beabe324eede24c43dd). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75306668 [Test build #27788 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27788/consoleFull) for PR 4703 at commit [`8ed51d8`](https://github.com/apache/spark/commit/8ed51d81513b36289bbd76164b6a705ef1889ba3). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75307867 What do you think about performing only the Stage split in this PR and deferring the other code movement to a separate one? That would make this much easier to review, since I'm pretty sure that the Stage split itself should involve comparatively few changes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75308162 Hi @JoshRosen, I'm down for making your life easier. Would you still be interested in the other refactoring that I did as part of this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75311331 Hi @ilganeli, Those other refactoring might be welcome, since the some of the mega-functions in DAGScheduler can be pretty hard to read. Let's do those separately, though. My conservatism here stems from the fact that DAGScheduler is a giant piece of complex code that only a handful of people understand well. Given that there might be objections to performing _any_ unnecessary refactoring due to the potential to introduce bugs, I think it's best to work very gradually and perform many small, separate sets of changes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
GitHub user ilganeli opened a pull request: https://github.com/apache/spark/pull/4703 [SPARK-4655] Split Stage into ShuffleMapStage and ResultStage subclasses Hi all - this patch includes two main efforts: 1) I've split up Stage into ShuffleMapStage and ResultStage and updated their usage within DAGScheduler 2) While doing this, I ended up cleaning up a lot of the long and awkward functions within the DAGScheduler since I needed to do that to make sense of the code. I believe this improved readability of the code significantly - the only problem is that it's not straightforward to diff those changes side by side (particularly the updates to handleFetchFailure(). I wanted to confirm that it's appropriate to move the outputLocs variable and its associated methods into ShuffleMapStage. The one function of concern is within the completeStage() and handleFailedTasks() functions where I wanted to confirm that we are guaranteed for that stage to be a ShuffleMapStage I believe I've split up the functionality of ResultStage and ShuffleMapStage appropriately but I wanted to make sure I'm not missing something. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ilganeli/spark SPARK-4653 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4703.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4703 commit 424fdef9578b4450ec75c8d84db98927e4b15831 Author: Ilya Ganelin ilya.gane...@capitalone.com Date: 2015-02-18T01:19:20Z Began refactoring Stage.scala commit 562b688d07a9404196c6856b5f3302c31b05bb2d Author: Ilya Ganelin ilya.gane...@capitalone.com Date: 2015-02-19T00:54:36Z Significant refactoring within DAG Scheduler class commit 75fe74f616adfa4e2da7a439fc86f784265aae84 Author: Ilya Ganelin ilya.gane...@capitalone.com Date: 2015-02-19T00:58:34Z Merge remote-tracking branch 'upstream/master' into SPARK-4653 commit 5e76fb8b21ba877afe4a9255036b8120691aeda8 Author: Ilya Ganelin ilya.gane...@capitalone.com Date: 2015-02-19T01:06:23Z Small updates commit cb66a4fdf23883c24a7684b6f20ca532c74241a7 Author: Ilya Ganelin ilya.gane...@capitalone.com Date: 2015-02-19T18:08:14Z Further refactoring. Moved outputLocs and associated functions to inside ShuffleMapStage. commit e3e7ea23584d3fb442ede54231aa048815a8f436 Author: Ilya Ganelin ilya.gane...@capitalone.com Date: 2015-02-19T18:11:15Z Minor formatting commit 149b9a13e48371a1ea06a3fc85374fa026368cd8 Author: Ilya Ganelin ilya.gane...@capitalone.com Date: 2015-02-19T22:06:21Z Refactored more extremely large methods to be more modular. Cleaned up some usages of vs nonEmpty. commit be3e01fad73a3d1124ad090acbef6f38379afc4e Author: Ilya Ganelin ilya.gane...@capitalone.com Date: 2015-02-19T22:58:56Z Minor fixes commit 00182c7e7f9a38d5eee86065171804acc52e6d2a Author: Ilya Ganelin ilya.gane...@capitalone.com Date: 2015-02-20T07:43:59Z Cleaned up hanleTaskCompletion more --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75201638 [Test build #27771 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27771/consoleFull) for PR 4703 at commit [`00182c7`](https://github.com/apache/spark/commit/00182c7e7f9a38d5eee86065171804acc52e6d2a). * This patch **fails RAT tests**. * This patch **does not merge cleanly**. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75201642 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27771/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4655] Split Stage into ShuffleMapStage ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4703#issuecomment-75201629 [Test build #27771 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27771/consoleFull) for PR 4703 at commit [`00182c7`](https://github.com/apache/spark/commit/00182c7e7f9a38d5eee86065171804acc52e6d2a). * This patch **does not merge cleanly**. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org