Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6291#discussion_r44177425
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
    @@ -1275,6 +1278,104 @@ class DAGSchedulerSuite
         assertDataStructuresEmpty()
       }
     
    +  def checkJobProperties(taskSet: TaskSet, expected: String): Unit = {
    +    assert(taskSet.properties != null)
    +    assert(taskSet.properties.getProperty("testProperty") === expected)
    +  }
    +
    +  def launchJobsThatShareStageAndCancelFirst(): ShuffleDependency[Int, Int, Nothing] = {
    +    val baseRdd = new MyRDD(sc, 1, Nil)
    +    val shuffleDep1 = new ShuffleDependency(baseRdd, null)
    +    val intermediateRdd = new MyRDD(sc, 1, List(shuffleDep1))
    +    val shuffleDep2 = new ShuffleDependency(intermediateRdd, null)
    +    val finalRdd1 = new MyRDD(sc, 1, List(shuffleDep2))
    +    val finalRdd2 = new MyRDD(sc, 1, List(shuffleDep2))
    +    val job1Properties = new Properties()
    +    val job2Properties = new Properties()
    +    job1Properties.setProperty("testProperty", "job1")
    +    job2Properties.setProperty("testProperty", "job2")
    +
    +    // Run jobs 1 & 2, both referencing the same stage, then cancel job1.
    +    // Note that we have to submit job2 before we cancel job1 to have them actually share
    +    // *Stages*, and not just shuffle dependencies, due to skipped stages (at least until
    +    // we address SPARK-10193.)
    +    val jobId1 = submit(finalRdd1, Array(0), properties = job1Properties)
    +    val jobId2 = submit(finalRdd2, Array(0), properties = job2Properties)
    +    assert(scheduler.activeJobs.nonEmpty)
    +    val testProperty1 = scheduler.jobIdToActiveJob(jobId1).properties.getProperty("testProperty")
    +
    +    // remove job1 as an ActiveJob
    +    cancel(jobId1)
    +    sc.listenerBus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS)
    --- End diff ---
    
    Mostly for my own understanding -- I don't actually think this is necessary, right? At first I was thinking we'd need to put this in a bunch more places, as I know some other tests recently had flakiness fixed by adding it. But looking closer, this is only needed when we're checking something stored in our SparkListener, not state in the scheduler itself, right?
    
    That said, if there is any doubt, better to leave it in place.
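
    For context, here's a rough sketch of the case where the wait *is* needed (a hypothetical listener, not code from this PR, and it assumes the suite's `cancel` helper and `WAIT_TIMEOUT_MILLIS` are in scope). A SparkListener only sees events after the bus delivers them asynchronously, so assertions on listener state need the bus drained first; the scheduler's own maps, which this diff checks, are read directly:

        // Hypothetical listener that records which jobs have ended.
        import scala.collection.mutable
        import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd}

        class JobEndCollector extends SparkListener {
          val endedJobs = mutable.Set[Int]()
          override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
            endedJobs += jobEnd.jobId
          }
        }

        val collector = new JobEndCollector
        sc.addSparkListener(collector)
        cancel(jobId1)
        // The job-end event reaches the listener asynchronously, so drain the
        // bus before asserting on state the listener accumulated:
        sc.listenerBus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS)
        assert(collector.endedJobs.contains(jobId1))
        // By contrast, this reads the scheduler's map directly and does not go
        // through the listener bus, so no wait is required:
        assert(!scheduler.jobIdToActiveJob.contains(jobId1))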

