WeichenXu123 commented on a change in pull request #32399:
URL: https://github.com/apache/spark/pull/32399#discussion_r629087709



##########
File path: mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala
##########
@@ -161,11 +169,26 @@ class TrainValidationSplit @Since("1.5.0") (@Since("1.5.0") override val uid: St
     }
 
     // Wait for all metrics to be calculated
-    val metrics = metricFutures.map(ThreadUtils.awaitResult(_, Duration.Inf))
-
-    // Unpersist training & validation set once all metrics have been produced
-    trainingDataset.unpersist()
-    validationDataset.unpersist()
+    val metrics = try {
+      metricFutures.map(ThreadUtils.awaitResult(_, Duration.Inf))
+    }
+    catch {
+      case e: Throwable =>
+        subTaskFailed = true
+        throw e
+    }
+    finally {
+      if (subTaskFailed) {
+        Thread.sleep(1000)

Review comment:
       This sleep is for the following reason:
   
   Each trial task whose thread is already running may take some time before it launches its Spark job. If we cancel the job group immediately here, we may miss killing the Spark job that is about to be spawned.
   
   Pseudocode for this:
   
   ```python
   def trial_thread_target():
       if subTaskFailed:
           # another trial already failed, so abort before doing any work
           raise RuntimeError("another trial already failed")
       else:
           # 1. run some driver-side code here
           # 2. launch a Spark job...
           # 3. run some more driver-side code here
           # 4. launch a second Spark job...
           # ...
   ```
   
   Suppose `cancelJobGroup` is called while step 1 or step 3 is still running; then we may miss killing the Spark job spawned at step 2 or step 4.
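   
   A minimal, self-contained Scala sketch of this cancel-after-sleep pattern (it uses `Await.result` instead of Spark's internal `ThreadUtils`, and the `CancelAfterSleepSketch` object, the `trial-jobs` group id, and the simulated failure in trial 1 are illustrative assumptions, not the PR's actual code):
   
   ```scala
   import scala.concurrent.duration.Duration
   import scala.concurrent.{Await, ExecutionContext, Future}
   
   import org.apache.spark.sql.SparkSession
   
   object CancelAfterSleepSketch {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder().master("local[2]").appName("sketch").getOrCreate()
       val sc = spark.sparkContext
       implicit val ec: ExecutionContext = ExecutionContext.global
   
       @volatile var subTaskFailed = false
       val jobGroup = "trial-jobs"  // hypothetical job group shared by all trial threads
   
       // Each trial thread checks the failure flag, then does some driver-side
       // work (step 1) before it actually submits its Spark job (step 2).
       val trialFutures = (0 until 3).map { i =>
         Future {
           if (subTaskFailed) throw new RuntimeException("another trial already failed")
           sc.setJobGroup(jobGroup, s"trial $i", interruptOnCancel = true)
           if (i == 1) throw new RuntimeException(s"trial $i failed")  // simulated failure
           Thread.sleep(200)                                // step 1: no job submitted yet
           sc.parallelize(1 to 100).map(_ * 2).count()      // step 2: the job we must not miss
         }
       }
   
       try {
         trialFutures.foreach(Await.result(_, Duration.Inf))
       } catch {
         case e: Throwable =>
           subTaskFailed = true
           // Give trial threads that are still in step 1 time to reach their job
           // submission, so the cancellation below also covers those jobs.
           Thread.sleep(1000)
           sc.cancelJobGroup(jobGroup)
           throw e
       } finally {
         spark.stop()
       }
     }
   }
   ```
   
   With the short wait before `cancelJobGroup`, a trial thread that was still in step 1 when the failure happened has time to submit its job under the shared job group, so the cancellation can still kill it.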




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
