Re: [PR] [SPARK-46021][CORE] Support cancel future jobs belonging to a job group [spark]

via GitHub Mon, 20 Nov 2023 23:42:49 -0800


JoshRosen commented on code in PR #43926:
URL: https://github.com/apache/spark/pull/43926#discussion_r1400125810



##########
common/utils/src/main/resources/error/README.md:
##########
@@ -1347,7 +1347,7 @@ The following SQLSTATEs are collated from:
 |XX001    |XX   |Internal Error                                    |001     
|data_corrupted                                              |PostgreSQL     |N 
      |PostgreSQL Redshift                                                      
   |
 |XX002    |XX   |Internal Error                                    |002     
|index_corrupted                                             |PostgreSQL     |N 
      |PostgreSQL Redshift                                                      
   |
 |XXKD0    |XX   |Internal Error                                    |KD0     
|Analysis - Bad plan                                         |Databricks     |N 
      |Databricks                                                               
   |
-|XXKDA    |XX   |Internal Error                                    |KAS     
|Scheduler (Aether Scheduler)                                |Databricks     |N 
      |Databricks                                                               
   |

Review Comment:
   Given that this is adding an error class for an existing OSS Apache Spark 
error, we should probably use a non-Databricks error code here.



##########
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:
##########
@@ -169,6 +169,12 @@ private[spark] class DAGScheduler(
 
   private[scheduler] val activeJobs = new HashSet[ActiveJob]
 
+  // Job groups that are canceled with `cancelFutureJobs` as true, with at most
+  // `CANCELLED_JOB_GROUP_SET_SIZE` stored. On a new job submission, if the 
job group is in this
+  // set, the job will be immediately canceled.

Review Comment:
   Spark isn't perfectly consistent about "cancelled" vs "canceled" (two 'L's 
or one) (and it [seems this is a British vs. American english 
difference](https://www.merriam-webster.com/grammar/canceled-or-cancelled)), 
but it looks like Spark's existing user-facing messages generally use the two-L 
version, e.g. at
   
   
https://github.com/apache/spark/blob/d908a212ef6144338f075354b4636a13f1172478/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L2730
   
   Given this, I think we should aim for consistency with the existing 
user-facing messages and use "cancelled" throughout.
   
   



##########
core/src/main/scala/org/apache/spark/errors/SparkCoreErrors.scala:
##########
@@ -221,6 +221,18 @@ private[spark] object SparkCoreErrors {
     new NoSuchElementException(id)
   }
 
+  def sparkJobCancelled(jobId: Int, reason: String, e: Exception): 
SparkException = {
+    new SparkException(
+      errorClass = "SPARK_JOB_CANCELLED",
+      messageParameters = Map("jobId" -> jobId.toString, "reason" -> reason),
+      cause = e
+    )
+  }
+
+  def sparkJobCancelledAsPartOfJobGroupError(jobId: Int, jobGroupId: String): 
SparkException = {
+    sparkJobCancelled(jobId, s"part of cancelled job group $jobGroupId", null)

Review Comment:
   It looks like this is consistent with the message thrown in the other code 
path at 
   
   
https://github.com/apache/spark/blob/021a5e66f530d6a5621782ea6caaf7f8f3e11ae3/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1207-L1209
   
   which is important in case end-user code is matching on these exception 
messages (since we previously didn't provide a special exception or error class 
for distinguishing these errors) 👍 



##########
core/src/main/scala/org/apache/spark/errors/SparkCoreErrors.scala:
##########
@@ -221,6 +221,18 @@ private[spark] object SparkCoreErrors {
     new NoSuchElementException(id)
   }
 
+  def sparkJobCancelled(jobId: Int, reason: String, e: Exception): 
SparkException = {
+    new SparkException(
+      errorClass = "SPARK_JOB_CANCELLED",
+      messageParameters = Map("jobId" -> jobId.toString, "reason" -> reason),
+      cause = e
+    )
+  }
+
+  def sparkJobCancelledAsPartOfJobGroupError(jobId: Int, jobGroupId: String): 
SparkException = {
+    sparkJobCancelled(jobId, s"part of cancelled job group $jobGroupId", null)

Review Comment:
   It looks like this is consistent with the message thrown in the other code 
path at 
   
   
https://github.com/apache/spark/blob/021a5e66f530d6a5621782ea6caaf7f8f3e11ae3/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1207-L1209
   
   which is important in case end-user code is matching on these exception 
messages (since we previously didn't provide a special exception or error class 
for distinguishing these errors) 👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-46021][CORE] Support cancel future jobs belonging to a job group [spark]

Reply via email to