AngersZhuuuu commented on PR #40314:
URL: https://github.com/apache/spark/pull/40314#issuecomment-1459223471
> Hi, @AngersZhuuuu .
> This PR seems to have insufficient information. Could you provide more details about how to validate this in what environment?
We run a client-mode SparkSubmit job and it throws the exception below:
```
23/03/07 18:34:50 INFO YarnClientSchedulerBackend: Shutting down all executors
23/03/07 18:34:50 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
23/03/07 18:34:50 INFO YarnClientSchedulerBackend: YARN client scheduler backend Stopped
23/03/07 18:34:50 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/03/07 18:34:50 INFO BlockManager: BlockManager stopped
23/03/07 18:34:50 INFO BlockManagerMaster: BlockManagerMaster stopped
23/03/07 18:34:50 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/03/07 18:34:50 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: xxx.xxx; line 1 pos 14;
'GlobalLimit 1
+- 'LocalLimit 1
   +- 'Project [*]
      +- 'UnresolvedRelation [xxx, xxx], [], false
    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:115)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:95)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:184)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:183)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:183)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:183)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:183)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:183)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:183)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:183)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:183)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:183)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:95)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:92)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:155)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:178)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:228)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:175)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:73)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:143)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:778)
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:143)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:73)
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:71)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:63)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:98)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:778)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:621)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:778)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:616)
    at org.apache.spark.sql.auth.QueryAuthChecker$.main(QueryAuthChecker.scala:33)
    at org.apache.spark.sql.auth.QueryAuthChecker.main(QueryAuthChecker.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
23/03/07 18:34:50 INFO ShutdownHookManager: Shutdown hook called
23/03/07 18:34:50 INFO ShutdownHookManager: Deleting directory /tmp/spark-8ce833e1-3cd4-4a9f-960d-695be85b12f4
23/03/07 18:34:50 INFO ShutdownHookManager: Deleting directory /hadoop/spark/sparklocaldir/spark-58bbd530-6144-4ad6-b62b-a690baac9f96
23/03/07 18:34:50 INFO SparkExecutionPlanProcessor: Lineage thread pool prepares to shut down
23/03/07 18:34:50 INFO SparkExecutionPlanProcessor: Lineage thread pool finishes to await termination and shuts down
```
The job failed, but because `sparkContext.stop()` is still called, the client side fails while the AM reports SUCCESS.
In Spark 3.1.2 the code looks like this:
```
try {
  app.start(childArgs.toArray, sparkConf)
} catch {
  case t: Throwable =>
    throw findCause(t)
} finally {
  if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) &&
      !isThriftServer(args.mainClass)) {
    try {
      SparkContext.getActive.foreach(_.stop())
    } catch {
      case e: Throwable => logError(s"Failed to close SparkContext: $e")
    }
  }
}
```
So here, for a normal job, I think we should pass the exit code to the SchedulerBackend, right?
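To illustrate the idea, here is a minimal sketch of the problem and the proposed fix. All names here (`FakeSchedulerBackend`, `FinalStatus`) are hypothetical stand-ins, not Spark's real API: the point is only that a no-argument `stop()` lets the backend default to a SUCCESS final status even when the application failed, while passing the exit code through lets it report FAILED.

```scala
object ExitCodeSketch {
  sealed trait FinalStatus
  case object Succeeded extends FinalStatus
  case object Failed extends FinalStatus

  // Stand-in for a YARN/K8s scheduler backend that reports a final
  // application status to the cluster manager when it is stopped.
  class FakeSchedulerBackend {
    var reported: FinalStatus = Succeeded

    // Today's shape: stop() without an exit code defaults to SUCCESS.
    def stop(): Unit = stop(0)

    // Proposed shape: the caller passes the application's exit code,
    // so a failed client-mode job is reported as FAILED.
    def stop(exitCode: Int): Unit =
      reported = if (exitCode == 0) Succeeded else Failed
  }

  def main(args: Array[String]): Unit = {
    val buggy = new FakeSchedulerBackend
    buggy.stop()                 // app failed, but the AM still sees SUCCESS
    println(buggy.reported)      // Succeeded

    val fixed = new FakeSchedulerBackend
    fixed.stop(exitCode = 1)     // propagate the client-side failure
    println(fixed.reported)      // Failed
  }
}
```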
Then, after your mention, I see that https://github.com/apache/spark/pull/33403 changed the behavior so that only K8s calls `sc.stop()`; so I think for both K8s and YARN mode we need to pass the exit code to the backend. After this PR, we also need to check that in client mode the K8s backend's exit code matches the client side's too.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]