AngersZhuuuu opened a new pull request #33934:
URL: https://github.com/apache/spark/pull/33934
### What changes were proposed in this pull request?
Currently in PySpark, stderr and stdout are printed together. If the Python script exits with a nonzero code, PythonRunner only throws a `SparkUserAppException` with the exit code, which is then passed to the AM.
In cluster mode, the client side therefore only gets the `SparkUserAppException` and shows
```
User application exited with 1.
```
without the actual error message, so users have to check the ApplicationMaster's stdout log file to find out why their job failed.
This PR makes PythonRunner propagate the Python error message back to the backend, so the client can show it.
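For illustration, a minimal Scala sketch of the approach, not the actual patch: `PythonRunnerSketch` and `runPythonScript` are hypothetical names, and the real change attaches the message to the exception that already travels to the AM. The idea is to tee the Python process's stderr into a buffer while still echoing it, and to include that buffer in the exception thrown on a nonzero exit.
```scala
import java.io.{BufferedReader, InputStreamReader}
import java.nio.charset.StandardCharsets
import scala.collection.mutable.ArrayBuffer

// Minimal sketch of the idea, not the actual patch: run the user's Python
// script, tee its stderr into a buffer, and surface that buffer in the
// exception thrown on a nonzero exit so the AM/client can report it.
object PythonRunnerSketch {
  def runPythonScript(cmd: Seq[String]): Unit = {
    val process = new ProcessBuilder(cmd: _*).start()

    // Drain stderr on a separate thread: echo each line locally and keep a
    // copy so the error text is not lost when the process dies.
    val stderrLines = new ArrayBuffer[String]()
    val reader = new BufferedReader(
      new InputStreamReader(process.getErrorStream, StandardCharsets.UTF_8))
    val drainer = new Thread(() => {
      var line = reader.readLine()
      while (line != null) {
        System.err.println(line)
        stderrLines.synchronized { stderrLines += line }
        line = reader.readLine()
      }
    }, "stderr-drainer")
    drainer.setDaemon(true)
    drainer.start()

    val exitCode = process.waitFor()
    drainer.join()
    if (exitCode != 0) {
      // In the real patch this message travels with the user-app exception,
      // so the client sees the traceback instead of just the exit code.
      throw new RuntimeException(
        s"User application exited with $exitCode and error message:\n" +
          stderrLines.mkString("\n"))
    }
  }
}
```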
### Why are the changes needed?
It makes the actual error message much easier for users to find.
### Does this PR introduce _any_ user-facing change?
In cluster mode, users can see PySpark's error message directly on the client
side.
### How was this patch tested?
If we run a SQL statement against a nonexistent table in a Python script, the
ApplicationMaster and client-side logs currently show
```
21/09/08 14:08:42 ERROR Client: Application diagnostics message: User application exited with 1.
Exception in thread "main" org.apache.spark.SparkException: Application application_1630930053097_708441 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1150)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1530)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
Now they will show
```
21/09/08 14:08:42 ERROR Client: Application diagnostics message: User application exited with 1 and error message Traceback (most recent call last):
  File "test.py", line 68, in <module>
    res = client.sql(exec_sql)
  File "/mnt/ssd/0/yarn/nm-local-dir/usercache/yi.zhu/appcache/application_1630930053097_708441/container_e236_1630930053097_708441_02_000002/pyspark.zip/pyspark/sql/session.py", line 767, in sql
  File "/mnt/ssd/0/yarn/nm-local-dir/usercache/yi.zhu/appcache/application_1630930053097_708441/container_e236_1630930053097_708441_02_000002/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/mnt/ssd/0/yarn/nm-local-dir/usercache/yi.zhu/appcache/application_1630930053097_708441/container_e236_1630930053097_708441_02_000002/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException: u"Table or view not found: `shopee`.`trafficixadwadwa_mart_dwd__click_di`; line 14 pos 9;\n'InsertIntoTable 'UnresolvedRelation `shopee`.`fact_shopee_bp_traffic_mart_click_di`, Map(dt -> None, country -> None), true, false\n+- 'Repartition 50, true\n +- 'Project [cast('get_json_object('data, $.shopid) as bigint) AS shopid#4, cast('get_json_object('data, $.itemid) as bigint) AS itemid#5, cast('get_json_object('data, $.quantity) as bigint) AS quantity#6, 'userid, 'platform, 'page_type, 'log_timestamp, 'utc_date AS dt#7, 'grass_region AS country#8]\n +- 'Filter ((('utc_date = cast(2021-01-01 as date)) && ('grass_region = ID)) && ('operation = action_add_to_cart_success))\n +- 'SubqueryAlias `di`\n +- 'UnresolvedRelation `shopee`.`trafficixadwadwa_mart_dwd__click_di`\n"
Exception in thread "main" org.apache.spark.SparkException: Application application_1630930053097_708441 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1150)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1530)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```