Github user Sherry302 commented on the issue:
https://github.com/apache/spark/pull/14659
Hi @srowen, thank you for the review, and sorry for the test failure and the late update. The tests failed because "jobID" was None or there was no "spark.app.name" in the SparkConf. I have updated the PR to set default values for "jobID" and "spark.app.name". When a real application runs on Spark, it will always have a "jobID" and a "spark.app.name".
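
As a rough sketch of the defaulting described above (not the PR's exact code; the placeholder values here are illustrative only):

```scala
import org.apache.spark.SparkConf

// Fall back to placeholders when no job id is known or spark.app.name is unset,
// as can happen in unit tests; real applications always provide both.
val conf = new SparkConf()
val appName: String = conf.getOption("spark.app.name").getOrElse("unknown")
val jobId: Option[Int] = None            // e.g. no active job during a test
val jobIdStr: String = jobId.map(_.toString).getOrElse("0")
```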
What's the use case for this?
When users run Spark applications on YARN over HDFS, Spark's caller contexts are written into hdfs-audit.log. The Spark caller contexts are JobID_stageID_stageAttemptId_taskID_attemptNumber and the application's name. The caller context can help users better diagnose and understand how specific applications impact parts of the Hadoop system and what potential problems they may be creating (e.g. overloading the NameNode). As noted in HDFS-9184, for a given HDFS operation it is very helpful to track which upper-level job issued it.
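
For reference, a minimal sketch of how a caller context can be attached to HDFS operations via the org.apache.hadoop.ipc.CallerContext API introduced by HDFS-9184 (available in Hadoop 2.8+); the identifiers below are placeholders, and this is not the PR's exact code:

```scala
import org.apache.hadoop.ipc.CallerContext

// Placeholder identifiers; a real Spark task would fill these in from the
// running job, stage, and task.
val jobId = 0
val stageId = 1
val stageAttemptId = 0
val taskId = 42L
val attemptNumber = 0

// Build a context string of the JobID_stageID_stageAttemptId_taskID_attemptNumber
// shape described above and attach it to the current thread, so HDFS operations
// issued from this thread are tagged with it in hdfs-audit.log.
val context = s"SPARK_JobId_${jobId}_StageID_${stageId}_stageAttemptId_${stageAttemptId}" +
  s"_taskID_${taskId}_attemptNumber_${attemptNumber}"
CallerContext.setCurrent(new CallerContext.Builder(context).build())
```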