xkrogen opened a new pull request #30096:
URL: https://github.com/apache/spark/pull/30096
### What changes were proposed in this pull request?
Currently, when running in `cluster` mode on YARN, the Spark `yarn.Client`
prints the application report to the logs so that users can easily view it.
For example:
```
INFO yarn.Client:
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: N/A
ApplicationMaster host: X.X.X.X
ApplicationMaster RPC port: 0
queue: default
start time: 1602782566027
final status: UNDEFINED
tracking URL: http://hostname:8888/proxy/application_<id>/
user: xkrogen
```
I propose adding, alongside the application report, some additional lines
like:
```
Driver Logs (stdout):
http://hostname:8042/node/containerlogs/container_<id>/xkrogen/stdout?start=-4096
Driver Logs (stderr):
http://hostname:8042/node/containerlogs/container_<id>/xkrogen/stderr?start=-4096
```
This information isn't contained in the `ApplicationReport`, so it's
necessary to query the ResourceManager REST API. For now I have added this as
an always-on feature, but if there is any concern about adding this REST
dependency, I think hiding this feature behind an off-by-default flag is
reasonable.
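As a rough illustration of the lookup described above (not the code in this PR), the sketch below assumes the ResourceManager's application attempts endpoint (`/ws/v1/cluster/apps/{appid}/appattempts`) and its `logsLink` field, and derives the two driver log URLs from a response. The function names, host names, and application ID are hypothetical, and the sample payload is trimmed to the fields used here.

```python
import json


def app_attempts_endpoint(rm_web_address: str, app_id: str) -> str:
    """Build the RM REST URL that lists the attempts of one application."""
    return f"{rm_web_address.rstrip('/')}/ws/v1/cluster/apps/{app_id}/appattempts"


def driver_log_links(attempts_json: str, tail_bytes: int = 4096) -> dict:
    """Derive stdout/stderr links for the AM (driver) container.

    Assumes the response shape {"appAttempts": {"appAttempt": [...]}} where
    each attempt carries an integer "id" and a "logsLink" pointing at the
    NodeManager's container log page.
    """
    attempts = json.loads(attempts_json)["appAttempts"]["appAttempt"]
    # Use the most recent attempt, since earlier ones may have failed.
    latest = max(attempts, key=lambda attempt: attempt["id"])
    base = latest["logsLink"].rstrip("/")
    # "?start=-N" asks the NodeManager for only the last N bytes of the file.
    return {
        name: f"{base}/{name}?start=-{tail_bytes}"
        for name in ("stdout", "stderr")
    }


# Hypothetical response body, reduced to the fields this sketch reads.
sample_response = json.dumps({
    "appAttempts": {
        "appAttempt": [
            {
                "id": 1,
                "logsLink": "http://nm-host:8042/node/containerlogs/"
                            "container_0000000000000_0001_01_000001/xkrogen",
            }
        ]
    }
})

if __name__ == "__main__":
    print(app_attempts_endpoint("http://rm-host:8088", "application_0000000000000_0001"))
    for stream, url in driver_log_links(sample_response).items():
        print(f"Driver Logs ({stream}): {url}")
```

In a real client the response body would come from an HTTP GET against the endpoint, with whatever authentication the cluster requires; that part is left out here.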
### Why are the changes needed?
Typically, the tracking URL can be used to find the logs of the
ApplicationMaster/driver while the application is running. Later, the Spark
History Server can be used to track this information down, using the
stdout/stderr links on the Executors page.
However, if the driver crashes _before_ writing out a history file, the SHS
may not be aware of the application, and thus will not contain links to the
driver logs. When this happens, it can be difficult for users to debug
further, since they can't easily find their driver logs.
It is possible to reach the logs by using the `yarn logs` command, but the
average Spark user isn't aware of this and shouldn't have to be.
With this information readily available in the logs, users can quickly jump
to their driver logs, even if the driver crashed before the SHS became aware
of the application. This has the additional benefit of providing single-click
access to the driver logs, which often contain useful information, instead of
requiring users to navigate through the Spark UI.
### Does this PR introduce _any_ user-facing change?
Yes, additional lines containing the driver log links will be printed
alongside the application report when using YARN in cluster mode.
### How was this patch tested?
Manually tested on a cluster for now. I would appreciate any guidance on
an appropriate place to add a unit test, if there is one.