xkrogen opened a new pull request #30096:
URL: https://github.com/apache/spark/pull/30096


   ### What changes were proposed in this pull request?
   Currently, when run in `cluster` mode on YARN, the Spark `yarn.Client` 
prints the application report to the logs so that it can be easily viewed by 
users. For example:
   ```
   INFO yarn.Client: 
         client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
         diagnostics: N/A
         ApplicationMaster host: X.X.X.X
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1602782566027
         final status: UNDEFINED
         tracking URL: http://hostname:8888/proxy/application_<id>/
         user: xkrogen
   ```
   
   I propose adding, alongside the application report, some additional lines 
like:
   ```
            Driver Logs (stdout): 
http://hostname:8042/node/containerlogs/container_<id>/xkrogen/stdout?start=-4096
            Driver Logs (stderr): 
http://hostname:8042/node/containerlogs/container_<id>/xkrogen/stderr?start=-4096
   ```
   
   This information isn't contained in the `ApplicationReport`, so it's 
necessary to query the ResourceManager REST API. For now I have added this as 
an always-on feature, but if there is any concern about adding this REST 
dependency, I think hiding this feature behind an off-by-default flag is 
reasonable.
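   As a rough illustration of the lookup this would involve (a sketch only, not the actual patch): YARN's ResourceManager exposes application attempts via its REST API (`/ws/v1/cluster/apps/{appId}/appattempts`), and each attempt reports the AM's container ID and NodeManager HTTP address, from which log URLs in the format shown above can be built. The sample payload and the `driver_log_urls` helper below are hypothetical, written for illustration; the real response contains more fields per attempt.

   ```python
   import json

   # Sample payload shaped like the ResourceManager's
   # /ws/v1/cluster/apps/{appId}/appattempts response (fields trimmed for
   # illustration; the real response carries more attributes per attempt).
   SAMPLE_RESPONSE = json.dumps({
       "appAttempts": {
           "appAttempt": [{
               "id": 1,
               "containerId": "container_1602782566027_0001_01_000001",
               "nodeHttpAddress": "hostname:8042",
           }]
       }
   })

   def driver_log_urls(response_body: str, user: str) -> dict:
       """Build NodeManager log URLs for the latest application attempt,
       following the containerlogs URL pattern shown in the example output."""
       attempts = json.loads(response_body)["appAttempts"]["appAttempt"]
       latest = max(attempts, key=lambda a: a["id"])  # most recent attempt
       base = "http://{node}/node/containerlogs/{container}/{user}".format(
           node=latest["nodeHttpAddress"],
           container=latest["containerId"],
           user=user,
       )
       # start=-4096 tails the last 4 KiB, matching the example output above
       return {f: "{}/{}?start=-4096".format(base, f)
               for f in ("stdout", "stderr")}
   ```

   Given the sample response and user `xkrogen`, this produces exactly the two `Driver Logs` URLs shown in the proposed output above.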
   
   ### Why are the changes needed?
   Typically, the tracking URL can be used to find the logs of the 
ApplicationMaster/driver while the application is running. Later, the Spark 
History Server can be used to track this information down, using the 
stdout/stderr links on the Executors page.
   
   However, in the situation where the driver crashes _before_ writing out a 
history file, the SHS may not be aware of the application, and thus will not 
contain links to the driver logs. When this situation arises, it can be 
difficult for users to debug further, since they can't easily find their driver 
logs.
   
   It is possible to reach the logs using the `yarn logs` command, but the 
average Spark user isn't aware of this and shouldn't have to be.
   
   With this information readily available in the logs, users can quickly jump 
to their driver logs, even if it crashed before the SHS became aware of the 
application. This has the additional benefit of providing a quick way to access 
driver logs, which often contain useful information, in a single click (instead 
of navigating through the Spark UI).
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, additional lines will be printed as part of the application report 
when using YARN in cluster mode.
   
   ### How was this patch tested?
   Manually tested on a cluster for now. I would appreciate any guidance on 
where would be an appropriate place to add a unit test, if any.

