To answer my own question: what I was after turns out to be the YARN ResourceManager URL for the Spark application. As alluded to in SPARK-20458 <https://issues.apache.org/jira/browse/SPARK-20458>, this value can be obtained through the YARN client API. Here is a gist that shows how it can be done (given an instance of the Hadoop Configuration object): https://gist.github.com/jeff303/8dab0e52dc227741b6605f576a317798
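
For anyone who would rather see the idea inline, here is a rough Scala sketch of the same approach. It is not a copy of the gist: the object and method names (TrackingUrlLookup, trackingUrl) are made up for illustration, and it assumes the hadoop-yarn-client classes are on the driver classpath and that the Hadoop version provides ApplicationId.fromString (on older versions, ConverterUtils.toApplicationId should serve the same purpose).

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.yarn.api.records.ApplicationId
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

// Hypothetical helper, named for illustration only.
object TrackingUrlLookup {

  /** Asks the YARN ResourceManager for the tracking URL of one application.
    *
    * @param hadoopConf Hadoop configuration that can reach the ResourceManager
    * @param appId      application ID string, e.g. "application_1579210019853_0023"
    * @return the tracking URL reported by YARN, or None if it is null or empty
    */
  def trackingUrl(hadoopConf: Configuration, appId: String): Option[String] = {
    val yarnClient = YarnClient.createYarnClient()
    // Wrapping in YarnConfiguration also loads yarn-default.xml / yarn-site.xml.
    yarnClient.init(new YarnConfiguration(hadoopConf))
    yarnClient.start()
    try {
      val report = yarnClient.getApplicationReport(ApplicationId.fromString(appId))
      Option(report.getTrackingUrl).filter(_.nonEmpty)
    } finally {
      // Always release the client's connection to the ResourceManager.
      yarnClient.stop()
    }
  }
}
```

From a running session this could be called as TrackingUrlLookup.trackingUrl(sparkSession.sparkContext.hadoopConfiguration, sparkSession.sparkContext.applicationId). I have not checked that the returned value matches the redirected URL character for character on every cluster setup, so treat it as a starting point.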

On Fri, Jan 17, 2020 at 4:09 PM Jeff Evans <jeffrey.wayne.ev...@gmail.com> wrote:

> Given a session/context, we can get the UI web URL like this:
>
> sparkSession.sparkContext.uiWebUrl
>
> This gives me something like http://node-name.cluster-name:4040. If
> opening this from outside the cluster (ex: my laptop), this redirects
> via HTTP 302 to something like
>
> http://node-name.cluster-name:8088/proxy/redirect/application_1579210019853_0023/
>
> For discussion purposes, call the latter one the "final web URL".
> Critically, this final URL is active even after the application
> terminates. The original uiWebUrl
> (http://node-name.cluster-name:4040) is not available after the
> application terminates, so one has to have captured the redirect in
> time, if they want to provide a persistent link to that history server
> UI entry (ex: for debugging purposes).
>
> Is there a way, other than using some HTTP client, to detect what this
> final URL will be directly from the SparkContext?