GitHub user vanzin opened a pull request:
https://github.com/apache/spark/pull/1218
Expose application ID in ApplicationStart event, use it in history server.
This change exposes the application ID generated by the Spark Master, Mesos
or Yarn via the SparkListenerApplicationStart event. It then uses that
information to expose the application via its ID in the history server,
instead of using the internal directory name generated by the event logger
as an application id. This allows someone who knows the application ID to
easily figure out the URL for the application's entry in the HS, aside
from looking better.
In Yarn mode, this is used to generate a direct link from the RM
application list to the Spark history server entry (thus providing a fix
for SPARK-2150).
Note this assumes that the different managers will generate app IDs that
are sufficiently different from each other that clashes will not occur.
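As a rough illustration of why an ID-based entry is convenient, the sketch below builds a history server link from an application ID. The class name, the method name, and the "/history/<appId>" path layout are assumptions made for this example; they are not taken from the patch.

```java
// Hypothetical helper, not part of the patch: builds a history server link
// from a cluster-manager-generated application ID.
class HistoryLinkSketch {
    // ASSUMPTION: applications are served under "<base>/history/<appId>".
    static String historyUrl(String historyServerBase, String appId) {
        // Drop one trailing slash so the result has no "//" before "history".
        String base = historyServerBase.endsWith("/")
            ? historyServerBase.substring(0, historyServerBase.length() - 1)
            : historyServerBase;
        return base + "/history/" + appId;
    }
}
```

With a layout like this, anyone holding the ID reported by the cluster manager can construct the link without knowing the event log directory name.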
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vanzin/spark yarn-hs-link-2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1218.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1218
----
commit 179d2f1f0768c7775f01e4f3cba536d73ab06256
Author: Marcelo Vanzin <[email protected]>
Date: 2014-06-14T15:59:40Z
Expose application id to spark context.
Lay down the infrastructure to plumb a backend-generated application id
back to the SparkContext, and make the application ID generated for apps
running in standalone and yarn mode available.
commit 9167c60b093887ee47bd5b48cf071cd2ddac9b78
Author: Marcelo Vanzin <[email protected]>
Date: 2014-06-14T16:19:08Z
Expose the application ID in the ApplicationStart event.
commit a156d5399c6ff9d4bfef163fc8e90a9d1bbc92b6
Author: Marcelo Vanzin <[email protected]>
Date: 2014-06-14T16:45:40Z
[yarn] Make the RM link point to the app directly in the HS.
commit 5a82c7d6571696dc61be112a0ab3ee4a855f0303
Author: Marcelo Vanzin <[email protected]>
Date: 2014-06-16T17:45:09Z
Use Mesos framework ID as Spark application ID.
commit c7f7d33c2adbed6f961adaf70fe9c08e5bd57e87
Author: Marcelo Vanzin <[email protected]>
Date: 2014-06-17T00:16:10Z
Make FsHistoryProvider keep a map of applications by id.
This makes it more efficient to search for applications by
id, since it's not necessarily related to the location of
the app in the file system.
Memory usage should be a little worse than before, but by a
constant factor (since it's mostly the extra overhead of
a LinkedHashMap over an ArrayBuffer to maintain the data).
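The trade-off described in the commit message above can be sketched roughly as follows. This is a minimal illustration in Java, not the FsHistoryProvider code itself; the AppIndexSketch and AppInfo names and their fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a history provider's application listing.
class AppIndexSketch {
    // Minimal application record: the manager-generated ID and the log directory.
    static final class AppInfo {
        final String id;
        final String logDir;

        AppInfo(String id, String logDir) {
            this.id = id;
            this.logDir = logDir;
        }
    }

    // LinkedHashMap keeps insertion order for listings and gives
    // expected constant-time lookup by ID, at the cost of extra per-entry overhead.
    private final Map<String, AppInfo> appsById = new LinkedHashMap<>();

    void add(AppInfo app) {
        appsById.put(app.id, app);
    }

    // Returns the application with the given ID, or null if it is unknown.
    AppInfo find(String id) {
        return appsById.get(id);
    }

    // Returns a copy of the applications in insertion order.
    List<AppInfo> listing() {
        return new ArrayList<>(appsById.values());
    }
}
```

A plain list would need a linear scan to find an application by ID, since the ID no longer matches the directory name; the map avoids that scan while still preserving listing order.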
commit dc4b4e5b79aa40a4652b3626df78b4118aaded90
Author: Marcelo Vanzin <[email protected]>
Date: 2014-06-25T00:39:27Z
Wait until master responds before returning from start().
This allows the application ID set by the master to be included
in the SparkListenerApplicationStart event. This should not affect
job scheduling because tasks can only be submitted after executors
register, which will happen after the client registers with the
master anyway.
(This is similar to what the Mesos backend does to implement
the same behavior.)
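One way to picture the blocking start() described above is a latch that is released when the registration response arrives. The sketch below is an assumption-laden illustration in Java; the BlockingStartSketch class, its method names, and the timeout handling are hypothetical and do not reflect the actual backend code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical backend: start() does not return until the master has replied.
class BlockingStartSketch {
    private final CountDownLatch registered = new CountDownLatch(1);
    private volatile String appId;

    // Called (from another thread) when the master acknowledges registration.
    void onRegistered(String id) {
        appId = id;
        registered.countDown();
    }

    // Blocks until registration completes or the timeout elapses.
    // Returns the application ID, or null on timeout or interruption.
    String start(long timeoutMillis) {
        // A real backend would send its registration request here.
        try {
            if (!registered.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return null;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return null;
        }
        return appId;
    }
}
```

Because the ID is known by the time start() returns, it can be attached to the application start event that is posted afterwards.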
----