[GitHub] spark pull request: Expose application ID in ApplicationStart event...

vanzin Wed, 25 Jun 2014 15:33:35 -0700

GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/1218


    Expose application ID in ApplicationStart event, use it in history server.

    This change exposes the application ID generated by the Spark Master, Mesos,
    or Yarn via the SparkListenerApplicationStart event. The history server then
    uses that ID to expose the application, instead of using the internal
    directory name generated by the event logger as an application ID. Besides
    looking better, this lets someone who knows the application ID easily work
    out the URL for the application's entry in the history server (HS).
    
    In Yarn mode, this is used to generate a direct link from the RM application
    list to the Spark history server entry (thus providing a fix for SPARK-2150).
    
    Note that this assumes the different cluster managers generate application
    IDs that are sufficiently different from each other that clashes will not
    occur.
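
As a rough illustration of the lookup described above, the following sketch builds a history server URL from a known application ID. The host, port, `/history/<id>` path layout, and the sample IDs are assumptions made for illustration; none of them are taken from this patch.

```python
from urllib.parse import quote

# Hypothetical history server address; substitute the real host and port.
HISTORY_SERVER = "http://historyserver.example.com:18080"


def history_url(app_id: str, base: str = HISTORY_SERVER) -> str:
    """Return the assumed URL of an application's history server entry."""
    # Percent-encode the ID so unusual characters cannot break the path.
    return f"{base.rstrip('/')}/history/{quote(app_id, safe='')}"


# Made-up IDs in the style of different cluster managers.
sample_ids = [
    "app-20140625153335-0000",         # standalone-style (assumed format)
    "application_1403735615000_0001",  # Yarn-style (assumed format)
]

# The clash assumption from the description: every ID maps to a distinct entry.
urls = [history_url(app_id) for app_id in sample_ids]
assert len(set(urls)) == len(urls)
```

If two managers ever produced the same ID, both applications would map to the same URL, which is the clash the note above refers to.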

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark yarn-hs-link-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1218.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1218
    
----
commit 179d2f1f0768c7775f01e4f3cba536d73ab06256
Author: Marcelo Vanzin <[email protected]>
Date:   2014-06-14T15:59:40Z

    Expose application id to spark context.
    
    Lay down the infrastructure to plumb a backend-generated application id
    back to the SparkContext, and make the application ID generated for apps
    running in standalone and yarn mode available.

commit 9167c60b093887ee47bd5b48cf071cd2ddac9b78
Author: Marcelo Vanzin <[email protected]>
Date:   2014-06-14T16:19:08Z

    Expose the application ID in the ApplicationStart event.

commit a156d5399c6ff9d4bfef163fc8e90a9d1bbc92b6
Author: Marcelo Vanzin <[email protected]>
Date:   2014-06-14T16:45:40Z

    [yarn] Make the RM link point to the app directly in the HS.

commit 5a82c7d6571696dc61be112a0ab3ee4a855f0303
Author: Marcelo Vanzin <[email protected]>
Date:   2014-06-16T17:45:09Z

    Use Mesos framework ID as Spark application ID.

commit c7f7d33c2adbed6f961adaf70fe9c08e5bd57e87
Author: Marcelo Vanzin <[email protected]>
Date:   2014-06-17T00:16:10Z

    Make FsHistoryProvider keep a map of applications by id.
    
    This makes it more efficient to search for applications by
    id, since it's not necessarily related to the location of
    the app in the file system.
    
    Memory usage should be a little worse than before, but by a
    constant factor (since it's mostly the extra overhead of
    a LinkedHashMap over an ArrayBuffer to maintain the data).
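
The trade-off in this commit message can be sketched in a few lines. This is not the FsHistoryProvider code: `AppInfo`, the IDs, and the directory names are hypothetical, and an insertion-ordered Python dict stands in for the LinkedHashMap.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AppInfo:
    """Hypothetical stand-in for the history provider's per-application record."""
    app_id: str
    log_dir: str


def find_in_list(apps: List[AppInfo], app_id: str) -> Optional[AppInfo]:
    # Linear scan: every lookup may visit the whole list.
    return next((app for app in apps if app.app_id == app_id), None)


def build_index(apps: List[AppInfo]) -> Dict[str, AppInfo]:
    # Python dicts preserve insertion order, so listing order is kept,
    # loosely mirroring the LinkedHashMap mentioned in the commit message.
    return {app.app_id: app for app in apps}


apps = [
    AppInfo("app-0001", "/logs/run-a"),  # made-up IDs and directory names
    AppInfo("app-0002", "/logs/run-b"),
]
index = build_index(apps)

# Average constant-time lookup by ID, independent of the directory name.
assert index.get("app-0002") == find_in_list(apps, "app-0002")
assert list(index) == ["app-0001", "app-0002"]
```

The index costs one extra hash-table entry per application, which matches the "constant factor" overhead described above.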

commit dc4b4e5b79aa40a4652b3626df78b4118aaded90
Author: Marcelo Vanzin <[email protected]>
Date:   2014-06-25T00:39:27Z

    Wait until master responds before returning from start().
    
    This allows the application ID set by the master to be included
    in the SparkListenerApplicationStart event. This should not affect
    job scheduling because tasks can only be submitted after executors
    register, which will happen after the client registers with the
    master anyway.
    
    (This is similar to what the Mesos backend does to implement
    the same behavior.)
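
A minimal sketch of the "block in start() until the master replies" idea, assuming a simulated asynchronous reply. `ClientSketch`, the timer delay, the event dictionary, and the ID `app-0001` are all hypothetical and are not the Spark client's actual API.

```python
import threading
from typing import Optional


class ClientSketch:
    """Hypothetical client whose start() blocks until registration completes."""

    def __init__(self) -> None:
        self._registered = threading.Event()
        self.app_id: Optional[str] = None

    def _on_registered(self, app_id: str) -> None:
        # Called from the (simulated) master's reply.
        self.app_id = app_id
        self._registered.set()

    def start(self, timeout: float = 5.0) -> str:
        # Simulate the master answering asynchronously after a short delay.
        timer = threading.Timer(0.05, self._on_registered, args=("app-0001",))
        timer.start()
        # Block until the ID has arrived, so callers can rely on it being set.
        if not self._registered.wait(timeout):
            raise TimeoutError("master did not respond in time")
        assert self.app_id is not None
        return self.app_id


client = ClientSketch()
app_id = client.start()
# By the time start() returns, an application-start event can carry the ID.
event = {"event": "ApplicationStart", "appId": app_id}
```

Without the wait, the event could be emitted before the reply arrives and would carry no ID.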

----


