[GitHub] spark pull request: [SPARK-2261] Make event logger use a single fi...

vanzin Wed, 25 Jun 2014 16:52:24 -0700

GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/1222


    [SPARK-2261] Make event logger use a single file.

    Currently the event logger uses a directory and several files to
    describe an app's event log, all but one of which are empty. This
    is not very HDFS-friendly, since creating lots of nodes in HDFS
    (especially when they don't contain any data) is frowned upon due
    to the node metadata being kept in the NameNode's memory.
    
    Instead, all the metadata needed for the app log file can be
    encoded in the file name itself. (HDFS is adding extended attributes
    which could be used for this, but we need to support older versions.)
    
    This change implements that approach and also removes FileLogger,
    which was only used by EventLoggingListener; the little functionality
    it provided can be implemented much more concisely inside the
    listener itself.
    
    With the new approach, aside from reducing the load on the NameNode,
    far fewer remote calls are needed when reading the log directory.
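
As a rough illustration of encoding the metadata in the file name, here is a
minimal Python sketch. It is not the format used by this pull request: the
field set (app id, Spark version, compression codec, completion status), the
"+" separator, and the names LogInfo, encode_name and decode_name are all
assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote, unquote

SEP = "+"  # hypothetical field separator; quote() escapes it inside fields
STATUSES = ("complete", "inprogress")  # hypothetical status markers


@dataclass(frozen=True)
class LogInfo:
    """Metadata that would otherwise live in separate marker files."""
    app_id: str
    spark_version: str
    codec: Optional[str]
    complete: bool


def encode_name(info: LogInfo) -> str:
    """Pack all metadata into a single file name."""
    fields = [info.app_id, info.spark_version, info.codec or ""]
    # Percent-encode each field so it cannot contain "/" or the separator.
    encoded = [quote(field, safe="") for field in fields]
    encoded.append("complete" if info.complete else "inprogress")
    return SEP.join(encoded)


def decode_name(name: str) -> LogInfo:
    """Recover the metadata from a file name produced by encode_name."""
    parts = name.split(SEP)
    if len(parts) != 4 or parts[3] not in STATUSES:
        raise ValueError(f"not a recognised event log name: {name!r}")
    app_id, version, codec = (unquote(part) for part in parts[:3])
    return LogInfo(app_id, version, codec or None, parts[3] == "complete")
```

With a scheme like this, a reader can learn everything it needs about a log
from a single directory listing, without opening any file.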

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark hist-server-single-log

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1222.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1222
    
----
commit 28ee5004d13bc259daf36dd9af9838160057dca1
Author: Marcelo Vanzin <[email protected]>
Date:   2014-05-22T20:41:54Z

    Make event logger use a single file.
    

commit 9ee0d35dc6d06d20b9018fe45581dce1a002d2b7
Author: Marcelo Vanzin <[email protected]>
Date:   2014-05-23T20:12:13Z

    Make history server parse old-style log directories.
    
    Spark 1.0 will generate log directories instead of single log
    files for applications, so the history server should understand
    both styles.

commit 45f894e939e7da69e5c699a1a5f157c5f02edf0f
Author: Marcelo Vanzin <[email protected]>
Date:   2014-06-24T21:32:46Z

    Show prettier name in UI.
    
    Work around a SparkUI issue where the name to show has to be
    provided in the constructor.
    
    Also remove explicit flushes from logging code, since they're
    not really useful now that the history server only reads data from
    finished apps (and the API used does not exist in Hadoop trunk).

----
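
The second commit's handling of both layouts could look roughly like the
Python sketch below. It is an illustration only: it walks a local directory
rather than HDFS, and the function name find_logs and the "EVENT_LOG_" prefix
used to spot the data file inside an old-style directory are assumptions, not
taken from the patch.

```python
import os

LEGACY_LOG_PREFIX = "EVENT_LOG_"  # assumed name prefix of the legacy data file


def find_logs(root):
    """Return (layout, path) pairs for every app log found under root."""
    found = []
    # A single listing of the log root is enough for new-style logs.
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if os.path.isfile(path):
            found.append(("single-file", path))
        elif os.path.isdir(path):
            # Old-style layout: one extra listing per application directory.
            logs = [f for f in sorted(os.listdir(path))
                    if f.startswith(LEGACY_LOG_PREFIX)]
            if logs:
                found.append(("legacy-dir", os.path.join(path, logs[0])))
    return found
```

The extra listing in the directory branch is the per-application cost that
the single-file layout avoids.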


