GitHub user vanzin opened a pull request:
https://github.com/apache/spark/pull/1843
[SPARK-2889] Create Hadoop config objects consistently.
Different places in the code were instantiating Configuration /
YarnConfiguration objects in different ways. This could confuse people who
expected "spark.hadoop.*" options to end up in the configs used by Spark code,
since that translation only happened for the SparkContext's config.
This change modifies most places to use SparkHadoopUtil to initialize
configs, and makes that method perform the translation that was previously
done only inside SparkContext.
The places that were not changed fall into one of the following categories:
- Test code where this doesn't really matter
- Places deep in the code where plumbing SparkConf would be too difficult
for very little gain
- Default values for arguments - since the caller can provide their own
config in that case
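The "spark.hadoop.*" translation the change centralizes can be sketched as follows: keys with that prefix are copied into the Hadoop config with the prefix stripped, while other Spark options are ignored. This is a minimal illustration using plain maps; the method and class names here are hypothetical, not the actual SparkHadoopUtil API, and a real Hadoop `Configuration` object would be used in Spark itself.

```java
import java.util.HashMap;
import java.util.Map;

public class HadoopConfSketch {
    private static final String PREFIX = "spark.hadoop.";

    // Copy "spark.hadoop.*" entries from a Spark-style config map into a
    // Hadoop-style config map, stripping the prefix. Illustrative only.
    static Map<String, String> newHadoopConf(Map<String, String> sparkConf) {
        Map<String, String> hadoopConf = new HashMap<>();
        for (Map.Entry<String, String> e : sparkConf.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                hadoopConf.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return hadoopConf;
    }

    public static void main(String[] args) {
        Map<String, String> sparkConf = new HashMap<>();
        sparkConf.put("spark.hadoop.fs.defaultFS", "hdfs://nn:8020");
        sparkConf.put("spark.app.name", "demo");
        // Only the prefixed key is copied, with the prefix removed.
        System.out.println(newHadoopConf(sparkConf));
    }
}
```

Routing all config creation through one such helper is what guarantees every Hadoop config in Spark code sees these options, instead of only the one built by SparkContext.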
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vanzin/spark SPARK-2889
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1843.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1843
----
commit 1e7003ff01778f1a3be0f006fc721495ce13a0e2
Author: Marcelo Vanzin <[email protected]>
Date: 2014-08-07T16:12:17Z
Replace explicit Configuration instantiation with SparkHadoopUtil.
This is the basic grunt work; code doesn't fully compile yet, since
I'll do some of the more questionable changes in separate commits.
commit b8ab1737c8230481a7797e5b174d07eea9f880d6
Author: Marcelo Vanzin <[email protected]>
Date: 2014-08-07T17:12:34Z
Update Utils API to take a Configuration argument.
Instead of using "new Configuration()" where a configuration is
needed, let the caller provide a context-appropriate config
object.
commit f16cadd2e4c0426d6aca1e125403c1427cb2d0c4
Author: Marcelo Vanzin <[email protected]>
Date: 2014-08-07T17:17:50Z
Initialize config in SparkHadoopUtil.
This is somewhat hackish, since it doesn't account for any customization
someone might make to SparkConf before they actually start executing Spark
code. Instead, it will only consider options available in the system
properties when creating the Hadoop conf.
commit 3f2676052937d193b3415b7c7aeeb4a6dad8eeba
Author: Marcelo Vanzin <[email protected]>
Date: 2014-08-07T17:22:24Z
Compilation fix.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]