GitHub user vanzin opened a pull request:
https://github.com/apache/spark/pull/1843
[SPARK-2889] Create Hadoop config objects consistently.
Different places in the code were instantiating Configuration /
YarnConfiguration objects in different ways. This could confuse people who
expected "spark.hadoop.*" options to end up in the configs used by Spark code,
since that translation only happened for the SparkContext's config.
This change modifies most places to use SparkHadoopUtil to initialize
configs, and makes that method perform the translation that was previously
done only inside SparkContext.
The places that were not changed fall into one of the following categories:
- Test code where this doesn't really matter
- Places deep in the code where plumbing SparkConf would be too difficult
for very little gain
- Default values for arguments - since the caller can provide their own
config in that case
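The "spark.hadoop.*" translation the change centralizes can be sketched as follows: keys with that prefix are copied into the Hadoop config with the prefix stripped, while other Spark options are ignored. This is a minimal illustration using plain maps; the method and class names here are hypothetical, not the actual SparkHadoopUtil API, and a real Hadoop `Configuration` object would be used in Spark itself.

```java
import java.util.HashMap;
import java.util.Map;

public class HadoopConfSketch {
    private static final String PREFIX = "spark.hadoop.";

    // Copy "spark.hadoop.*" entries from a Spark-style config map into a
    // Hadoop-style config map, stripping the prefix. Illustrative only.
    static Map<String, String> newHadoopConf(Map<String, String> sparkConf) {
        Map<String, String> hadoopConf = new HashMap<>();
        for (Map.Entry<String, String> e : sparkConf.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                hadoopConf.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return hadoopConf;
    }

    public static void main(String[] args) {
        Map<String, String> sparkConf = new HashMap<>();
        sparkConf.put("spark.hadoop.fs.defaultFS", "hdfs://nn:8020");
        sparkConf.put("spark.app.name", "demo");
        // Only the prefixed key is copied, with the prefix removed.
        System.out.println(newHadoopConf(sparkConf));
    }
}
```

Routing all config creation through one such helper is what guarantees every Hadoop config in Spark code sees these options, instead of only the one built by SparkContext.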
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vanzin/spark SPARK-2889
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1843.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1843
----
commit 1e7003ff01778f1a3be0f006fc721495ce13a0e2
Author: Marcelo Vanzin <[email protected]>
Date: 2014-08-07T16:12:17Z
Replace explicit Configuration instantiation with SparkHadoopUtil.
This is the basic grunt work; code doesn't fully compile yet, since
I'll do some of the more questionable changes in separate commits.
commit b8ab1737c8230481a7797e5b174d07eea9f880d6
Author: Marcelo Vanzin <[email protected]>
Date: 2014-08-07T17:12:34Z
Update Utils API to take a Configuration argument.
Instead of using "new Configuration()" where a configuration is
needed, let the caller provide a context-appropriate config
object.
commit f16cadd2e4c0426d6aca1e125403c1427cb2d0c4
Author: Marcelo Vanzin <[email protected]>
Date: 2014-08-07T17:17:50Z
Initialize config in SparkHadoopUtil.
This is somewhat hackish, since it doesn't account for any customization
someone might make to SparkConf before they actually start executing Spark
code. Instead, it will only consider options available in the system
properties when creating the Hadoop conf.
commit 3f2676052937d193b3415b7c7aeeb4a6dad8eeba
Author: Marcelo Vanzin <[email protected]>
Date: 2014-08-07T17:22:24Z
Compilation fix.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]