GitHub user tigerquoll opened a pull request:
https://github.com/apache/spark/pull/2516
Spark Core - [SPARK-3620] - Refactor of SparkSubmit Argument parsing code
Argument processing seems to have gotten a lot of attention lately, so I
thought I might throw my contribution into the ring. Attached for
consideration, and to prompt discussion, is a revamp of argument handling in
SparkSubmit aimed at making things a lot more consistent. The only thing that
has been modified is the way configuration properties are read, processed,
and prioritised.
Things to note include:
* All configuration parameters can now be set consistently via a config file
* Configuration parameter defaults have been removed from the code and
placed into a property file that is read from the class path on startup.
There should be no need to trace through five files to see what a config
parameter defaults to when it is not specified, nor should different default
values be applied in multiple places throughout the code.
* Configuration parameter validation is now done once all configuration
parameters have been read in and resolved from various locations, not just when
reading the command line.
* All property files (including spark_default_conf) are parsed by Java
property handling code. All custom parsing code has been removed. Escaping of
characters should now be consistent everywhere.
* All configuration parameters are overridden in the same consistent way:
configuration parameters for SparkSubmit are pulled from the following sources
in order of priority (a sketch of this merge logic follows the list)
1. Entries specified on the command line (except for --conf entries)
2. Entries specified on the command line with --conf
3. Environment variables (including legacy variable mappings)
4. System config variables (eg by using -Dspark.var.name)
5. $(SPARK_DEFAULT_CONF)/spark-defaults.conf or
$(SPARK_HOME)/conf/spark-defaults.conf, if either exists
6. Hard-coded defaults read from the class path at spark-submit-defaults.prop
* A property file specified by one of the sources listed above is read in,
and its properties are considered to be at the priority of the configuration
source that specified the file. A property specified in a property file will
not override a config value already specified by that configuration
source.
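To make the merge behaviour concrete, here is a minimal sketch of the kind of
logic described above. It is not the code in this PR: the resource name
spark-submit-defaults.prop comes from the list above, while the object and
method names (ConfigMergeSketch, loadClasspathDefaults, mergeInPriorityOrder)
and the use of a plain Map[String, String] per source are illustrative
assumptions.

import java.util.Properties
import scala.collection.JavaConverters._

object ConfigMergeSketch {
  // Load the hard-coded defaults shipped on the class path (lowest priority).
  // Standard java.util.Properties parsing is used, so escaping behaves the
  // same way here as for every other property file.
  def loadClasspathDefaults(
      resource: String = "spark-submit-defaults.prop"): Map[String, String] = {
    val stream = Option(getClass.getClassLoader.getResourceAsStream(resource))
    stream.map { in =>
      try {
        val props = new Properties()
        props.load(in)
        props.asScala.toMap
      } finally {
        in.close()
      }
    }.getOrElse(Map.empty)
  }

  // Merge sources given from highest to lowest priority: a key is taken from
  // the first source that defines it and is never overridden by a
  // lower-priority source.
  def mergeInPriorityOrder(sources: Seq[Map[String, String]]): Map[String, String] =
    sources.foldLeft(Map.empty[String, String]) { (merged, source) =>
      source.foldLeft(merged) { case (acc, (key, value)) =>
        if (acc.contains(key)) acc else acc + (key -> value)
      }
    }
}

Calling mergeInPriorityOrder with the six sources in the order listed above
(command-line entries first, class-path defaults last) gives the resolution
behaviour described: each key is settled by the highest-priority source that
provides it.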
The existing argument handling is pretty finicky, so chances are high that
I've missed some behaviour. If this PR is going to be accepted/approved, let
me know of any bugs and I'll fix them up and document the behaviour for future
reference.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tigerquoll/spark-3620 master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/2516.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2516
----
commit b1a9682dd2bbff824c4e8481fa0ce5118c47de68
Author: Dale <[email protected]>
Date: 2014-09-21T02:42:24Z
Initial pass at using typesafe's conf object for handling configuration
options
commit 7bb5ee95b3f06147dba994e3d557221554415bfd
Author: Dale <[email protected]>
Date: 2014-09-21T02:44:09Z
Added defaults file
commit e995a6d1e8ab898c85aa5fe259b81c630595075f
Author: Dale <[email protected]>
Date: 2014-09-21T12:56:17Z
Existing tests now work
commit 00ee008c5652336d533d9619bc7e6306ed59138b
Author: Dale <[email protected]>
Date: 2014-09-21T13:05:14Z
Existing tests now work
commit 295c62b067fb5204efb58892133c77fe49b877e0
Author: Dale <[email protected]>
Date: 2014-09-22T22:04:45Z
Created mergedPropertyMap
commit f399170e1c05d75257ff6c508a96e64cadf0d87b
Author: Dale <[email protected]>
Date: 2014-09-23T00:10:40Z
Moved sparkSubmitArguments module to use custom property map merging code
commit b0abe3196f9e5d3f577e158704740f1eee8fbb59
Author: Dale <[email protected]>
Date: 2014-09-23T23:58:55Z
Merge branch 'master' of https://github.com/apache/spark
commit 562ec7c064e5ad632cf7aaa1720be29fe36b5c9a
Author: Dale <[email protected]>
Date: 2014-09-23T23:59:52Z
note for additional tests
commit 86f71f8bb8291fe20a2f0ca0100727d583e97dfd
Author: Dale <[email protected]>
Date: 2014-09-24T00:39:47Z
Changes needed to pass scalastyle check
commit 2019554ec307c8d3eabee7e4299cd8bac8faba0f
Author: Dale <[email protected]>
Date: 2014-09-24T04:43:58Z
Changes needed to pass scalastyle check, merged from current
SparkSubmit.scala
commit 8c416a04d064c1475a184785a9135d849c239bff
Author: Dale <[email protected]>
Date: 2014-09-24T05:19:24Z
Fixed some typos
commit b69f58e65d919a689942866f59b11a7dcf2fbf91
Author: Dale <[email protected]>
Date: 2014-09-24T07:08:01Z
Added spark.app.name to defaults list
Environment var overrides are now disabled if var is blank
Fixed bug stopping some tests from working
SparkConf now pulls basic config the same way SparkSubmit does
----