GitHub user tigerquoll opened a pull request:

    https://github.com/apache/spark/pull/2516

    Spark Core - [SPARK-3620] - Refactor of SparkSubmit Argument parsing code

    Argument processing seems to have gotten a lot of attention lately, so I 
thought I might throw my contribution into the ring.  Attached for 
consideration and to prompt discussion is a revamp of argument handling in 
SparkSubmit aimed at making things a lot more consistent. The only thing that 
has been modified is the way that configuration properties are read, 
processed and prioritised.
    
    Things to note include:
    * All configuration parameters can now be consistently set via config file
    
    * Configuration parameter defaults have been removed from the code, and 
placed into a property file which is read from the class path on startup.  
There should be no need to trace through 5 files to see what a config parameter 
defaults to if it is not specified, or have different default values applied in 
multiple places throughout the code.
    
    * Configuration parameter validation is now done once all configuration 
parameters have been read in and resolved from various locations, not just when 
reading the command line.
    
    * All property files (including spark_default_conf) are parsed by Java 
property handling code. All custom parsing code has been removed. Escaping of 
characters should now be consistent everywhere.
    
    * All configuration parameters are overridden in the same consistent way - 
configuration parameters for SparkSubmit are pulled from the following sources, 
in order of priority (highest first):
     1. Entries specified on the command line (except --conf entries)
     2. Entries specified on the command line with --conf
     3. Environment variables (including legacy variable mappings)
     4. System config variables (e.g. by using -Dspark.var.name)
     5. $(SPARK_DEFAULT_CONF)/spark-defaults.conf or 
$(SPARK_HOME)/conf/spark-defaults.conf, if either exists
     6. Hard coded defaults in class path at spark-submit-defaults.prop
    
    * A property file specified by one of the sources listed above gets read in 
and the properties are considered to be at the priority of the configuration 
source that specified the file. A property specified in a property file will 
not override an existing config value already specified by that configuration 
source.
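
    As a rough illustration of the precedence rules above (not the code in 
this PR, which is Scala), the merge could be sketched in Python as follows. 
The source names in PRIORITY, the helper names and the spark.master check are 
all hypothetical, chosen only for this example; the sketch also assumes that 
item 1 in the list is the highest priority.

```python
from collections import ChainMap

# Hypothetical source names, highest priority first, mirroring the list above.
PRIORITY = ["cli", "cli_conf", "env", "system", "defaults_file", "builtin"]


def with_property_file(source_values, file_values):
    """Fold a property file into the source that named it.

    Values set directly by the source win; the file only fills the gaps.
    """
    merged = dict(file_values)
    merged.update(source_values)
    return merged


def resolve(sources):
    """Merge per-source dicts; the earliest source in PRIORITY wins."""
    layers = [sources.get(name, {}) for name in PRIORITY]
    # ChainMap looks keys up in the first mapping that contains them.
    return dict(ChainMap(*layers))


def validate(conf):
    """Run checks only after every source has been merged.

    The rule below is a made-up example of a post-merge check.
    """
    if not conf.get("spark.master"):
        raise ValueError("spark.master must be set")
    return conf


sources = {
    # The command line sets spark.master directly and also names a property
    # file; the file cannot override the directly specified value.
    "cli": with_property_file(
        {"spark.master": "local[4]"},
        {"spark.master": "yarn", "spark.app.name": "from-file"},
    ),
    "env": {"spark.executor.memory": "2g"},
    "builtin": {"spark.executor.memory": "1g", "spark.app.name": "default"},
}
conf = validate(resolve(sources))
```

    Here spark.master stays at the command-line value, spark.app.name comes 
from the file named on the command line (beating the built-in default), and 
spark.executor.memory comes from the environment layer.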
    
    The existing argument handling is pretty finicky - chances are high that 
I've missed some behaviour. If this PR is going to be accepted/approved, let 
me know of any bugs and I'll fix them up and document the behaviour for future 
reference.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tigerquoll/spark-3620 master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2516.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2516
    
----
commit b1a9682dd2bbff824c4e8481fa0ce5118c47de68
Author: Dale <[email protected]>
Date:   2014-09-21T02:42:24Z

    Initial pass at using typesafe's conf object for handling configuration 
options

commit 7bb5ee95b3f06147dba994e3d557221554415bfd
Author: Dale <[email protected]>
Date:   2014-09-21T02:44:09Z

    Added defaults file

commit e995a6d1e8ab898c85aa5fe259b81c630595075f
Author: Dale <[email protected]>
Date:   2014-09-21T12:56:17Z

    Existing tests now work

commit 00ee008c5652336d533d9619bc7e6306ed59138b
Author: Dale <[email protected]>
Date:   2014-09-21T13:05:14Z

    Existing tests now work

commit 295c62b067fb5204efb58892133c77fe49b877e0
Author: Dale <[email protected]>
Date:   2014-09-22T22:04:45Z

    Created mergedPropertyMap

commit f399170e1c05d75257ff6c508a96e64cadf0d87b
Author: Dale <[email protected]>
Date:   2014-09-23T00:10:40Z

    Moved sparkSubmitArguments module to use custom property map merging code

commit b0abe3196f9e5d3f577e158704740f1eee8fbb59
Author: Dale <[email protected]>
Date:   2014-09-23T23:58:55Z

    Merge branch 'master' of https://github.com/apache/spark

commit 562ec7c064e5ad632cf7aaa1720be29fe36b5c9a
Author: Dale <[email protected]>
Date:   2014-09-23T23:59:52Z

    note for additional tests

commit 86f71f8bb8291fe20a2f0ca0100727d583e97dfd
Author: Dale <[email protected]>
Date:   2014-09-24T00:39:47Z

    Changes needed to pass scalastyle check

commit 2019554ec307c8d3eabee7e4299cd8bac8faba0f
Author: Dale <[email protected]>
Date:   2014-09-24T04:43:58Z

    Changes needed to pass scalastyle check, merged from current 
SparkSubmit.scala

commit 8c416a04d064c1475a184785a9135d849c239bff
Author: Dale <[email protected]>
Date:   2014-09-24T05:19:24Z

    Fixed some typos

commit b69f58e65d919a689942866f59b11a7dcf2fbf91
Author: Dale <[email protected]>
Date:   2014-09-24T07:08:01Z

    Added spark.app.name to defaults list
    Environment var overrides are now disabled if var is blank
    Fixed bug stopping some tests from working
    SparkConf now pulls basic config the same way SparkSubmit does

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

