GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/1845

    [SPARK-2849 / 2914] Handle certain Spark configs in bash correctly

    We currently rely on Java's properties-file parser to read the
    `spark-defaults.conf` file. However, certain Spark configs need to be
    processed independently of the JVM. These fall into two general
    categories:
    
    **1) `spark.driver.*`**

    In client deploy mode, the driver is launched from within `SparkSubmit`'s
    JVM. This means that by the time we parse Spark configs from
    `spark-defaults.conf`, it is already too late to control certain
    properties of the driver's JVM, such as its heap size and class path. We
    currently ignore these configs in client mode altogether:
    ```
    spark.driver.memory
    spark.driver.extraJavaOptions
    spark.driver.extraClassPath
    spark.driver.extraLibraryPath
    ```
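    To make the timing problem concrete, the sketch below shows how these
    four configs could map onto the `java` command line that starts the
    driver. That mapping is why they must be known before that JVM exists.
    It is written in Python for readability rather than bash. The function
    name, the example paths, and the exact flag mapping (`-Xms`/`-Xmx` for
    memory, `-Djava.library.path` for the library path, a prepended `-cp`
    entry for the class path) are illustrative assumptions, not the launch
    logic in this PR.

```python
import shlex

SPARK_SUBMIT_CLASS = "org.apache.spark.deploy.SparkSubmit"


def driver_jvm_command(conf, base_classpath, app_args):
    """Build an illustrative `java` command line for a client-mode driver.

    `conf` maps Spark config keys to raw string values. The flag mapping used
    here is an assumption made for illustration, not Spark's launch logic.
    """
    classpath = base_classpath
    extra_cp = conf.get("spark.driver.extraClassPath")
    if extra_cp:
        # Prepend the user's entries so they are searched first.
        classpath = extra_cp + ":" + base_classpath
    cmd = ["java", "-cp", classpath]

    memory = conf.get("spark.driver.memory")
    if memory:
        # The heap size is fixed when the JVM starts and cannot change later.
        cmd += ["-Xms" + memory, "-Xmx" + memory]

    lib_path = conf.get("spark.driver.extraLibraryPath")
    if lib_path:
        cmd.append("-Djava.library.path=" + lib_path)

    java_opts = conf.get("spark.driver.extraJavaOptions")
    if java_opts:
        # Quote-aware split so a quoted value containing spaces stays one option.
        cmd += shlex.split(java_opts)

    return cmd + [SPARK_SUBMIT_CLASS] + list(app_args)


# Hypothetical client-mode settings read from spark-defaults.conf.
conf = {
    "spark.driver.memory": "2g",
    "spark.driver.extraClassPath": "/opt/extra/extra.jar",
    "spark.driver.extraJavaOptions": '-XX:+PrintGCDetails -Dname="a b"',
}
cmd = driver_jvm_command(conf, "/opt/spark/lib/spark-assembly.jar", ["app.jar"])
print(cmd)
```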
    **2) `spark.*.extraJavaOptions`**

    Each of these configs packs a list of Java options into a single string.
    Currently, if any of the options include escaped double quotes or
    backslashes, the string will likely be split incorrectly, and the Java
    command that launches the driver or the executors may be corrupted. The
    relevant configs are:
    
    ```
    spark.driver.extraJavaOptions
    spark.executor.extraJavaOptions
    ```
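    As an illustration of the splitting problem, the snippet compares naive
    whitespace splitting with quote-aware splitting on a hypothetical option
    string. Python's `shlex.split` (POSIX shell rules) stands in for a
    correct splitter here. It is not the code Spark uses.

```python
import shlex

# Hypothetical spark.executor.extraJavaOptions value: two options, the second
# containing spaces and escaped double quotes.
raw = r'-XX:+PrintGCDetails -Dmsg="I am the \"man\""'

# Naive whitespace splitting cuts the quoted value into fragments and leaves
# the quote characters and backslashes in place.
naive = raw.split()

# Quote-aware splitting keeps the quoted value together and turns \" into ".
proper = shlex.split(raw)

print(naive)
print(proper)
```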
    
    For both categories, we need to parse the Spark configs in bash before
    the JVM is launched. Heavy-duty parsing in bash involves a lot of
    trickery, so I have moved much of the complexity into a new file,
    `bin/utils.sh`. I have tested this locally with escaped double quotes,
    backslashes, whitespace, and combinations of these, and the behavior is
    as expected.
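    Roughly, the launcher-side parsing has to read `spark-defaults.conf`
    without the JVM's help. The Python sketch below parses a simplified
    subset of the Java properties format and then picks out the
    `spark.driver.*` keys. The subset covers comments, `=`/`:`/whitespace
    separators, and trailing-backslash continuation lines. The PR does this
    in bash. The function name and the supported subset here are
    assumptions. Unlike `java.util.Properties`, the sketch leaves other
    backslash escapes in values untouched.

```python
import re

# Key, then optional whitespace, an optional '=' or ':', more whitespace, value.
_ENTRY = re.compile(r"([^\s=:]+)\s*[=:]?\s*(.*)$")


def parse_spark_defaults(text):
    """Parse a simplified subset of the Java properties format into a dict."""
    props = {}
    lines = iter(text.splitlines())
    for line in lines:
        line = line.lstrip()
        if not line or line[0] in "#!":
            continue  # Blank line or comment.
        # An odd number of trailing backslashes continues onto the next line;
        # the continuation's leading whitespace is dropped.
        while (len(line) - len(line.rstrip("\\"))) % 2 == 1:
            line = line[:-1] + next(lines, "").lstrip()
        match = _ENTRY.match(line)
        if match:
            key, value = match.groups()
            # Other backslash escapes are kept verbatim; trailing whitespace
            # is trimmed for simplicity.
            props[key] = value.rstrip()
    return props


sample = r"""
# Blank lines and comments are skipped.
spark.driver.memory              2g
spark.executor.extraJavaOptions  -XX:+PrintGCDetails \
    -Dmsg="I am the \"man\""
"""
props = parse_spark_defaults(sample)
# Only these keys have to be known before the driver JVM starts.
driver_conf = {k: v for k, v in props.items() if k.startswith("spark.driver.")}
print(props)
print(driver_conf)
```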
    
    The changes build directly on top of my old PR at #1770.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark handle-configs-bash

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1845.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1845
    
----
commit 250cb955efe9c9bdf24be6cefcfb1dfa71d39bc4
Author: Andrew Or <[email protected]>
Date:   2014-08-04T20:12:15Z

    Do not ignore spark.driver.extra* for client mode

commit a2ab1b0a3a976e361ea86fc20fc7083e7f9885ca
Author: Andrew Or <[email protected]>
Date:   2014-08-06T04:32:05Z

    Parse spark.driver.extra* in bash

commit 0025474d7412607e1ca620d1942393d78a28b7f8
Author: Andrew Or <[email protected]>
Date:   2014-08-06T04:35:16Z

    Revert SparkSubmit handling of --driver-* options for only cluster mode

commit 63ed2e9dada83402d2502efb9759524be74c04b2
Author: Andrew Or <[email protected]>
Date:   2014-08-06T04:36:00Z

    Merge branch 'master' of github.com:apache/spark into submit-driver-extra

commit 75ee6b4a1c6df1a911cf62ded81e4eabb737b345
Author: Andrew Or <[email protected]>
Date:   2014-08-06T04:36:35Z

    Remove accidentally added file

commit 8843562bb6883a092e6f4032f05fe01932db086b
Author: Andrew Or <[email protected]>
Date:   2014-08-06T04:37:11Z

    Fix compilation issues...

commit 98dd8e327ac5940d8a4a3820027645ee4b88178e
Author: Andrew Or <[email protected]>
Date:   2014-08-06T04:39:07Z

    Add warning if properties file does not exist

commit 130f295e085d95e8205d882174a5667d29b3b1f2
Author: Andrew Or <[email protected]>
Date:   2014-08-06T05:12:28Z

    Handle spark.driver.memory too

commit 4edcaa8027961578246c5cfa8a2d82a92a031265
Author: Andrew Or <[email protected]>
Date:   2014-08-06T06:17:56Z

    Redirect stdout to stderr for python

commit e5cfb4627df353125f8f2382bad4bb35aa03c7fb
Author: Andrew Or <[email protected]>
Date:   2014-08-06T20:26:04Z

    Collapse duplicate code + fix potential whitespace issues

commit 4ec22a154c428e7e581d43d86bd7ef9d1308ca45
Author: Andrew Or <[email protected]>
Date:   2014-08-06T20:26:42Z

    Merge branch 'master' of github.com:apache/spark into submit-driver-extra

commit ef12f74b9b7e7edcefb6b82cb53de3eccbf0d9ad
Author: Andrew Or <[email protected]>
Date:   2014-08-06T20:31:32Z

    Minor formatting

commit fa2136ed14145f8fa18f40e6e1a3a776048c01ab
Author: Andrew Or <[email protected]>
Date:   2014-08-07T05:23:44Z

    Escape Java options + parse java properties files properly

commit dec23439ad82718f786ea022b1f118f202687cc1
Author: Andrew Or <[email protected]>
Date:   2014-08-07T05:28:38Z

    Only export variables if they exist

commit a4df3c4165ce4546742fbd0b9d92ea612973bb2e
Author: Andrew Or <[email protected]>
Date:   2014-08-07T05:47:57Z

    Move parsing and escaping logic to utils.sh
    
    This commit also fixes a deadly typo.

commit de765c9813275b939741e1b78567b2443fab5f2d
Author: Andrew Or <[email protected]>
Date:   2014-08-07T06:22:05Z

    Print spark-class command properly

commit 8e552b733d52ada89dd7c0e8692fcca87fc00d26
Author: Andrew Or <[email protected]>
Date:   2014-08-07T06:25:36Z

    Include an example of spark.*.extraJavaOptions
    
    Right now it's not super obvious how to specify multiple Java options,
    especially ones containing whitespace.
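    One way to think about the format: wrap each option that contains
    whitespace, quotes, or backslashes in double quotes, escaping its inner
    `\` and `"`, then join the results with spaces. The hypothetical
    `join_java_options` helper below sketches that in Python. It assumes the
    consumer splits with the same rules as `shlex.split`; the real format is
    whatever spark-defaults.conf.template documents.

```python
import shlex


def join_java_options(options):
    """Pack a list of Java options into one string (hypothetical helper)."""
    packed = []
    for opt in options:
        if opt and not any(c.isspace() or c in "\"'\\" for c in opt):
            packed.append(opt)  # No special characters: pass through as-is.
        else:
            # Escape backslashes first, then double quotes, then wrap the whole
            # option in double quotes so embedded whitespace is preserved.
            escaped = opt.replace("\\", "\\\\").replace('"', '\\"')
            packed.append('"' + escaped + '"')
    return " ".join(packed)


opts = [
    "-XX:+PrintGCDetails",
    "-Dgreeting=hello world",
    '-Dquote=say "hi"',
    r"-Dpath=C:\tmp",
]
packed = join_java_options(opts)
print(packed)
# Round trip: splitting the packed string recovers the original list.
print(shlex.split(packed) == opts)
```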

commit c13a2cb75b49cfbf7ae6765d900123595a5db076
Author: Andrew Or <[email protected]>
Date:   2014-08-07T06:26:38Z

    Merge branch 'master' of github.com:apache/spark into submit-driver-extra

commit c854859be8a604ac04c74488e7729423c47acd37
Author: Andrew Or <[email protected]>
Date:   2014-08-07T06:38:39Z

    Add small comment

commit 1cdc6b15ff375bfb0ce3fe3f6b6c434dc4e30947
Author: Andrew Or <[email protected]>
Date:   2014-08-07T19:33:55Z

    Fix bug: escape escaped double quotes properly
    
    The previous code used to ignore all closing quotes if the same token
    also has an escaped double quote. For example, in
    
    -Dkey="I am the \"man\""
    
    the last token contains both escaped quotes and valid quotes. This
    used to be interpreted as a token that doesn't have a closing quote
    when it actually does. This is fixed in this commit.
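    A minimal tokenizer sketch shows why the fix works. It is written in
    Python, uses a hypothetical `split_java_options`, and is not the PR's
    bash code. Once an escaped quote has been consumed as a literal
    character, the next unescaped quote can still close the quoted section.
    Inside double quotes it treats `\` as an escape only before `"` or `\`,
    mirroring POSIX shell rules. Single quotes are not special.

```python
import shlex


def split_java_options(s):
    """Split a Java options string into separate options (illustrative sketch)."""
    tokens, current = [], []
    in_token = in_quotes = escaped = False
    for ch in s:
        if escaped:
            # Inside double quotes, a backslash only escapes '"' or '\';
            # before any other character it is kept literally.
            if in_quotes and ch not in '"\\':
                current.append("\\")
            current.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = in_token = True
        elif ch == '"':
            # Escaped quotes were already consumed above, so this one really
            # opens or closes a quoted section, even if the token has \" in it.
            in_quotes = not in_quotes
            in_token = True
        elif ch.isspace() and not in_quotes:
            if in_token:
                tokens.append("".join(current))
                current, in_token = [], False
        else:
            current.append(ch)
            in_token = True
    if escaped or in_quotes:
        raise ValueError("unterminated escape or quote: " + s)
    if in_token:
        tokens.append("".join(current))
    return tokens


example = r'-Dkey="I am the \"man\""'
print(split_java_options(example))
# Cross-check against Python's POSIX-mode shlex on the same input.
print(split_java_options(example) == shlex.split(example))
```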

commit 45a1eb996773fa1828d1d489cbc451f2033845e0
Author: Andrew Or <[email protected]>
Date:   2014-08-07T20:28:56Z

    Fix bug: escape escaped backslashes and quotes properly...
    
    This makes the way we parse these options consistent with the way
    Java parses its own Java options.

commit aabfc7e1da8897b266020da6c480cbe7d774bc99
Author: Andrew Or <[email protected]>
Date:   2014-08-07T21:24:57Z

    escape -> split (minor)

commit a992ae2ba7067cf76fba0e3ef192b275eee40b57
Author: Andrew Or <[email protected]>
Date:   2014-08-07T23:51:16Z

    Escape spark.*.extraJavaOptions correctly
    
    We previously never dealt with this correctly, in that we evaluated
    all backslashes twice, once when passing spark.*.extraJavaOptions
    into SparkSubmit, and another time when calling
    Utils.splitCommandString.
    
    This means we need to pass the raw values of these configs directly
    to the JVM without evaluating the backslashes when launching
    SparkSubmit. The way we do this is through a few custom environment
    variables.
    
    As of this commit, the user should follow the format outlined in
    spark-defaults.conf.template for spark.*.extraJavaOptions, and the
    expected Java options (quotes, whitespace, backslashes and all) will
    be propagated correctly to the driver or the executors.
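    Here is a simplified model of the double-evaluation problem. Python's
    `shlex.split` stands in for both passes; the real passes were bash and
    `Utils.splitCommandString`. Evaluating a value twice drops a backslash
    the user meant to keep, and it re-splits a quoted value that contained
    a space.

```python
import shlex


def evaluate_twice(raw):
    # First pass (assumed model): the value is unquoted and unescaped, then
    # flattened back into one string, losing the original quoting.
    flattened = " ".join(shlex.split(raw))
    # Second pass: the flattened string is split into options again.
    return shlex.split(flattened)


def evaluate_once(raw):
    # Raw value passed through untouched; a single split is the only pass.
    return shlex.split(raw)


for raw in [r'-Dpath="a\\b"', '-Dmsg="hello world"']:
    print(raw, evaluate_once(raw), evaluate_twice(raw))
```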

commit c7b99267c577195882c965029884941b79cc8ed0
Author: Andrew Or <[email protected]>
Date:   2014-08-07T23:51:29Z

    Minor changes to spark-defaults.conf.template
    
    ... to highlight our new-found ability to deal with special values.

commit 5d8f8c481951269033ecc1ec0435060bdc25e926
Author: Andrew Or <[email protected]>
Date:   2014-08-07T23:52:29Z

    Merge branch 'master' of github.com:apache/spark into submit-driver-extra

commit e793e5f56c5c62d94fe0f2ac3d8aefc1d0b1573e
Author: Andrew Or <[email protected]>
Date:   2014-08-08T01:38:17Z

    Handle multi-line arguments

commit c2273fcad9a4f3e5cdfba56eb88e483100728a8a
Author: Andrew Or <[email protected]>
Date:   2014-08-08T01:53:20Z

    Fix typo (minor)

----

