GitHub user tgravescs opened a pull request:

    https://github.com/apache/spark/pull/19450

    [SPARK-22218] spark shuffle services fails to update secret on app 
re-attempts

    This patch fixes application re-attempts when running Spark on YARN using 
the external shuffle service with security on.  Currently, executors fail to 
launch on any application re-attempt when launched on a NodeManager that had 
an executor from the first attempt, because we aren't updating the secret key 
after the first application attempt.  The fix is to remove the containsKey 
check for whether the secret already exists.  That way we always add it and 
make sure it's the most recent secret.  Similarly, remove the containsKey 
check on the remove, since it's just an extra check that isn't really needed.
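
    To make the failure mode concrete, here is a minimal sketch, not the 
actual patch.  The class SecretStoreSketch, its method names, and the app ID 
and secret strings are all hypothetical; the only assumption taken from the 
description above is that secrets are kept in a map keyed by application ID.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the shuffle service's secret store (not Spark's class).
class SecretStoreSketch {
    private final ConcurrentHashMap<String, String> secrets = new ConcurrentHashMap<>();

    // Guarded registration: an existing entry for the app ID is never replaced.
    void registerIfAbsent(String appId, String secret) {
        if (!secrets.containsKey(appId)) {
            secrets.put(appId, secret);
        }
    }

    // Unguarded registration: the latest secret always wins.
    void register(String appId, String secret) {
        secrets.put(appId, secret);
    }

    // Removing a missing key is a no-op, so no containsKey guard is needed.
    void unregister(String appId) {
        secrets.remove(appId);
    }

    String secretFor(String appId) {
        return secrets.get(appId);
    }

    public static void main(String[] args) {
        // Two attempts of the same application share an app ID but use different secrets.
        SecretStoreSketch guarded = new SecretStoreSketch();
        guarded.registerIfAbsent("app_1", "secret-attempt-1");
        guarded.registerIfAbsent("app_1", "secret-attempt-2");
        System.out.println(guarded.secretFor("app_1")); // stale: secret-attempt-1

        SecretStoreSketch unguarded = new SecretStoreSketch();
        unguarded.register("app_1", "secret-attempt-1");
        unguarded.register("app_1", "secret-attempt-2");
        System.out.println(unguarded.secretFor("app_1")); // current: secret-attempt-2
    }
}
```

    With the guard in place, the second attempt's executors would authenticate 
against a stale secret, which is consistent with the failures described here.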
    
    Note this worked before Spark 2.2 because the check used to be contains 
(which looks for the value) rather than containsKey, so it never matched and 
the new secret was always added.
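
    The difference between the two checks can be seen in isolation.  The 
sketch below assumes the secrets live in a java.util.concurrent.ConcurrentHashMap, 
whose legacy contains method tests values rather than keys; the class name, 
app ID, and secret string are made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; only the ConcurrentHashMap calls are real API.
class ContainsVsContainsKey {
    // Builds a map holding one app ID -> secret entry.
    static ConcurrentHashMap<String, String> sampleMap() {
        ConcurrentHashMap<String, String> secrets = new ConcurrentHashMap<>();
        secrets.put("app_1", "secret-attempt-1");
        return secrets;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, String> secrets = sampleMap();
        // contains(Object) is a legacy method that searches the values.
        System.out.println(secrets.contains("app_1"));            // false
        System.out.println(secrets.contains("secret-attempt-1")); // true
        // containsKey(Object) searches the keys.
        System.out.println(secrets.containsKey("app_1"));         // true
    }
}
```

    A guard written as "if not contains(appId)" therefore always passes when 
the app ID does not happen to equal a stored secret, so the put always runs.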
    
    The patch was tested on a 10-node cluster, and a unit test was added.
    The test run was a wordcount where the output directory already existed.  
With the bug present, the application attempt failed with the max number of 
executor failures, all of which were SASL exceptions.  With the fix present, 
the application re-attempts fail with "directory already exists" or, when you 
remove the directory between attempts, the re-attempts succeed.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tgravescs/spark SPARK-22218

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19450.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19450
    
----
commit aa9a2c2705cd59f9e1caa7e78ad7eafad3ff2789
Author: Thomas Graves <[email protected]>
Date:   2017-10-06T18:45:13Z

    [SPARK-22218] spark shuffle services fails to update secret on application 
re-attempts

commit 5a9ef1396a28b535042e336a2f43d80104ce95e5
Author: Thomas Graves <[email protected]>
Date:   2017-10-06T18:55:43Z

    minor updates

----

