GitHub user ssaavedra opened a pull request:

    https://github.com/apache/spark/pull/19427

    Reset spark.driver.bindAddress when starting a Checkpoint

    ## What changes were proposed in this pull request?
    
    It seems that recovering from a checkpoint can replace the old
    driver and executor IP addresses, because the workload may now be
    running in a different cluster configuration. It follows that the
    bindAddress for the master may also have changed. We should therefore
    not keep the old value; instead, spark.driver.bindAddress should be
    added to the list of properties that are reset and recreated from the
    new environment.
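
The idea described above can be illustrated with a minimal sketch. It is a simplified model that uses plain dictionaries and is not Spark's actual implementation: the function name `restore_conf`, the contents of `PROPERTIES_TO_RELOAD`, and all sample values are assumptions made for illustration only.

```python
# Hypothetical, simplified model of restoring configuration from a checkpoint.
# Plain dicts stand in for the configuration objects; names are illustrative.

# Properties that describe where the driver runs. Values saved in a checkpoint
# may be stale after a restart, so they are taken from the new environment.
# (The exact contents of this list are an assumption for this sketch.)
PROPERTIES_TO_RELOAD = [
    "spark.driver.host",
    "spark.driver.bindAddress",  # the property this pull request adds
    "spark.driver.port",
    "spark.master",
]


def restore_conf(checkpointed, current_env):
    """Return the checkpointed settings, overriding environment-specific
    properties with values from the current environment when present."""
    restored = dict(checkpointed)
    for key in PROPERTIES_TO_RELOAD:
        if key in current_env:
            restored[key] = current_env[key]
    return restored


# Settings saved with the checkpoint on the old cluster (sample values).
checkpointed = {
    "spark.app.name": "streaming-job",
    "spark.driver.host": "10.0.0.1",
    "spark.driver.bindAddress": "10.0.0.1",
}

# Settings supplied by the environment the job is restarted in (sample values).
current_env = {
    "spark.driver.host": "10.0.0.7",
    "spark.driver.bindAddress": "10.0.0.7",
}

restored = restore_conf(checkpointed, current_env)
```

In this sketch, application-level settings such as `spark.app.name` survive the restart, while the driver addresses come from the new environment. Without `spark.driver.bindAddress` in the list, the restored configuration would keep the stale bind address from the old cluster.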
    
    ## How was this patch tested?
    
    This patch was tested manually on AWS, using the experimental (not yet
    merged) Kubernetes scheduler, which uses bindAddress to bind to a
    Kubernetes service (this is also how I first encountered the bug).
    However, the affected code path is not related to the scheduler, and
    this may have slipped through when merging SPARK-4563.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ssaavedra/spark fix-checkpointing-master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19427.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19427
    
----
commit 892555f452173b73aeabc077749c4c32a7d4e504
Author: Santiago Saavedra <[email protected]>
Date:   2017-09-28T15:30:29Z

    Reset spark.driver.bindAddress when starting a Checkpoint

----


