GitHub user ssaavedra opened a pull request:
https://github.com/apache/spark/pull/19427
Reset spark.driver.bindAddress when starting a Checkpoint
## What changes were proposed in this pull request?
Recovering from a checkpoint can replace the old driver and executor
IP addresses, since the workload may now be running in a different
cluster configuration. It follows that the value of
spark.driver.bindAddress may also have changed. We should therefore not
keep the old value; instead, the property should be added to the list of
properties that are reset and recreated from the new environment.
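The idea can be sketched without any Spark dependency. The following is only an illustration, not the actual patch: the object name `CheckpointConfSketch`, the function `restoreConf`, and the exact contents of `propertiesToReload` are hypothetical, and plain `Map`s stand in for the checkpointed and freshly loaded configurations. It also assumes that a key missing from the new environment simply keeps its checkpointed value.

```scala
object CheckpointConfSketch {
  // Hypothetical list of keys that should be re-read from the new
  // environment on recovery instead of being taken from the checkpoint.
  val propertiesToReload: Seq[String] = Seq(
    "spark.driver.host",
    "spark.driver.bindAddress", // the key this pull request proposes to add
    "spark.driver.port",
    "spark.master"
  )

  // Start from the checkpointed settings and overwrite every reloadable key
  // that is defined in the current environment. Keys that the current
  // environment does not define keep their checkpointed value (an assumption
  // of this sketch).
  def restoreConf(
      checkpointed: Map[String, String],
      current: Map[String, String]): Map[String, String] =
    propertiesToReload.foldLeft(checkpointed) { (conf, key) =>
      current.get(key) match {
        case Some(value) => conf.updated(key, value)
        case None        => conf
      }
    }

  def main(args: Array[String]): Unit = {
    val checkpointed = Map(
      "spark.app.name" -> "demo",
      "spark.driver.bindAddress" -> "10.0.0.5")
    val current = Map("spark.driver.bindAddress" -> "0.0.0.0")
    // The bind address comes from the new environment; the app name is kept.
    println(restoreConf(checkpointed, current))
  }
}
```

Without `spark.driver.bindAddress` in the list, the restored configuration in this sketch would still carry the stale `10.0.0.5` address from the checkpoint, which is the behaviour described above.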
## How was this patch tested?
This patch was tested manually on AWS using the experimental (not yet
merged) Kubernetes scheduler, which uses bindAddress to bind to a
Kubernetes service; that is also how I first encountered the bug. The
affected code path is not related to the scheduler, however, and the
omission may have slipped through when SPARK-4563 was merged.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ssaavedra/spark fix-checkpointing-master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19427.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19427
----
commit 892555f452173b73aeabc077749c4c32a7d4e504
Author: Santiago Saavedra <[email protected]>
Date: 2017-09-28T15:30:29Z
Reset spark.driver.bindAddress when starting a Checkpoint
----