GitHub user shubhamchopra opened a pull request:
https://github.com/apache/spark/pull/14412
[SPARK-15355] [CORE] [WIP] Proactive block replication
## What changes were proposed in this pull request?
We propose adding proactive block replication to handle executor
failures. BlockManagerMasterEndpoint does all the book-keeping needed to keep
track of the executors and the blocks they hold. It also tracks which
executors are alive through heartbeats. When an executor is removed, this
book-keeping state is updated to reflect the lost executor. This step can
be used to identify executors that still hold a copy of the cached data,
and a message can be sent to them to use the existing "replicate"
function to find and place new replicas on other suitable hosts. Blocks
replicated this way will let the master know of their existence.
Replication would therefore happen as soon as an executor is lost, making it
proactive rather than something done at query time.
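
The flow described above can be illustrated with a small, self-contained sketch. This is not the code in this patch and does not use Spark's API: `ToyMaster`, `report_block`, `remove_executor`, the standalone `replicate` helper, and the fixed `target_replicas` count are all hypothetical names and simplifications chosen for illustration (a real block's replica count would come from its storage level, and peer selection would go through the existing replication logic).

```python
from collections import defaultdict


class ToyMaster:
    """Hypothetical stand-in for the master's book-keeping (not Spark's API)."""

    def __init__(self, target_replicas=2):
        # Assumption: one replica target for all blocks, to keep the sketch short.
        self.target_replicas = target_replicas
        self.executors = set()                    # executors believed alive
        self.block_locations = defaultdict(set)   # block id -> holders

    def register_executor(self, executor_id):
        self.executors.add(executor_id)

    def report_block(self, executor_id, block_id):
        # Executors call this to let the master know they hold a block.
        self.block_locations[block_id].add(executor_id)

    def remove_executor(self, executor_id):
        """Update the book-keeping and return the replication requests to send.

        Each request is (surviving holder, block id, number of replicas missing).
        """
        self.executors.discard(executor_id)
        requests = []
        for block_id in sorted(self.block_locations):
            holders = self.block_locations[block_id]
            if executor_id not in holders:
                continue
            holders.discard(executor_id)
            missing = self.target_replicas - len(holders)
            # Only blocks with a surviving copy can be re-replicated.
            if holders and missing > 0:
                requests.append((min(holders), block_id, missing))
        return requests


def replicate(master, source, block_id, num_new):
    """Hypothetical executor-side step: place new replicas on other live peers."""
    holders = master.block_locations[block_id]
    candidates = sorted(master.executors - holders)
    chosen = candidates[:num_new]
    for peer in chosen:
        # The new replica reports itself back to the master.
        master.report_block(peer, block_id)
    return chosen


master = ToyMaster(target_replicas=2)
for executor in ("exec-1", "exec-2", "exec-3"):
    master.register_executor(executor)
master.report_block("exec-1", "rdd_0_0")
master.report_block("exec-2", "rdd_0_0")

# exec-1 is lost: the master asks the surviving holder to restore the replica.
requests = master.remove_executor("exec-1")
for source, block_id, missing in requests:
    replicate(master, source, block_id, missing)
```

In this sketch the replica count is restored at the moment the executor is removed, rather than when a later query discovers that the block is under-replicated.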
## How was this patch tested?
This patch was tested with existing unit tests along with new unit tests
added to test the functionality.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/shubhamchopra/spark ProactiveBlockReplication
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14412.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14412
----
commit 779ce27dbeedd4d5c72e28782c9d38af51d2060c
Author: Shubham Chopra <[email protected]>
Date: 2016-05-05T22:06:14Z
Adding capability to prioritize peer executors based on rack awareness
while replicating blocks.
commit d0b6747f1fc9a0b701ab41fe5cf67939ed36cb9e
Author: Shubham Chopra <[email protected]>
Date: 2016-05-06T17:40:47Z
Minor modifications to get past the style check errors.
commit 942908ac060fbdd29d0efd1f8541436bf9cd46d8
Author: Shubham Chopra <[email protected]>
Date: 2016-05-06T20:31:22Z
Using blockId hashcode as a source of randomness, so we don't keep choosing
the same peers for replication.
commit 0902e39fc7a2526539013e67c48bc13b6991bf07
Author: Shubham Chopra <[email protected]>
Date: 2016-05-09T20:36:53Z
Several changes:
1. Adding rack attribute to hashcode and equals to block manager id.
2. Removing boolean check for rack awareness. Asking master for rack info,
and master uses topology mapper.
3. Adding a topology mapper trait and a default implementation that block
manager master endpoint uses to discern topology information.
commit 86e1e0212b0dae0d598f0128c6a7b8f33429dc27
Author: Shubham Chopra <[email protected]>
Date: 2016-05-09T20:58:21Z
Adding null check so a Block Manager can be initialized without the master.
commit a3b50ae9bcca7e871d384fa4614b2c77ac5ff5ad
Author: Shubham Chopra <[email protected]>
Date: 2016-05-12T21:09:16Z
Renaming classes/variables from rack to a more general topology.
commit 1ee7948ce3994df08119418b779f8cc2e5aaca86
Author: Shubham Chopra <[email protected]>
Date: 2016-05-12T21:15:46Z
Renaming classes/variables from rack to a more general topology.
commit 8de5c6e39cd0a868094803a0f53b3b50b7ed90d5
Author: Shubham Chopra <[email protected]>
Date: 2016-05-12T21:27:29Z
We continue to randomly choose peers, so there is no change in current
behavior.
commit 72ae37d64724423c65d3a23559a5f46649ffa4c3
Author: Shubham Chopra <[email protected]>
Date: 2016-05-13T15:36:17Z
Spelling correction and minor changes in comments to use a more general
topology instead of rack.
commit e071ca3a838193efad715764cc654507ee254e44
Author: Shubham Chopra <[email protected]>
Date: 2016-05-13T20:32:13Z
Minor change. Changing replication info message to debug level.
commit 96aaf6ec50ae943c1345966cfc11fd4180ddfa3a
Author: Shubham Chopra <[email protected]>
Date: 2016-05-16T21:47:33Z
Providing peersReplicateTo to the prioritizer.
commit d125188d633744cfeddf5b0436b3217ef87a2220
Author: Shubham Chopra <[email protected]>
Date: 2016-05-17T19:25:34Z
Adding developer api annotations to TopologyMapper and
BlockReplicationPrioritization
commit 16a1ce89c5b48c3770de1e32519c8690de296058
Author: Shubham Chopra <[email protected]>
Date: 2016-05-18T20:52:22Z
Changes recommended by @HyukjinKwon to fix style issues.
commit da4568e03e3690781bb03e2df2e587ceecd59bf0
Author: Shubham Chopra <[email protected]>
Date: 2016-05-20T18:43:07Z
Updating prioritizer api to use current blockmanager id for self
identification.
commit 30edb1ef3924932b1cf9184a105d16ca40689572
Author: Shubham Chopra <[email protected]>
Date: 2016-07-29T19:22:00Z
Pro-actively replenishing blocks from failed executors.
----