GitHub user shubhamchopra opened a pull request:

    https://github.com/apache/spark/pull/14412

    [SPARK-15355] [CORE] [WIP] Proactive block replication

    ## What changes were proposed in this pull request?
    
    We propose adding proactive block replication to handle executor
    failures. BlockManagerMasterEndpoint does the bookkeeping that tracks
    all executors and the blocks they hold, and it uses heartbeats to track
    which executors are alive. When an executor is removed, this bookkeeping
    state is updated to reflect the lost executor. The same step can be used
    to identify executors that still hold a copy of the cached data, and a
    message can be sent to them to use the existing "replicate" function to
    find and place new replicas on other suitable hosts. Blocks replicated
    this way are reported to the master.
    
    This replication is triggered when an executor is lost, which makes it
    proactive, as opposed to being done at query time.
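
    The flow described above can be modelled with a small, self-contained
    sketch. It is written in Python purely for illustration and is not the
    code in this patch: the class `BlockMaster`, its fields, and the
    `(holder, block_id, missing)` request tuple are all hypothetical names,
    and the real bookkeeping in BlockManagerMasterEndpoint may differ.

```python
from collections import defaultdict


class BlockMaster:
    """Toy model of the master's bookkeeping (hypothetical, not Spark's classes)."""

    def __init__(self):
        self.blocks_by_executor = defaultdict(set)  # executor id -> block ids
        self.executors_by_block = defaultdict(set)  # block id -> executor ids
        self.target_replicas = {}                   # block id -> desired copies

    def register_block(self, executor_id, block_id, replication):
        """Record that an executor holds a copy of a block."""
        self.blocks_by_executor[executor_id].add(block_id)
        self.executors_by_block[block_id].add(executor_id)
        self.target_replicas[block_id] = replication

    def remove_executor(self, executor_id):
        """Drop a lost executor and return the replicate requests to send.

        Each request is (surviving_holder, block_id, missing_replicas).
        """
        requests = []
        lost_blocks = self.blocks_by_executor.pop(executor_id, set())
        for block_id in sorted(lost_blocks):
            holders = self.executors_by_block[block_id]
            holders.discard(executor_id)
            if not holders:
                # No surviving copy: nothing to replicate from.
                del self.executors_by_block[block_id]
                del self.target_replicas[block_id]
                continue
            missing = self.target_replicas[block_id] - len(holders)
            if missing > 0:
                # Ask one surviving holder to place the missing replicas.
                requests.append((min(holders), block_id, missing))
        return requests


master = BlockMaster()
master.register_block("exec-1", "rdd_0_0", 2)
master.register_block("exec-2", "rdd_0_0", 2)
master.register_block("exec-1", "rdd_0_1", 1)

# exec-1 is lost: rdd_0_0 survives on exec-2 and is one replica short,
# while rdd_0_1 had no other copy and cannot be re-replicated.
requests = master.remove_executor("exec-1")
print(requests)
```

    In this model the master only decides who should replicate what. The
    surviving executor would then choose the target hosts itself and report
    the new copies back, as the description states.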
    
    ## How was this patch tested?
    
    This patch was tested with existing unit tests along with new unit tests 
added to test the functionality.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shubhamchopra/spark ProactiveBlockReplication

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14412.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14412
    
----
commit 779ce27dbeedd4d5c72e28782c9d38af51d2060c
Author: Shubham Chopra <[email protected]>
Date:   2016-05-05T22:06:14Z

    Adding capability to prioritize peer executors based on rack awareness 
while replicating blocks.

commit d0b6747f1fc9a0b701ab41fe5cf67939ed36cb9e
Author: Shubham Chopra <[email protected]>
Date:   2016-05-06T17:40:47Z

    Minor modifications to get past the style check errors.

commit 942908ac060fbdd29d0efd1f8541436bf9cd46d8
Author: Shubham Chopra <[email protected]>
Date:   2016-05-06T20:31:22Z

    Using blockId hashcode as a source of randomness, so we don't keep choosing 
the same peers for replication.

commit 0902e39fc7a2526539013e67c48bc13b6991bf07
Author: Shubham Chopra <[email protected]>
Date:   2016-05-09T20:36:53Z

    Several changes:
    1. Adding rack attribute to hashcode and equals to block manager id.
    2. Removing boolean check for rack awareness. Asking master for rack info, 
and master uses topology mapper.
    3. Adding a topology mapper trait and a default implementation that block 
manager master endpoint uses to discern topology information.

commit 86e1e0212b0dae0d598f0128c6a7b8f33429dc27
Author: Shubham Chopra <[email protected]>
Date:   2016-05-09T20:58:21Z

    Adding null check so a Block Manager can be initialized without the master.

commit a3b50ae9bcca7e871d384fa4614b2c77ac5ff5ad
Author: Shubham Chopra <[email protected]>
Date:   2016-05-12T21:09:16Z

    Renaming classes/variables from rack to a more general topology.

commit 1ee7948ce3994df08119418b779f8cc2e5aaca86
Author: Shubham Chopra <[email protected]>
Date:   2016-05-12T21:15:46Z

    Renaming classes/variables from rack to a more general topology.

commit 8de5c6e39cd0a868094803a0f53b3b50b7ed90d5
Author: Shubham Chopra <[email protected]>
Date:   2016-05-12T21:27:29Z

    We continue to randomly choose peers, so there is no change in current 
behavior.

commit 72ae37d64724423c65d3a23559a5f46649ffa4c3
Author: Shubham Chopra <[email protected]>
Date:   2016-05-13T15:36:17Z

    Spelling correction and minor changes in comments to use a more general 
topology instead of rack.

commit e071ca3a838193efad715764cc654507ee254e44
Author: Shubham Chopra <[email protected]>
Date:   2016-05-13T20:32:13Z

    Minor change. Changing replication info message to debug level.

commit 96aaf6ec50ae943c1345966cfc11fd4180ddfa3a
Author: Shubham Chopra <[email protected]>
Date:   2016-05-16T21:47:33Z

    Providing peersReplicateTo to the prioritizer.

commit d125188d633744cfeddf5b0436b3217ef87a2220
Author: Shubham Chopra <[email protected]>
Date:   2016-05-17T19:25:34Z

    Adding developer api annotations to TopologyMapper and 
BlockReplicationPrioritization

commit 16a1ce89c5b48c3770de1e32519c8690de296058
Author: Shubham Chopra <[email protected]>
Date:   2016-05-18T20:52:22Z

    Changes recommended by @HyukjinKwon to fix style issues.

commit da4568e03e3690781bb03e2df2e587ceecd59bf0
Author: Shubham Chopra <[email protected]>
Date:   2016-05-20T18:43:07Z

    Updating prioritizer api to use current blockmanager id for self 
identification.

commit 30edb1ef3924932b1cf9184a105d16ca40689572
Author: Shubham Chopra <[email protected]>
Date:   2016-07-29T19:22:00Z

    Pro-actively replenishing blocks from failed executors.

----
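
Two ideas from the commit messages above, a pluggable topology mapper with a
default implementation and the use of the block ID's hash as a source of
randomness when ordering peers, can be illustrated with a short sketch. It is
written in Python for illustration only and does not reproduce the patch:
every name here (`TopologyMapper`, `DefaultTopologyMapper`,
`StaticTopologyMapper`, `prioritize_peers`) is hypothetical, and CRC32 stands
in for the block ID hash code as an assumption.

```python
import random
import zlib
from typing import Dict, List, Optional


class TopologyMapper:
    """Maps a host name to a topology label, for example a rack (hypothetical)."""

    def get_topology_for_host(self, host: str) -> Optional[str]:
        raise NotImplementedError


class DefaultTopologyMapper(TopologyMapper):
    """Assumed default: no topology information is known for any host."""

    def get_topology_for_host(self, host: str) -> Optional[str]:
        return None


class StaticTopologyMapper(TopologyMapper):
    """Illustrative mapper backed by a fixed host -> topology table."""

    def __init__(self, table: Dict[str, str]):
        self.table = dict(table)

    def get_topology_for_host(self, host: str) -> Optional[str]:
        return self.table.get(host)


def prioritize_peers(block_id: str, peers: List[str]) -> List[str]:
    """Order peers pseudo-randomly, seeded by a stable hash of the block ID.

    Different blocks get different orderings, so the same peers are not
    chosen every time, while one block always gets the same ordering.
    """
    seed = zlib.crc32(block_id.encode("utf-8"))  # stand-in for a hash code
    ordered = sorted(peers)  # canonical start so only the seed matters
    random.Random(seed).shuffle(ordered)
    return ordered


peers = ["host-a", "host-b", "host-c", "host-d"]
first = prioritize_peers("rdd_0_0", peers)
second = prioritize_peers("rdd_0_0", list(reversed(peers)))

mapper = StaticTopologyMapper({"host-a": "/rack-1"})
print(first, mapper.get_topology_for_host("host-a"))
```

Seeding from the block ID keeps the choice reproducible for a given block,
which is one plausible reading of the commit message. The sketch keeps the
peer choice random and does not attempt a topology-aware ordering.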

