GitHub user nezihyigitbasi opened a pull request:

    https://github.com/apache/spark/pull/11241

    [SPARK-13328][Core]: Poor read performance for broadcast variables with 
dynamic resource allocation

    When dynamic resource allocation is enabled, fetching broadcast variables 
from removed executors was causing job failures; SPARK-9591 fixed this by 
trying all locations of a block before giving up. However, the locations of a 
block are retrieved from the driver only once in this process, and entries in 
that list can become stale due to dynamic resource allocation. The situation 
gets worse on a large cluster, where the location list can contain several 
hundred entries, tens of which may be stale. What we have observed is that 
with the default settings of 3 max retries and 5s between retries (that's 15s 
per location), reading a broadcast variable can take as long as ~17m (70 
failed attempts * 15s/attempt).
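
The direction of the fix, per the commit below, is to refresh block locations 
during reads. A minimal Scala sketch of that idea follows; the helper names 
(readWithRefresh, fetchLocations, readFrom) are hypothetical and are not the 
actual BlockManager API:

    // Sketch only: retry the read against a freshly fetched location list so
    // executors removed by dynamic allocation stop consuming the retry budget.
    object BroadcastReadSketch {
      type Location = String

      def readWithRefresh[T](
          blockId: String,
          fetchLocations: String => Seq[Location],  // hypothetical: ask the driver for current locations
          readFrom: Location => Option[T],          // hypothetical: attempt a read from one executor
          maxRefreshes: Int = 1): Option[T] = {
        var refreshesLeft = maxRefreshes + 1
        while (refreshesLeft > 0) {
          // Re-fetch the list on every pass, so executors already dropped by
          // the driver no longer appear in it.
          val locations = fetchLocations(blockId)
          val result = locations.iterator.map(readFrom).collectFirst { case Some(v) => v }
          if (result.isDefined) return result
          refreshesLeft -= 1
        }
        None
      }
    }

Whether locations are refreshed after each failed location or after a full 
pass over the list is a design choice; the sketch refreshes per pass to keep 
driver traffic low.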

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nezihyigitbasi/spark SPARK-13328

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11241.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11241
    
----
commit 45bdec651a2d15ad97638a676675c1697ddade09
Author: Nezih Yigitbasi <[email protected]>
Date:   2016-02-17T17:39:06Z

    Support refreshing block locations during reads

----


