Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11241#discussion_r53532327
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -575,6 +582,18 @@ private[spark] class BlockManager(
                 // This location failed, so we retry fetch from a different one by returning null here
                 logWarning(s"Failed to fetch remote block $blockId " +
                   s"from $loc (failed attempt $numFetchFailures)", e)
    +
    +            // if dynamic alloc. is enabled and if there is a large number of executors
    +            // then locations list can contain a large number of stale entries causing
    +            // a large number of retries that may take a significant amount of time
    +            // To get rid of these stale entries we refresh the block locations
    +            // after a certain number of fetch failures
    +            if (dynamicAllocationEnabled && numFetchFailures >= maxFailuresBeforeLocationRefresh) {
    --- End diff ---
    
    we can set the threshold equal to the initial number of locations, i.e.
    ```
    val initialLocations = getLocations(blockId)
    val maxFetchFailures = initialLocations.size
    val locationIter = initialLocations.iterator
    while (locationIter.hasNext) { ... }
    ```
    Essentially this says that we never traverse more executors than we started with. It's like saying we never do more work than we would have if we had not refreshed the locations.
    
    Regardless of the actual threshold we choose, I do feel strongly that we need to break out of this loop at some point. Infinite loops are one of the hardest things to debug.
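    
    To make the bound concrete, here is a minimal standalone sketch of the loop I have in mind (not the actual BlockManager code). `getLocations` and `fetchFrom` are hypothetical stand-ins for the real location lookup and remote-fetch calls, and the refresh trigger mirrors the `maxFailuresBeforeLocationRefresh` from the diff above:
    ```
    // Hypothetical sketch: refresh locations after a few failures, but never
    // make more failed attempts than the number of locations we started with.
    def fetchWithBoundedRetries(
        blockId: String,
        getLocations: String => Seq[String],
        fetchFrom: String => Option[Array[Byte]],
        maxFailuresBeforeLocationRefresh: Int = 5): Option[Array[Byte]] = {
      val initialLocations = getLocations(blockId)
      // Hard cap on total failed attempts, equal to the initial location count.
      val maxFetchFailures = initialLocations.size
      var numFetchFailures = 0
      var locationIterator = initialLocations.iterator
      while (locationIterator.hasNext) {
        val loc = locationIterator.next()
        fetchFrom(loc) match {
          case Some(bytes) =>
            return Some(bytes)
          case None =>
            numFetchFailures += 1
            if (numFetchFailures >= maxFetchFailures) {
              // Give up rather than looping forever over stale entries.
              return None
            }
            if (numFetchFailures % maxFailuresBeforeLocationRefresh == 0) {
              // Drop stale entries by re-fetching the location list.
              locationIterator = getLocations(blockId).iterator
            }
        }
      }
      None
    }
    ```
    With the hard cap in place, even a pathological sequence of refreshes that keeps returning stale entries can cost at most as many attempts as the original location list.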

