Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3785#issuecomment-68292416
  
    On the surface, this seems like an okay change.  I wonder whether this 
retry logic could have unexpected consequences.  Let me try to reason it out:
    
    - `askTracker` is only called with `GetMapOutputStatuses`.
    - In the master actor, it calls `getSerializedMapOutputStatuses`.  This 
method never throws exceptions: if a shuffle is missing, then it just stores an 
empty array and serializes it.
    - It's possible that the serialized map statuses could exceed the Akka 
frame size (although extremely unlikely and perhaps impossible with the new 
output status compression techniques).  In this case, though, the master would 
throw an exception and fail to send a reply back to the asker.  In this case, 
with this patch we'd end up performing a bunch of retries for an operation that 
will ultimately fail, so we'll take longer to detect a failure.
    
    In the common cases, though, this seems fine, even if the map output 
statuses are missing (since it won't introduce a bunch of futile retries).  
Therefore, I think we should pull this in; I don't know if this fixes an actual 
bug, but it seems like it could make things more robust.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to