GitHub user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/4984#issuecomment-117802931
  
    @dragos Thanks for all your work and patience! I took a pass over it and I 
think the high-level approach is reasonable. Most of my comments are relatively 
minor, but I have two bigger ones, which I will describe here:
    
    - How does dynamic allocation work with max cores? It won't ever scale up 
beyond max cores, which may not be what the user expects. There are two things 
we can do about this: (1) nothing - maybe this isn't a problem after all; if 
the user sets max cores, we should respect it as a hard cap, or (2) make 
dynamic allocation and max cores mutually exclusive (see the sketch after this 
list). What are your thoughts, @dragos @tnachen?
    
    - The cleanup logic here is only best-effort: if the driver is killed 
forcefully, we won't clean up the shuffle files (as sketched further below). By 
itself this is not much of a problem, but a better place to detect that an 
application has exited would be the cluster manager itself (e.g. the Mesos 
master probably knows this, right?). My concern is that we'll end up with two 
separate code paths in the future where only one is needed; handling this in 
the cluster manager is strictly more reliable than doing so in the driver.
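    
    To make option (2) concrete, here is a minimal sketch of the kind of 
check I have in mind, assuming it would run early in the Mesos scheduler 
backend; the method name and call site are hypothetical, not existing Spark 
code, but `spark.dynamicAllocation.enabled` and `spark.cores.max` are the 
real config keys:
    
    ```scala
    import org.apache.spark.{SparkConf, SparkException}
    
    // Hypothetical validation: reject configs that set both knobs at once.
    def validateDynamicAllocationSettings(conf: SparkConf): Unit = {
      val dynamicAllocation =
        conf.getBoolean("spark.dynamicAllocation.enabled", false)
      if (dynamicAllocation && conf.contains("spark.cores.max")) {
        throw new SparkException(
          "spark.dynamicAllocation.enabled and spark.cores.max are mutually " +
          "exclusive; please unset one of them.")
      }
    }
    ```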
    
    By the way, if you prefer, I'm completely open to splitting the shuffle 
file cleanup logic into a separate patch. That might allow us to merge the 
dynamic allocation part sooner. :)
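    
    To illustrate why the driver-side path is only best-effort, here is a 
self-contained sketch of the shutdown-hook pattern; `ExternalShuffleClient` 
and `cleanUpApplication` are hypothetical stand-ins for whatever RPC the 
driver would use, not the actual API in this patch:
    
    ```scala
    object DriverShuffleCleanup {
      // Hypothetical interface to the external shuffle service.
      trait ExternalShuffleClient {
        def cleanUpApplication(appId: String): Unit
      }
    
      def registerCleanup(client: ExternalShuffleClient, appId: String): Unit = {
        // A JVM shutdown hook runs on normal exit or SIGTERM, but never on
        // SIGKILL, which is exactly the "killed forcefully" case above.
        sys.addShutdownHook {
          try {
            client.cleanUpApplication(appId)
          } catch {
            case e: Exception =>
              System.err.println(s"Failed to clean up shuffle files for $appId: $e")
          }
        }
      }
    }
    ```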

