Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/4984#issuecomment-118184097
  
    @dragos I think this is pretty close. The one outstanding issue is still 
who cleans up the shuffle files.
    
    This patch proposes to have the driver send clean-up requests to all 
shuffle services when the application exits. As I mentioned earlier, there are 
no guarantees here even if we install shutdown hooks: the hooks simply won't 
run if the driver crashes or is killed forcefully. 
    We cannot expect the driver to manage its own exit; that is really the 
cluster manager's responsibility. My main objection is that other modes, 
namely YARN and (coming soon) standalone, would then have two separate code 
paths for cleaning up the shuffle files, which adds complexity.
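
    The "no guarantees" point about shutdown hooks is easy to demonstrate 
outside Spark. In the sketch below, Python stands in for the JVM: `atexit` 
plus a SIGTERM handler that calls `sys.exit()` approximates 
`Runtime.addShutdownHook`, whose hooks do run on SIGTERM. The hook runs on a 
polite termination but never on SIGKILL, which is how a process may well be 
killed in practice. All names here are made up for illustration; none of 
this is Spark code.

```python
import os
import signal
import subprocess
import sys
import tempfile
import textwrap

# Child process: registers an exit hook that writes a marker file, then
# idles. The SIGTERM handler emulates the JVM, whose shutdown hooks do
# run on a polite SIGTERM.
CHILD = textwrap.dedent("""
    import atexit, signal, sys, time
    marker = sys.argv[1]
    atexit.register(lambda: open(marker, "w").close())
    signal.signal(signal.SIGTERM, lambda *a: sys.exit(0))
    print("ready", flush=True)
    time.sleep(30)
""")

def hook_ran_after(sig):
    """Start the child, stop it with `sig`, report whether its hook ran."""
    with tempfile.TemporaryDirectory() as d:
        marker = os.path.join(d, "marker")
        p = subprocess.Popen([sys.executable, "-c", CHILD, marker],
                             stdout=subprocess.PIPE)
        p.stdout.readline()        # wait for the child's "ready"
        p.send_signal(sig)
        p.wait()
        p.stdout.close()
        return os.path.exists(marker)
```

    `hook_ran_after(signal.SIGTERM)` is True while 
`hook_ran_after(signal.SIGKILL)` is False, so any design relying solely on 
the driver's own shutdown hook leaks shuffle files whenever the driver is 
killed hard.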
    
    I understand that the external shuffle service is started independently of 
Mesos, so even though the Mesos master knows when the framework exits, we still 
have to pass this information on to the shuffle service. There are two 
potential ways to make this work:
    - (1) Have the shuffle service periodically query the master for 
framework status. When a framework has exited, the shuffle service cleans up 
the corresponding application's shuffle files. @tnachen is there enough 
support on the Mesos side to make this happen?
    - (2) Initiate a long-running connection between the shuffle service and 
each driver. When the connection is closed on the other end, the shuffle 
service knows the corresponding application has exited.
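
    Option (2) can be sketched with plain sockets. This is only a toy model 
(the real shuffle service would speak Spark's RPC transport, not raw TCP), 
and every name below is hypothetical: each driver opens a long-lived 
connection, sends its application id once, and then just holds the socket 
open. The OS closes the socket when the driver process exits, cleanly or 
not, so the shuffle service learns of the exit without the driver having to 
do anything on shutdown.

```python
import socket
import threading

def start_cleanup_listener(port, cleanup):
    """Toy shuffle-service endpoint for approach (2).

    `cleanup` stands in for the real shuffle-file deletion logic; it is
    called with the application id once that application's driver
    connection reaches EOF. All names here are illustrative.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", port))
    server.listen(8)

    def handle(conn):
        with conn:
            # First message is the registration: the application id.
            app_id = conn.recv(1024).decode().strip()
            # Block until the peer closes; recv() returning b"" is EOF.
            while conn.recv(1024):
                pass
        cleanup(app_id)  # the driver is gone: delete its shuffle files

    def accept_loop():
        while True:
            try:
                conn, _ = server.accept()
            except OSError:      # server socket closed: shut down
                return
            threading.Thread(target=handle, args=(conn,),
                             daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return server  # caller can read the bound port via getsockname()
```

    One caveat with this approach: if the whole driver host dies rather than 
just the process, no FIN is ever sent, so a real implementation would also 
want TCP keepalive or application-level heartbeats on the connection.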

