Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/3868#issuecomment-71905464
  
    
    
    
    This patch is definitely heading in the right direction. But as I was 
reading it, I realize that it affects the whole shutdown behavior of all the 
components pretty heavily. Since the whole purpose of this PR was to fix the 
bug that graceful shutdown was not waiting receivers to stop, I think it is 
better to split this attempt into two parts. 
    
    1. PR1 - Fixes the bugs with very surgical local changes. This probably 
should touch only ReceiverTracker. This is will be small enough to include in 
Spark 1.3 which is very soon.
    2. PR2 - Refactors the code like this PR does and makes sure that the 
overall behavior is what is desired. 
    
    For PR2, I was thinking what is the desired behavior. This was not at all 
documented in the code, so I thought of first documenting this. Then verifying 
whether any PR satisfies the desired behavior will become easy. Here are what I 
think is the desired behavior.
    
    Overall, ssc.stop(stopGracefully = false) should allow all the internal 
components enough time to stop immediately but cleanly so that the all 
resources of each component are cleared. If this requires taking some minimum 
amount of time, so be it. But each component is responsible for keep this time 
small. For ssc.stop(stopGracefully = true), the overall system should use 
whatever the configured timeout is. But here too each component should ensure 
that all resources are cleared when the stopping thread is interrupted after 
timeout. So overall the jobScheduler.stop(graceful = true) should do something 
like 
    
    ```
    if (graceful) {
        // call stop(graceful = true) on all components inside future, and wait 
for future with timeout
        // if timeout occurred (so graceful stop was not completed), call 
stop(graceful = false) on all components to clean all resources
    } else {
        // call stop(graceful = false) on all components
    }
    ```
    
    Anyways, to summarize, it would great if you can submit a more surgical PR1 
that affects only receiversTracker (and units to test for this), and then make 
this current PR refactor stuff to implement this behavior. I would love to have 
the surgical fix as part of Spark 1.3.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to