Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/3868#issuecomment-68670827
  
    I think I get the idea at a high-level what this patch is trying to do. The 
key change essentially is in the `ReceiverLauncher` - the `stop` is made to use 
the config `spark.streaming.gracefulStopTimeout` for tunability. However, this 
needs to be done carefully such that the behavior of the system is predictable. 
    
    The existing code is that if the timeout is set to 10 seconds, the 
context.stop will exit after 10 seconds. So that timeout applies across all the 
different waits (wait for receiver stop, wait for job generation, wait for job 
completion). This is makes it easy for a developer to understand what is going 
to happen if he/she sets the configuration to 60 seconds. 
    
    The current patch changes this to use the timeout at every level. The 
problem with that approach is that if the developer sets the timeout as 60 
seconds, the system may take upto 60 * 3 = 180 seconds to stop gracefully. 
That's confusing. 
    
    So we need to wait intelligently such that max time the whole system waits 
is predictable and easy to understand. The current code already does that, but 
only partially as the receiver-stop-wait is fixed to 10 seconds and is not 
configurable. Maybe there is a better way of refactoring the code to achieve 
this, and the current patch is not the right way because of aforementioned 
reasons. 
    
    Here are two approaches. 
    
    1.  if processAllReceivedData = true, then `JobSchduler.stop()` calls 
`ReceiverTracker.stop()` and `JobGenerator.stop()` with timeouts which will 
depend on the how much time has passed since `JobScheduler.stop()` was called. 
The overall timeout is the configurable parameter, and `ReceiverTracker`, 
`ReceiverLauncher`, and `JobGenerator` deal without whatever stop timeout has 
been given to them. 
    
    2. The whole sequence of `stop()`s is evaluated in a `Future` (with its own 
execution context, not the global one), and the configurable timeout is a timed 
wait on hte future. This may simplify code, but introduces another thread in 
the system. Probably okay to do so. 
    
    I suggest we try the second approach. What do you think, @cleaton 
@JoshRosen ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to