Github user tdas commented on the pull request:
https://github.com/apache/spark/pull/3868#issuecomment-68670827
I think I get the idea at a high-level what this patch is trying to do. The
key change essentially is in the `ReceiverLauncher` - the `stop` is made to use
the config `spark.streaming.gracefulStopTimeout` for tunability. However, this
needs to be done carefully such that the behavior of the system is predictable.
The existing code is that if the timeout is set to 10 seconds, the
context.stop will exit after 10 seconds. So that timeout applies across all the
different waits (wait for receiver stop, wait for job generation, wait for job
completion). This is makes it easy for a developer to understand what is going
to happen if he/she sets the configuration to 60 seconds.
The current patch changes this to use the timeout at every level. The
problem with that approach is that if the developer sets the timeout as 60
seconds, the system may take upto 60 * 3 = 180 seconds to stop gracefully.
That's confusing.
So we need to wait intelligently such that max time the whole system waits
is predictable and easy to understand. The current code already does that, but
only partially as the receiver-stop-wait is fixed to 10 seconds and is not
configurable. Maybe there is a better way of refactoring the code to achieve
this, and the current patch is not the right way because of aforementioned
reasons.
Here are two approaches.
1. if processAllReceivedData = true, then `JobSchduler.stop()` calls
`ReceiverTracker.stop()` and `JobGenerator.stop()` with timeouts which will
depend on the how much time has passed since `JobScheduler.stop()` was called.
The overall timeout is the configurable parameter, and `ReceiverTracker`,
`ReceiverLauncher`, and `JobGenerator` deal without whatever stop timeout has
been given to them.
2. The whole sequence of `stop()`s is evaluated in a `Future` (with its own
execution context, not the global one), and the configurable timeout is a timed
wait on hte future. This may simplify code, but introduces another thread in
the system. Probably okay to do so.
I suggest we try the second approach. What do you think, @cleaton
@JoshRosen ?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]