Github user tdas commented on the pull request:
https://github.com/apache/spark/pull/3868#issuecomment-72101801
1. The merge window for features closes this weekend. However, this is a bug
fix, so we can land it early next week as well. But the change has to be
surgical.
2. Default timeout - if batch sizes are less than 1 second, then I see your
point. Ideally, the receivers should shut down immediately and stop receiving
data, so only a few batches should be generated and executed after the graceful
shutdown command has been given. In practice that is probably a wrong
assumption. However, waiting for 100x the batch interval is also confusing. Most
people run ~10s batches, which means the system would wait for 1000 seconds.
That's kind of confusing as well if the system appears stuck for a while even with
the default configuration. So I guess the most desirable approach, one that balances
both short and large batch intervals, is:
1. Default timeout of max(10x batch interval, 1 minute), as sketched below.
2. Log a warning if the timeout is hit.
3. Expose the timeout configuration.
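For 1 and 2, a minimal sketch of what I have in mind (the object and method names here are illustrative, not the actual Spark code):

```scala
import java.util.logging.Logger

// Hypothetical helper illustrating the proposed default timeout and warning.
object GracefulStopSketch {
  private val log = Logger.getLogger("GracefulStopSketch")

  // Default timeout: max(10 x batch interval, 1 minute), in milliseconds.
  // e.g. 500 ms batches -> 60 000 ms, 10 s batches -> 100 000 ms.
  def defaultTimeoutMs(batchDurationMs: Long): Long =
    math.max(10 * batchDurationMs, 60 * 1000L)

  // Poll until the streaming jobs have drained or the deadline passes,
  // and log a warning if we gave up waiting.
  def awaitStop(finished: () => Boolean, timeoutMs: Long): Unit = {
    val deadline = System.currentTimeMillis() + timeoutMs
    while (!finished() && System.currentTimeMillis() < deadline) Thread.sleep(100)
    if (!finished()) log.warning(s"Graceful stop timed out after $timeoutMs ms")
  }
}
```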
For 3, I do not want to add another parameter to stop(); there is
already a large combination of optional parameters in stop(), and each
addition gets harder to maintain in the Java and Python APIs. I want to just
expose the SparkConf parameter.
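For illustration, exposing this as a SparkConf setting rather than a new stop() argument could look roughly like the following; the property name and the default formula here are assumptions for this sketch, not necessarily what the final patch uses:

```scala
import org.apache.spark.SparkConf

object StopTimeoutConf {
  // Read the graceful-stop timeout (in ms) from SparkConf, falling back to
  // the proposed default, so stop() keeps its current signature.
  def gracefulStopTimeoutMs(conf: SparkConf, batchDurationMs: Long): Long =
    conf.getLong("spark.streaming.gracefulStopTimeout",
                 math.max(10 * batchDurationMs, 60 * 1000L))
}
```

A user could then tune the wait with something like conf.set("spark.streaming.gracefulStopTimeout", "120000") on the SparkConf used to create the StreamingContext, with no changes to the Java or Python stop() signatures.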
Regarding this patch, I agree that the change is probably not large, but
since this behavior of the code is not well tested, I am unsure about doing this
so late in the 1.3 release cycle. So for this release, it would be great if
we could do a surgical fix that solves the bug and exposes the timeout, and have a
separate PR that cleans this up with refactoring and proper tests covering all
aspects of the behavior.