Github user tdas commented on the pull request:
https://github.com/apache/spark/pull/8237#issuecomment-132033574
We have rejected such ideas before because not generating an RDD in a batch
would break the semantics of downstream operations. For example, if you are
doing updateStateByKey on a Kafka stream, the update function is expected to
be called for every key in every batch interval. If no RDD is generated for a
batch, it is not clear what those semantics become. So I am against making
this change.
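To illustrate the semantic point: the following is a minimal plain-Scala sketch (not Spark's implementation) of the updateStateByKey contract. The names `stepBatch` and `countUpdate` are hypothetical; the key behavior, per Spark's DStream API, is that the update function is invoked for every key with existing state in every batch, even when that batch carries no new records for the key.

```scala
// Sketch of updateStateByKey semantics: the update function runs for every
// key with existing state in every batch interval, including empty batches.

type UpdateFunc[V, S] = (Seq[V], Option[S]) => Option[S]

def stepBatch[K, V, S](
    state: Map[K, S],
    batch: Map[K, Seq[V]],
    update: UpdateFunc[V, S]): Map[K, S] = {
  // Visit keys with new data AND keys with prior state.
  val keys = state.keySet ++ batch.keySet
  keys.flatMap { k =>
    // Returning None drops the key from state, mirroring Spark's contract.
    update(batch.getOrElse(k, Seq.empty), state.get(k)).map(k -> _)
  }.toMap
}

// Running count per key.
val countUpdate: UpdateFunc[Int, Int] =
  (values, prev) => Some(prev.getOrElse(0) + values.sum)

val s1 = stepBatch(Map.empty[String, Int], Map("a" -> Seq(1, 2)), countUpdate)
// An empty batch still calls countUpdate for key "a" with Seq.empty,
// so state is preserved. If no RDD were generated at all, this call
// would never happen and the meaning of the state update is undefined.
val s2 = stepBatch(s1, Map.empty[String, Seq[Int]], countUpdate)
```

Skipping RDD generation for an empty batch would mean `stepBatch` is never invoked for that interval, so stateful operators could neither refresh nor expire their per-key state.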