Github user tdas commented on the pull request:
https://github.com/apache/spark/pull/8237#issuecomment-132033574
We have rejected such ideas before because not generating an RDD in a batch
would break the semantics of downstream operations. For example, if you are
doing updateStateByKey on a Kafka stream, the update function is expected to
be called for every key in every batch interval. If no RDD is generated for a
batch, it is not clear what those semantics become. So I am against making
this change.
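To illustrate the semantic point: the following is a minimal plain-Scala sketch (not Spark's implementation) of the updateStateByKey contract. The names `stepBatch` and `countUpdate` are hypothetical; the key behavior, per Spark's DStream API, is that the update function is invoked for every key with existing state in every batch, even when that batch carries no new records for the key.

```scala
// Sketch of updateStateByKey semantics: the update function runs for every
// key with existing state in every batch interval, including empty batches.

type UpdateFunc[V, S] = (Seq[V], Option[S]) => Option[S]

def stepBatch[K, V, S](
    state: Map[K, S],
    batch: Map[K, Seq[V]],
    update: UpdateFunc[V, S]): Map[K, S] = {
  // Visit keys with new data AND keys with prior state.
  val keys = state.keySet ++ batch.keySet
  keys.flatMap { k =>
    // Returning None drops the key from state, mirroring Spark's contract.
    update(batch.getOrElse(k, Seq.empty), state.get(k)).map(k -> _)
  }.toMap
}

// Running count per key.
val countUpdate: UpdateFunc[Int, Int] =
  (values, prev) => Some(prev.getOrElse(0) + values.sum)

val s1 = stepBatch(Map.empty[String, Int], Map("a" -> Seq(1, 2)), countUpdate)
// An empty batch still calls countUpdate for key "a" with Seq.empty,
// so state is preserved. If no RDD were generated at all, this call
// would never happen and the meaning of the state update is undefined.
val s2 = stepBatch(s1, Map.empty[String, Seq[Int]], countUpdate)
```

Skipping RDD generation for an empty batch would mean `stepBatch` is never invoked for that interval, so stateful operators could neither refresh nor expire their per-key state.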