[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

zsxwing Tue, 20 Sep 2016 12:52:23 -0700

Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/15102
  
    > You should not be assuming 0 for a starting offset for partitions you've
    > just learned about. You should be asking the underlying driver consumer
    > what its position is.
    
    Yes, there are two approaches for getting new partitions' offsets:
    
    1. From a user perspective, if I set `auto.offset.reset` to latest, I
       would like to process the new data that arrives after the query starts.
       So we should start from the earliest offset for a new partition.
    2. Follow `auto.offset.reset`, as you did in your fork.
    
    However, option 2 makes the query result nondeterministic when
    `auto.offset.reset` is latest, because the result depends on how the query
    runs. E.g., if you add new partitions and push new data very quickly, the
    new data may be lost; if you add new partitions and push new data while the
    app is being recovered, the new data may be lost as well.
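
The difference between the two options can be illustrated with a small, self-contained sketch. It is not code from the pull request: `starting_offsets`, `skipped_records`, the policy names, and the sample offsets are all hypothetical, and the offset range of a new partition is modeled as a plain dictionary rather than fetched from a real Kafka consumer.

```python
from typing import Dict

# Hypothetical model: partition id -> offset range visible at discovery time.
OffsetRange = Dict[str, int]


def starting_offsets(
    new_partitions: Dict[int, OffsetRange],
    policy: str,
    auto_offset_reset: str = "latest",
) -> Dict[int, int]:
    """Pick the first offset to read for each newly discovered partition."""
    if policy == "earliest_for_new":
        # Option 1: always start new partitions from their earliest offset.
        key = "earliest"
    elif policy == "follow_reset":
        # Option 2: follow whatever auto.offset.reset says.
        key = auto_offset_reset
    else:
        raise ValueError(f"unknown policy: {policy}")
    if key not in ("earliest", "latest"):
        raise ValueError(f"unsupported auto.offset.reset: {key}")
    return {p: rng[key] for p, rng in new_partitions.items()}


def skipped_records(
    new_partitions: Dict[int, OffsetRange], starts: Dict[int, int]
) -> Dict[int, int]:
    """Count records that exist in each partition but would never be read."""
    return {p: starts[p] - new_partitions[p]["earliest"] for p in new_partitions}


# Assumed scenario: partition 3 was added and received 42 records before the
# query noticed it (e.g. a fast producer, or the app was recovering).
discovered = {3: {"earliest": 0, "latest": 42}}

option1 = starting_offsets(discovered, "earliest_for_new")
option2 = starting_offsets(discovered, "follow_reset", "latest")

print(option1, skipped_records(discovered, option1))
print(option2, skipped_records(discovered, option2))
```

Under these assumptions, option 1 reads the new partition from offset 0 regardless of when it is discovered, while the number of records option 2 skips depends on how many arrived before discovery, which is the timing dependence described above.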




