GitHub user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/15102
  
    > You should not be assuming 0 for a starting offset for partitions you've just learned about. You should be asking the underlying driver consumer what its position is.
    
    Yes, there are two approaches for getting new partitions' offsets:
    
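    1. From a user's perspective, if I set `auto.offset.reset` to `latest`, I would like to process all new data that arrives after the query starts. So we should start from the earliest offset for a new partition, since a partition added after the query starts can only contain data that arrived after the query started.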
    2. Follow `auto.offset.reset`, as you did in your fork.
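    The difference between the two options can be sketched as a choice of starting offset for a partition the query has just discovered. This is a minimal, hypothetical model: `PartitionLog`, `starting_offset_for_new_partition`, and the policy strings are illustrative names, not Spark or Kafka APIs.

```python
from dataclasses import dataclass


@dataclass
class PartitionLog:
    """Hypothetical stand-in for one Kafka partition's retained offset range."""
    earliest: int  # first offset still retained in the partition
    latest: int    # log-end offset, i.e. where the next record will be written


def starting_offset_for_new_partition(log: PartitionLog, policy: str) -> int:
    """Choose where to start reading a partition that was just discovered.

    "earliest_for_new":    option 1, always read a new partition from its start.
    "follow_reset_latest": option 2 with auto.offset.reset=latest, start at log end.
    """
    if policy == "earliest_for_new":
        return log.earliest
    if policy == "follow_reset_latest":
        return log.latest
    raise ValueError(f"unknown policy: {policy}")


# A new partition that already holds 3 records (offsets 0..2) when discovered.
new_partition = PartitionLog(earliest=0, latest=3)
print(starting_offset_for_new_partition(new_partition, "earliest_for_new"))
print(starting_offset_for_new_partition(new_partition, "follow_reset_latest"))
```

    With option 1 the query starts at offset 0 and reads all three records. With option 2 it starts at offset 3 and never reads them.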
    
    However, option 2 makes the query result indeterminate when `auto.offset.reset` is `latest`, because the result depends on how the query runs. For example, if you add new partitions and push new data to them very quickly, that data may be lost: the query may only discover the partitions after the data has been written, and `latest` then skips it. If you add new partitions and push new data while the app is being recovered, the new data may be lost as well.
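    The timing dependence can be illustrated with a small simulation. It counts how many records a query reads from a new partition, given when the records were written and when the query first sees the partition. All names and timestamps here are hypothetical. The model assumes that under option 2 the query starts at the log end as of discovery, so only records appended strictly after discovery are read.

```python
from typing import List


def records_processed(write_times: List[float], discover_time: float, policy: str) -> int:
    """Count records read from a new partition first seen at discover_time.

    write_times: hypothetical times at which records were appended to the partition.
    """
    if policy == "earliest_for_new":
        # Option 1: start from the earliest offset, so every record is read
        # regardless of when the partition was discovered.
        return len(write_times)
    if policy == "follow_reset_latest":
        # Option 2 with auto.offset.reset=latest: start at the log end as of
        # discovery, so records written before discovery are skipped.
        return sum(1 for t in write_times if t > discover_time)
    raise ValueError(f"unknown policy: {policy}")


writes = [1.0, 1.5, 2.0, 5.0]
# Discovered promptly, before any writes, vs. discovered late (e.g. after recovery).
for discover in (0.5, 3.0):
    print(discover,
          records_processed(writes, discover, "earliest_for_new"),
          records_processed(writes, discover, "follow_reset_latest"))
```

    Option 1 reads 4 records in both runs. Under option 2 the count falls from 4 to 1 when the same data is discovered later, which is the indeterminacy described above.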