GitHub user zsxwing commented on the issue:
https://github.com/apache/spark/pull/15102
> You should not be assuming 0 for a starting offset for partitions you've
> just learned about. You should be asking the underlying driver consumer what
> its position is.
Yes, there are two approaches for getting new partitions' offsets:
1. From a user perspective, if I set `auto.offset.reset` to latest, I
would like to process all data that arrives after the query starts. A partition
added after the query starts only contains such new data, so we should start
from the earliest offset for a new partition.
2. Follow `auto.offset.reset`, as you did in your fork.
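A minimal Python sketch contrasting the two approaches, assuming the driver already knows each newly discovered partition's earliest and latest offsets at discovery time. The function and argument names are hypothetical and are not Spark's or Kafka's actual API.

```python
from typing import Dict, Tuple

# Hypothetical alias: (topic, partition), mirroring Kafka's TopicPartition.
TopicPartition = Tuple[str, int]


def starting_offsets_for_new_partitions(
    new_partitions: Dict[TopicPartition, Tuple[int, int]],
    auto_offset_reset: str,
    follow_reset_policy: bool,
) -> Dict[TopicPartition, int]:
    """Pick a starting offset for each partition discovered after the query started.

    new_partitions maps each partition to its (earliest, latest) offsets
    as observed when the driver discovers it.
    """
    result = {}
    for tp, (earliest, latest) in new_partitions.items():
        if not follow_reset_policy:
            # Approach 1: every record in a partition created after the
            # query started is new data, so always begin at the earliest offset.
            result[tp] = earliest
        elif auto_offset_reset == "latest":
            # Approach 2: obey auto.offset.reset; records written before
            # discovery are skipped.
            result[tp] = latest
        elif auto_offset_reset == "earliest":
            result[tp] = earliest
        else:
            # Other values (e.g. "none") are out of scope for this sketch.
            raise ValueError(f"unsupported auto.offset.reset: {auto_offset_reset}")
    return result
```

With a new partition `("t", 3)` whose offsets are `(0, 5)` at discovery, approach 1 starts at 0 while approach 2 with `latest` starts at 5, skipping the first five records.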
However, option 2 makes the query result nondeterministic when
`auto.offset.reset` is latest, depending on how the query runs. E.g., if you add
new partitions and push new data to them very quickly, that data may be lost,
because the query may only discover the partitions after their latest offset has
moved past it; if you add new partitions and push new data while the app is
being recovered, the new data may be lost as well.
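The race in option 2 can be reproduced with a toy simulation. This is a sketch under simplifying assumptions: an in-memory list stands in for each Kafka partition, a partition "exists" once its first record is written, and the hypothetical `discover` and `batch` events stand in for the driver noticing new partitions and running a micro-batch. None of these names come from Spark.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Log:
    """Toy stand-in for a Kafka topic: partition id -> list of records."""
    partitions: Dict[int, List[str]] = field(default_factory=dict)

    def write(self, partition: int, value: str) -> None:
        self.partitions.setdefault(partition, []).append(value)


def run(events: List[Tuple], follow_latest: bool) -> List[str]:
    """Replay a timeline of events and return the records the query processed.

    Events are ("write", partition, value), ("discover",) or ("batch",).
    The query knows partition 0 from the start (at offset 0). On "discover"
    it assigns start offsets to partitions it has not seen yet; on "batch"
    it reads everything from its current offsets up to the log end.
    """
    log = Log()
    offsets = {0: 0}
    processed: List[str] = []
    for event in events:
        kind = event[0]
        if kind == "write":
            log.write(event[1], event[2])
        elif kind == "discover":
            for p, records in log.partitions.items():
                if p not in offsets:
                    # follow_latest=True models option 2 (start at the current
                    # log end); False models option 1 (start at offset 0).
                    offsets[p] = len(records) if follow_latest else 0
        elif kind == "batch":
            for p in sorted(offsets):
                records = log.partitions.get(p, [])
                processed.extend(records[offsets[p]:])
                offsets[p] = len(records)
    return processed


# Partition 1 is created and receives "a" and "b" before the driver notices it.
timeline = [("write", 1, "a"), ("write", 1, "b"), ("discover",),
            ("batch",), ("write", 1, "c"), ("batch",)]
print(run(timeline, follow_latest=True))   # option 2: "a" and "b" are lost
print(run(timeline, follow_latest=False))  # option 1: every record is processed
```

Under option 2 the outcome depends on how much data lands before discovery, which is exactly the timing dependence described above; option 1 yields the same result regardless of when discovery happens.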