Github user koeninger commented on the issue:
My fork is not following auto.offset.reset, it's following what the
(potentially user-provided) consumer does when it sees a new partition.
Maybe that's auto.offset.reset, maybe it's something else.
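A minimal sketch of that idea, as a toy model in Python rather than real Kafka or Spark code. `PartitionLog`, `ToyConsumer`, and `in_range` are hypothetical names made up for illustration, and the offsets are invented. The point is only that the starting position for a newly seen partition comes from the consumer, which may apply a reset policy or something the user configured, instead of being hard-coded.

```python
from dataclasses import dataclass


@dataclass
class PartitionLog:
    """Toy partition: offsets in [earliest, end) are still retained."""
    earliest: int  # first offset that has not been aged out
    end: int       # offset the next appended record will receive


class ToyConsumer:
    """Hypothetical stand-in for a (possibly user-provided) consumer."""

    def __init__(self, reset_policy="latest", overrides=None):
        self.reset_policy = reset_policy
        # Assumption: the user may pin explicit start offsets per partition.
        self.overrides = overrides or {}

    def position(self, partition: int, log: PartitionLog) -> int:
        """Where this consumer would start reading the given partition."""
        if partition in self.overrides:
            return self.overrides[partition]
        return log.earliest if self.reset_policy == "earliest" else log.end


def in_range(log: PartitionLog, offset: int) -> bool:
    """True if a fetch starting at `offset` would not be out of range."""
    return log.earliest <= offset <= log.end


log = PartitionLog(earliest=120, end=150)
print(in_range(log, 0))                                  # False
print(ToyConsumer("earliest").position(3, log))          # 120
print(ToyConsumer("latest").position(3, log))            # 150
print(ToyConsumer(overrides={3: 140}).position(3, log))  # 140
```

In this model a hard-coded start of 0 is out of range once older records have been removed, while asking the consumer always yields a usable position.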
Either way, who are you to presume that a user doesn't know what she is
doing when she configured a consumer to start at a particular position for
an added partition? You have no guarantee that offset 0L even exists at
On Tue, Sep 20, 2016 at 2:52 PM, Shixiong Zhu <notificati...@github.com> wrote:
> You should not be assuming 0 for a starting offset for partitions you've
> just learned about. You should be asking the underlying driver consumer
> what its position is.
> Yes, there are two approaches for getting new partitions' offsets:
> 1. From a user perspective, if I set "auto.offset.reset" to
> would like to process new data after the query starts. So we should
> from the earliest offset for a new partition.
> 2. Follow auto.offset.reset, as you did in your fork.
> However, option 2 makes the query result indeterminate when
> auto.offset.reset is latest, depending on how the query runs. E.g., if you
> add new partitions and push new data very quickly, the new data may be
> lost; if you add new partitions and push new data when the app is being
> recovered, the new data may be lost as well.
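The timing dependence described in the quoted text can be sketched with a toy calculation in Python. This is an illustration under stated assumptions, not Kafka or Spark code: `skipped_records` is a hypothetical helper, and it assumes a brand-new partition whose first offset is 0 and from which nothing has yet expired.

```python
def skipped_records(appended_before_discovery: int, policy: str) -> int:
    """Records never read from a newly added partition.

    appended_before_discovery: number of records producers wrote between
    partition creation and the moment the query noticed the partition.
    policy: "earliest" or "latest", mimicking auto.offset.reset.
    """
    if policy not in ("earliest", "latest"):
        raise ValueError(f"unknown policy: {policy}")
    # Assumption: a fresh partition starts at offset 0 with nothing expired.
    earliest, log_end = 0, appended_before_discovery
    start = earliest if policy == "earliest" else log_end
    # Offsets in [earliest, start) are never consumed.
    return start - earliest


for n in (0, 5, 500):
    print(n, skipped_records(n, "latest"), skipped_records(n, "earliest"))
# prints:
# 0 0 0
# 5 5 0
# 500 500 0
```

Under "latest" the number of skipped records equals however many happened to arrive before discovery, so the result varies with timing; under "earliest" it is always zero in this model.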
> View it on GitHub:
> <https://github.com/apache/spark/pull/15102#issuecomment-248414530>