Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15102
We aren't comparing ordering of offsets across partitions, and I don't
think that was ever in consideration. At this point the most likely
candidate for the global ordering is implicit in the single driver thread
that is asking for getOffset
On Sep 23, 2016 12:41 AM, "Jay White Bear" <[email protected]> wrote:
> Just curious looking at this, if you are comparing "sequential" offsets
> across partitions a rebalance would definitely affect this and, unless
> something has changed, it probably not a good idea to compare offsets from
> kafka across partitions. You could simply add an id/timestamp to the
> producer and send it with the message rather than using this methodology
or
> if you must use offset query the broker for the full list and compare what
> you consumed to that list (small increase in latency btwn consumption and
> processing). This is from the Kafka paper, which makes me question your
> scheme: "...Note that our message ids are increasing but not consecutive.
> To compute the id of the next message, we have to add the length of the
> current message to its id." This means simply comparing which offsets are
> larger will not necessarily yield you the most recent message across
> partitions and definitely won't hold in a rebalance during which time some
> broker logs will be on hold and not consumed. In my own implementation,
the
> offsets are great for message guarantees (eg delivery/consumption checks),
> because the broker has a full ordered list, but not for cross partition
> ordering.
>
> â
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/apache/spark/pull/15102#issuecomment-249107463>, or
mute
> the thread
>
<https://github.com/notifications/unsubscribe-auth/AAGAB8MjSGueEHbEv3gXHkFiV5_CfsTqks5qs2Z2gaJpZM4J9QvR>
> .
>
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]