Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/21917
Example report of skipped offsets in a non-compacted non-transactional
situation
http://mail-archives.apache.org/mod_mbox/kafka-users/201801.mbox/%3ccakwx9vxc1cdosqwwwjk3qmyy3svvtmh+rjdrjyvsbejsds8...@mail.gmail.com%3E

I asked on the kafka list about ways to tell if an offset is a
transactional marker. I also asked about endOffset alternatives, although
I think that doesn't totally solve the problem (for instance, in cases
where the batch size has been rate limited).
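To make the ambiguity concrete, here is a minimal sketch (in Python for illustration, not the Spark code path; the function name is made up): from the consumer's point of view, a gap between consecutive offsets looks identical whether it came from a transaction marker, a compacted-away record, or a genuinely lost message.

```python
def find_offset_gaps(record_offsets, range_start, range_end):
    """Return offsets in [range_start, range_end) that poll() never
    returned. The consumer API alone cannot say whether each gap is a
    transaction marker, a compacted record, or data loss."""
    seen = set(record_offsets)
    return [o for o in range(range_start, range_end) if o not in seen]

# Offsets 3 and 6 are missing: markers? compaction? loss? Can't tell.
gaps = find_offset_gaps([0, 1, 2, 4, 5, 7], 0, 8)
print(gaps)  # [3, 6]
```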
On Mon, Aug 6, 2018 at 2:57 AM, Quentin Ambard <[email protected]>
wrote:
> By failed, you mean returned an empty collection after timing out, even
> though records should be available? You don't. You also don't know that it
> isn't just lost because kafka skipped a message. AFAIK, from the
> information you have from a kafka consumer, once you start allowing gaps
> in offsets, you don't know.
>
> Ok, that's interesting. My understanding was that if you successfully poll
> and get results, you are 100% sure that you don't lose anything. Do you
> have more details on that? Why would kafka skip a record while consuming?
>
> Have you tested comparing the results of consumer.endOffsets for consumers
> with different isolation levels?
>
> endOffsets returns the last offset (same as seekToEnd). But you're right
> that the easiest solution for us would be to have something like a
> seekToLastRecord method instead. Maybe something we could also ask for?
>
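To make the endOffsets-vs-seekToLastRecord distinction above concrete, here is a toy model (Python for illustration only; the log layout and both function names are assumptions, not real Kafka APIs, and consumers cannot actually inspect markers this way). It shows why "one past the last log entry" and "offset of the last real record" diverge when the log tail holds transaction markers.

```python
# Toy log model: each entry is ("record", value) or ("marker", kind).
# Purely illustrative; real Kafka does not expose entries to consumers.
def end_offset(log):
    """What endOffsets/seekToEnd conceptually reports: one past the
    last log entry, markers included."""
    return len(log)

def last_record_offset(log):
    """What a hypothetical seekToLastRecord would need: the offset of
    the last real record, skipping trailing transaction markers."""
    for offset in range(len(log) - 1, -1, -1):
        if log[offset][0] == "record":
            return offset
    return None  # log holds no records at all

log = [("record", "a"), ("record", "b"), ("marker", "commit")]
print(end_offset(log))          # 3
print(last_record_offset(log))  # 1
```

Seeking to end_offset - 1 here would land on the commit marker rather than a record, which is the gap a seekToLastRecord-style API would close.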