Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/4805#issuecomment-77882744
As it stands now, no offsets are stored by Spark unless you're
checkpointing. Does it really make sense to have an option to
automatically store offsets in Kafka, but not store offsets in the
checkpoint? Failure recovery in that case depends on user-provided
starting offsets (or starting at the beginning / end of the log). If
someone has the sophistication to get offsets from Kafka in order to
provide them as a starting point, they probably have the sophistication to
save offsets to Kafka themselves in the job.
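To illustrate the "save offsets yourself" pattern, here is a self-contained sketch. The `OffsetRange` shape mirrors what the direct stream exposes per batch, but `OffsetStore` is a hypothetical stand-in for whatever the user commits to (Kafka/ZK); none of these names are from this PR:

```scala
// Sketch of a job committing its own offsets after each batch completes.
// OffsetStore is hypothetical; in a real job this would be a Kafka or ZK client.
case class OffsetRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long)

class OffsetStore {
  private val committed = scala.collection.mutable.Map.empty[(String, Int), Long]

  // Record the upper bound of each processed range as the new committed offset.
  def commit(ranges: Seq[OffsetRange]): Unit =
    ranges.foreach(r => committed((r.topic, r.partition)) = r.untilOffset)

  // On restart, the job would seed its starting offsets from here.
  def fetch(topic: String, partition: Int): Option[Long] =
    committed.get((topic, partition))
}

object ManualCommitSketch {
  def main(args: Array[String]): Unit = {
    val store = new OffsetStore
    val batch = Seq(OffsetRange("events", 0, 0L, 100L), OffsetRange("events", 1, 0L, 42L))
    // ... process the batch here; commit only after processing succeeds,
    // so a failure replays the batch (at-least-once semantics).
    store.commit(batch)
    println(store.fetch("events", 0)) // Some(100)
    println(store.fetch("events", 1)) // Some(42)
  }
}
```

Committing only after the batch is processed is what gives the at-least-once behavior discussed below: a crash before commit just replays the uncommitted ranges.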
If offsets are only being sent to Kafka when they are also stored in the
checkpoint, then does sending offsets to Kafka in compute() also make
sense? Yes, you can lag behind, but those offsets are in the queue to be
processed at least once.
I'm not 100% sure of the answer to this; it's more a question of desired
behavior, but that's why I brought it up.
On Mon, Mar 9, 2015 at 12:14 AM, Saisai Shao <[email protected]>
wrote:
> Hi @koeninger <https://github.com/koeninger> , would you please review
> this again? Thanks a lot and appreciate your time.
>
> Here I still keep using the HashMap for the Time -> offset mapping;
> since checkpoint data is only updated when checkpointing is enabled, I
> hope this will also work even without checkpointing enabled.
>
> And I still use StreamingListener to update the offset, the reason is
> mentioned before.
>
> Besides, I updated the configuration name; not sure if it is suitable.
>
> Thanks a lot.
>