Re: Understanding spark structured streaming checkpointing system

Jungtaek Lim Sun, 19 Apr 2020 15:38:23 -0700

That sounds odd. Is it intermittent, or is it always reproducible if you
start from the same checkpoint? Which version of Spark are you using?
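
While checking, it may help to compare the batch ids in the checkpoint's
offsets and commits directories. Below is a minimal Python sketch, assuming
the checkpoint directory has been copied from HDFS to a local path and that
batch files are named by their integer batch id. The function names and the
example path are hypothetical and not part of Spark.

```python
import os


def batch_ids(log_dir):
    """Return the sorted batch ids found in one checkpoint log directory."""
    if not os.path.isdir(log_dir):
        return []
    # Assumption: batch files are named by their integer batch id.
    # Anything else (for example .crc or .tmp files) is skipped.
    return sorted(int(name) for name in os.listdir(log_dir) if name.isdigit())


def summarize_checkpoint(checkpoint_dir):
    """Report the highest batch id under offsets/ and under commits/."""
    offsets = batch_ids(os.path.join(checkpoint_dir, "offsets"))
    commits = batch_ids(os.path.join(checkpoint_dir, "commits"))
    return {
        "latest_offset_batch": offsets[-1] if offsets else None,
        "latest_commit_batch": commits[-1] if commits else None,
    }


if __name__ == "__main__":
    # Hypothetical local copy of the checkpoint directory.
    print(summarize_checkpoint("/tmp/checkpoint_copy"))
```

As far as I understand, a restart should resume from the latest batch in
offsets, rerunning it if it has no matching commit. If this reports 30 as the
latest offset batch but the query still starts from batch 9, please include
that output along with the Spark version.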


On Fri, Apr 17, 2020 at 6:17 AM Ruijing Li <liruijin...@gmail.com> wrote:

> Hi all,
>
> I have a question about how Structured Streaming does checkpointing. I’m
> noticing that Spark is not reading from the max / latest offset it has seen.
> For example, in HDFS, I see that it stored offset file 30, which contains
> partition: offset {1: 2000}
>
> But after stopping the job and restarting it, I see that it instead
> reads from offset file 9, which contains {1: 1000}
>
> Can someone explain why Spark doesn’t take the max offset?
>
> Thanks.
> --
> Cheers,
> Ruijing Li
>
