Hi all,

I have a question about how Structured Streaming does checkpointing. I'm
noticing that Spark is not reading from the max / latest offset it has seen.
For example, in HDFS I see it stored offset file 30, which contains
partition: offset {1: 2000}

But after stopping the job and restarting it, I see it instead reads from
offset file 9, which contains {1: 1000}
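For context, here is a minimal sketch of what I understand these offset files to look like and how one might inspect them. This assumes the v1 offset-log text layout (a version line, a batch-metadata JSON line, then one JSON document of partition-to-offset per source); the sample contents below are illustrative, not copied from my cluster:

```python
import json

# Illustrative (hypothetical) contents of an offsets/<batchId> file,
# assuming the v1 layout: version line, batch metadata, per-source offsets.
sample = """v1
{"batchWatermarkMs":0,"batchTimestampMs":1500000000000}
{"1":2000}"""

def parse_offset_file(text):
    lines = text.splitlines()
    version = lines[0]                             # e.g. "v1"
    metadata = json.loads(lines[1])                # batch-level metadata
    offsets = [json.loads(l) for l in lines[2:]]   # one entry per source
    return version, metadata, offsets

version, metadata, offsets = parse_offset_file(sample)
print(version, offsets[0])
```

Running this against the file for batch 30 is how I'm reading off the {1: 2000} offsets above.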

Can someone explain why Spark doesn't resume from the max offset?

Thanks.
-- 
Cheers,
Ruijing Li
