Hi all,

I have a question about how Structured Streaming does checkpointing. I'm
noticing that Spark is not reading from the max / latest offset it has seen.
For example, in HDFS I see it stored offset file 30, which contains
partition: offset {1: 2000}

But after stopping the job and restarting it, I see it instead reads from
offset file 9, which contains {1: 1000}
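For context, here is a minimal sketch of what I understand these offset files to look like and how one might inspect them. This assumes the v1 offset-log text layout (a version line, a batch-metadata JSON line, then one JSON document of partition-to-offset per source); the sample contents below are illustrative, not copied from my cluster:

```python
import json

# Illustrative (hypothetical) contents of an offsets/<batchId> file,
# assuming the v1 layout: version line, batch metadata, per-source offsets.
sample = """v1
{"batchWatermarkMs":0,"batchTimestampMs":1500000000000}
{"1":2000}"""

def parse_offset_file(text):
    lines = text.splitlines()
    version = lines[0]                             # e.g. "v1"
    metadata = json.loads(lines[1])                # batch-level metadata
    offsets = [json.loads(l) for l in lines[2:]]   # one entry per source
    return version, metadata, offsets

version, metadata, offsets = parse_offset_file(sample)
print(version, offsets[0])
```

Running this against the file for batch 30 is how I'm reading off the {1: 2000} offsets above.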

Can someone explain why Spark doesn't resume from the max offset?

Thanks.
-- 
Cheers,
Ruijing Li
