That sounds odd. Is it intermittent, or is it always reproducible if you start from the same checkpoint? Which version of Spark are you using?

On Fri, Apr 17, 2020 at 6:17 AM Ruijing Li <liruijin...@gmail.com> wrote:

> Hi all,
>
> I have a question on how structured streaming does checkpointing. I’m
> noticing that spark is not reading from the max / latest offset it’s seen.
> For example, in HDFS, I see it stored offset file 30 which contains
> partition: offset {1: 2000}
>
> But instead after stopping the job and restarting it, I see it instead
> reads from offset file 9 which contains {1:1000}
>
> Can someone explain why spark doesn’t take the max offset?
>
> Thanks.
> --
> Cheers,
> Ruijing Li