Without --allowNonRestoredState, on a suspend/resume we do see the valid-length file along with the finalized file (finalized during resume):
-rw-r--r-- 3 root hadoop 10 2018-02-09 13:57 /vishal/sessionscid/dt=2018-02-09/_part-0-28.valid-length

That makes much more sense.

I guess we should document --allowNonRestoredState better? It seems it actually drops state?

On Fri, Feb 9, 2018 at 1:37 PM, Vishal Santoshi <vishal.santo...@gmail.com> wrote:

> This is 1.4 BTW. I am not sure that I am reading this correctly, but the
> lifecycle of cancel/resume is 2 steps:
>
> 1. Cancel job with SP
>
> closeCurrentPartFile
>
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L549
>
> is called from close()
>
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L416
>
> and that moves files to pending state. That, I would presume, is called
> when one does a cancel.
>
> 2. The restore on resume
>
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L369
>
> calls
>
> handleRestoredBucketState
>
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L704
>
> clears the pending files from state without finalizing them?
>
> That does not seem to be right. I must be reading the code totally wrong?
>
> I am also not sure whether --allowNonRestoredState is skipping getting
> the state.
> At least
> https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/cli.html#savepoints
> is not exactly clear about what it does if we add an operator (GDF, I think,
> will add a new operator in the DAG without state even if stateful; in my
> case the Map operator is not even stateful).
>
> Thanks, and please bear with me if this is all something pretty simple.
>
> Vishal
>
> On Fri, Feb 9, 2018 at 11:54 AM, Vishal Santoshi <vishal.santo...@gmail.com> wrote:
>
>> What should be the behavior of BucketingSink vis-a-vis state (pending,
>> in-progress and finalization) when we suspend and resume?
>>
>> So I did this:
>>
>> * I had a pipe writing to hdfs suspend and resume using
>>   --allowNonRestoredState, as I had added a harmless MapOperator
>>   (stateless).
>>
>> * I see that a file on hdfs, the file being written to (before the
>>   cancel with savepoint), goes into a pending state: _part-0-21.pending
>>
>> * I see a new file being written to in the resumed pipe:
>>   _part-0-22.in-progress.
>>
>> What I do not see is the file _part-0-21.pending being finalized (as in
>> renamed to just part-0-21). I would have assumed that would be the
>> case in this controlled suspend/resume circumstance. Further, it is a
>> rename, and hdfs mv is not an expensive operation.
>>
>> Am I understanding the process correctly, and if yes, any pointers?
>>
>> Regards,
>>
>> Vishal
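
For anyone following along, the two outcomes reported in this thread can be sketched as a toy model of the part-file names only. This is not Flink code: `ToySink`, `run`, and the assumption that the savepoint records the part file while it is still in progress are hypothetical, chosen as one reading that matches the observed file names. The valid-length file is left out for brevity.

```python
# Toy model of the part-file naming lifecycle discussed in this thread.
# NOT Flink code: class, function names and behavior are illustrative assumptions.

PREFIX = "_"
IN_PROGRESS_SUFFIX = ".in-progress"
PENDING_SUFFIX = ".pending"


def in_progress_name(part):
    return f"{PREFIX}{part}{IN_PROGRESS_SUFFIX}"


def pending_name(part):
    return f"{PREFIX}{part}{PENDING_SUFFIX}"


class ToySink:
    """Tracks file names in a set; performs no real I/O."""

    def __init__(self, files=None):
        self.files = set(files or [])
        self.current = None

    def open_part(self, part):
        self.current = part
        self.files.add(in_progress_name(part))

    def snapshot(self):
        # Assumption: the savepoint remembers the part currently being written.
        return [self.current] if self.current is not None else []

    def close(self):
        # On cancel: rename the in-progress file to its pending name.
        if self.current is not None:
            self.files.remove(in_progress_name(self.current))
            self.files.add(pending_name(self.current))
            self.current = None

    def restore(self, state):
        # On resume: finalize each part named in the restored state,
        # whichever intermediate name it currently has.
        for part in state:
            for name in (in_progress_name(part), pending_name(part)):
                if name in self.files:
                    self.files.remove(name)
                    self.files.add(part)


def run(restore_state):
    sink = ToySink()
    sink.open_part("part-0-21")
    state = sink.snapshot()   # savepoint taken while part-0-21 is in progress
    sink.close()              # cancel moves it to _part-0-21.pending

    resumed = ToySink(sink.files)
    # If the sink's state is not restored, nothing tells it to finalize.
    resumed.restore(state if restore_state else [])
    resumed.open_part("part-0-22")
    return sorted(resumed.files)


if __name__ == "__main__":
    print(run(restore_state=True))
    print(run(restore_state=False))
```

Under these assumptions, a resume that restores the sink's state ends with part-0-21 finalized, while a resume that drops it leaves _part-0-21.pending behind, which is the difference seen with and without --allowNonRestoredState above.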