Congxian,

We just ran a test. Breaking the chain between the two stateful
operators seems to work around the problem: the state of both
operators is now saved in the checkpoint.
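
For anyone following along, here is a toy sketch in plain Java (not Flink code) of how we understand the collision. It assumes the local snapshot directory is derived only from the job vertex and subtask index, so two chained operators resolve to the same path and the second snapshot deletes what the first one wrote. The class, the method names, and the "vtx_1_sti_0" style directory names are all made up for illustration. In a real job the chain can be broken with, for example, disableChaining() or startNewChain() on the operator.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Stream;

// Toy model only: none of these names are Flink APIs.
class SnapshotDirCollision {

    // Mimics "delete the directory first if it exists", then writes one state file.
    static void snapshot(Path dir, String operatorName) throws IOException {
        if (Files.exists(dir)) {
            try (Stream<Path> walk = Files.walk(dir)) {
                // Reverse order visits children before their parent directory.
                walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
            }
        }
        Files.createDirectories(dir);
        Files.write(dir.resolve(operatorName + ".state"),
                ("state of " + operatorName).getBytes(StandardCharsets.UTF_8));
    }

    // Adds the file names found in dir to the given set.
    static void collect(Path dir, TreeSet<String> into) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            files.forEach(p -> into.add(p.getFileName().toString()));
        }
    }

    // Snapshots two operators and returns the sorted names of the surviving state files.
    // sharedDirectory = true models chained operators (same hypothetical vertex directory).
    static List<String> run(boolean sharedDirectory) {
        try {
            Path root = Files.createTempDirectory("local-recovery-demo");
            Path dirA = root.resolve("vtx_1_sti_0");
            Path dirB = sharedDirectory ? dirA : root.resolve("vtx_2_sti_0");
            snapshot(dirA, "operatorA");
            snapshot(dirB, "operatorB");
            TreeSet<String> survivors = new TreeSet<>();
            collect(dirA, survivors);
            collect(dirB, survivors);
            return new ArrayList<>(survivors);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("chained:   " + run(true));
        System.out.println("unchained: " + run(false));
    }
}
```

With a shared directory only operatorB.state survives, and no error is raised, which matches the silent loss we saw. With separate directories both files are kept.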

Ning

On Tue, Apr 23, 2019 at 7:41 AM Ning Shi <nings...@gmail.com> wrote:
>
> Congxian,
>
> Thank you for creating the ticket and providing the relevant code. I’m
> curious why you don’t think the directory collision is a problem. What we
> observe is that the state of one of the operators is not included in the
> checkpoint and data is lost on restore. That’s a pretty serious problem,
> especially since Flink doesn’t log any error. People could be silently
> losing state.
>
> Please let me know how I can best help diagnose this issue and drive the 
> ticket forward. I’m happy to collect any relevant information.
>
> Thanks,
>
> —
> Ning
>
> > On Apr 23, 2019, at 2:54 AM, Congxian Qiu <qcx978132...@gmail.com> wrote:
> >
> > From the log message you gave, the two operators share the same directory,
> > and when a snapshot is taken, the directory is deleted first if it
> > exists (RocksIncrementalSnapshotStrategy#prepareLocalSnapshotDirectory).
> >
> > I did not find an existing issue for this problem, and I don’t think it is
> > a UUID generation problem. Please check the path generation logic
> > in LocalRecoveryDirectoryProviderImpl#subtaskSpecificCheckpointDirectory.
> >
> > I’ve created an issue for this problem.
