I've been monitoring the task and checkpoint 1 never gets deleted. Right now we have:
chk-1 chk-1222 chk-326 chk-329 chk-357 chk-358 chk-8945 chk-8999 chk-9525 chk-9788 chk-9789 chk-9790 chk-9791

I made the task fail and it recovered without problems, so for now I would say that the problem was with the distributed system or that the chk-1 folder somehow got deleted by something external to Flink. If I see the problem again I will try to get more information.

Thanks,

Gerard

On Tue, Nov 21, 2017 at 4:27 PM, Stefan Richter <s.rich...@data-artisans.com> wrote:

> Ok, thanks for trying to reproduce this. If possible, could you also
> activate trace-level logging for class
> org.apache.flink.runtime.state.SharedStateRegistry?
> In case the problem occurs, this would greatly help to understand what was
> going on.
>
> > On 21.11.2017 at 15:16, gerardg <ger...@talaia.io> wrote:
> >
> >> where exactly did you read many times that incremental checkpoints cannot
> >> reference files from previous checkpoints, because we would have to
> >> correct that information. In fact, this is how incremental checkpoints work.
> >
> > My fault, I read it in some other posts in the mailing list but now that I
> > read it carefully it meant savepoints not checkpoints.
> >
> >> Now for this case, I would consider it extremely unlikely that a
> >> checkpoint 1620 would still reference a checkpoint 1, in particular if
> >> the files for that checkpoint are already deleted, which should only
> >> happen if it is no longer referenced. Which version of Flink are you
> >> using and what is your distributed filesystem? Is there any way to
> >> reproduce the problem?
> >
> > We are using Flink version 1.3.2 and GlusterFS. There are usually a few
> > checkpoints around at the same time, for example right now:
> >
> > chk-1 chk-26 chk-27 chk-28 chk-29 chk-30 chk-31
> >
> > I'm not sure how to reproduce the problem but I'll monitor the folder to see
> > when chk-1 gets deleted and try to make the task fail when that happens.
> >
> > Gerard
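
For anyone who wants to monitor the checkpoint folder in the same way, here is a minimal sketch in Python. It is not part of Flink. It assumes the checkpoint directory is reachable as a local path (for example through a GlusterFS mount), and the function names and the polling approach are my own choices for illustration. It lists the chk-<n> folders in numeric order (a plain directory listing sorts them as text, which is why chk-1222 shows up before chk-326 above) and reports any folder that disappears between two polls.

```python
import re
import time
from pathlib import Path

# Checkpoint folders are named chk-<number>, as in the listings above.
CHK_PATTERN = re.compile(r"^chk-(\d+)$")


def list_checkpoint_ids(checkpoint_dir):
    """Return the numeric IDs of all chk-<n> subdirectories, sorted numerically."""
    ids = []
    for entry in Path(checkpoint_dir).iterdir():
        match = CHK_PATTERN.match(entry.name)
        if match and entry.is_dir():
            ids.append(int(match.group(1)))
    return sorted(ids)


def removed_checkpoints(before, after):
    """Return the IDs present in `before` but missing from `after`."""
    return sorted(set(before) - set(after))


def watch(checkpoint_dir, interval_seconds=5.0, max_polls=None):
    """Poll the directory and print a timestamped line for each removed folder.

    Polling can only say that a folder vanished between two polls,
    not which process removed it.
    """
    previous = list_checkpoint_ids(checkpoint_dir)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval_seconds)
        current = list_checkpoint_ids(checkpoint_dir)
        for chk_id in removed_checkpoints(previous, current):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} chk-{chk_id} was removed")
        previous = current
        polls += 1
```

Calling watch() with the job's checkpoint directory and leaving it running gives a rough timestamp for when chk-1 goes away, which can then be matched against the trace-level SharedStateRegistry log mentioned above.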