I've been monitoring the task and checkpoint 1 never gets deleted. Right
now we have:

chk-1  chk-1222  chk-326  chk-329  chk-357  chk-358  chk-8945  chk-8999
chk-9525  chk-9788  chk-9789  chk-9790  chk-9791

I made the task fail and it recovered without problems, so for now I would
say that the problem was with the distributed system, or that the chk-1
folder somehow got deleted by something external to Flink. If I see the
problem again I will try to get more information.
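
For anyone who wants to watch for the same thing, here is a minimal sketch
of this kind of folder monitoring in Python. The function names and the
polling approach are only an illustration and are not part of Flink; the
checkpoint root is assumed to be the directory that holds the chk-N folders.

```python
import os
import re
import time

# Checkpoint folders are assumed to be named chk-<number>, as in the listing above.
CHK_DIR_PATTERN = re.compile(r"^chk-(\d+)$")


def list_checkpoints(checkpoint_root):
    """Return the sorted checkpoint IDs of all chk-<id> entries under checkpoint_root."""
    ids = []
    for name in os.listdir(checkpoint_root):
        match = CHK_DIR_PATTERN.match(name)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


def removed_checkpoints(previous, current):
    """Return the checkpoint IDs present in previous but missing from current."""
    return sorted(set(previous) - set(current))


def watch(checkpoint_root, interval_seconds=10.0):
    """Poll checkpoint_root forever and print a line whenever a chk-<id> entry disappears."""
    previous = list_checkpoints(checkpoint_root)
    while True:
        time.sleep(interval_seconds)
        current = list_checkpoints(checkpoint_root)
        for chk_id in removed_checkpoints(previous, current):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} chk-{chk_id} was removed")
        previous = current
```

Calling watch() with the checkpoint root prints one timestamped line each
time a chk-N entry disappears between two polls, which gives a rough time
window to compare against the JobManager logs.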

Thanks,

Gerard

On Tue, Nov 21, 2017 at 4:27 PM, Stefan Richter <s.rich...@data-artisans.com> wrote:

> Ok, thanks for trying to reproduce this. If possible, could you also
> activate trace-level logging for class 
> org.apache.flink.runtime.state.SharedStateRegistry?
> In case the problem occurs, this would greatly help to understand what was
> going on.
>
> > On 21.11.2017 at 15:16, gerardg <ger...@talaia.io> wrote:
> >
> >> where exactly did you read many times that incremental checkpoints
> >> cannot reference files from previous checkpoints, because we would
> >> have to correct that information. In fact, this is how incremental
> >> checkpoints work.
> >
> > My fault, I read it in some other posts on the mailing list, but now
> > that I read it carefully, it meant savepoints, not checkpoints.
> >
> >> Now for this case, I would consider it extremely unlikely that a
> >> checkpoint 1620 would still reference a checkpoint 1, in particular if
> >> the files for that checkpoint are already deleted, which should only
> >> happen if it is no longer referenced. Which version of Flink are you
> >> using and what is your distributed filesystem? Is there any way to
> >> reproduce the problem?
> >
> > We are using Flink version 1.3.2 and GlusterFS.  There are usually a few
> > checkpoints around at the same time, for example right now:
> >
> > chk-1  chk-26  chk-27  chk-28  chk-29  chk-30  chk-31
> >
> > I'm not sure how to reproduce the problem but I'll monitor the folder
> > to see when chk-1 gets deleted and try to make the task fail when that
> > happens.
> >
> > Gerard
> >
> >
> >
> >
> > --
> > Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>
>
