Hi Avi,
    I think the "*Checkpoint failed: The assigned slot
container_e02_1550091678485_0001_01_000023_7 was removed*" message (this may be
a container failure or something else; you could double-check the taskmanager
log for more information) and the "*Checkpoint failed: Checkpoint Coordinator
is suspending*" message are not the root cause. Could you please share the
jobmanager log?

    Whether the consumer resumes from the position stored in that savepoint
after recovering the old state is controlled by the consumer itself. Restoring
only restores the offsets, and only if they were snapshotted when the savepoint
was taken.
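
    To illustrate the point about offsets, here is a minimal sketch in Python.
It is a toy model, not Flink code: the function name, the partition names and
the fallback rule for partitions missing from the snapshot are all assumptions
made up for illustration, and it assumes a source that snapshots one offset per
partition.

```python
from typing import Dict, List, Optional


def resolve_start_offsets(
    partitions: List[str],
    restored: Optional[Dict[str, int]],
    default_start: int,
) -> Dict[str, int]:
    """Toy model (hypothetical helper): choose where each partition resumes."""
    if restored is None:
        # Fresh start without savepoint state: the configured default applies.
        return {p: default_start for p in partitions}
    # Restore: snapshotted offsets take precedence over the configured default.
    # Falling back to the default for partitions absent from the snapshot is
    # an assumption of this sketch, not a statement about any real connector.
    return {p: restored.get(p, default_start) for p in partitions}


# Offsets that were snapshotted into the savepoint (made-up values).
snapshot = {"p0": 120, "p1": 87}
resumed = resolve_start_offsets(["p0", "p1"], snapshot, default_start=0)
fresh = resolve_start_offsets(["p0", "p1"], None, default_start=0)
```

    In this model, `resumed` carries the snapshotted offsets while `fresh`
starts every partition at the default, which is the distinction I meant above.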
Best,
Congxian


On Thu, Feb 14, 2019 at 8:20 AM, Avi Levi <avi.l...@bluevoyant.com> wrote:

> Hi,
> Any help figuring this out would be highly appreciated. We are running on GC.
> After uploading a new jar with an old savepoint (taken the day before), some
> of our checkpoints fail with "*Checkpoint failed: The assigned slot
> container_e02_1550091678485_0001_01_000023_7 was removed*." What is the
> reason for that? Some used to fail on timeout, but after I increased the
> timeout to 15 min, some then crashed with "*Checkpoint failed: Checkpoint
> Coordinator is suspending*". What can cause that, and how can it be solved?
>
> Another question: will recovering the old state cause the consumer to
> consume messages from that savepoint?
>
> Regards,
> Avi
>
>
>
