Hi Vino and Till,

That's great news. Thank you!

Cheers,
Tzanko

On Thu, Sep 20, 2018 at 11:43 AM vino yang <yanghua1...@gmail.com> wrote:
> Hi all,
>
> Oh, I took this ticket and will fix it as soon as possible.
>
> Thanks, vino.
>
> Till Rohrmann <trohrm...@apache.org> wrote on Thu, Sep 20, 2018 at 4:35 PM:
>
>> Hi Tzanko,
>>
>> in order to make the container entrypoint work properly with HA, we need
>> to fix the JobID (see https://issues.apache.org/jira/browse/FLINK-10291).
>> At the moment, we generate a new JobID for every restart of the cluster
>> entrypoint container. Because of that, the system cannot find the existing
>> checkpoints.
>>
>> Fixing the JobID is not a big deal, and it should be fixed with the next
>> bug fix release.
>>
>> Cheers,
>> Till
>>
>> On Thu, Sep 20, 2018 at 10:12 AM vino yang <yanghua1...@gmail.com> wrote:
>>
>>> Hi Tzanko,
>>>
>>> Maybe Till is better placed to answer this question.
>>>
>>> Thanks, vino.
>>>
>>> Tzanko Matev <tsa...@gmail.com> wrote on Wed, Sep 19, 2018 at 5:47 PM:
>>>
>>>> Dear all,
>>>>
>>>> I am currently experimenting with a Flink 1.6.0 job cluster. The goal
>>>> is to run a streaming job on K8s. Right now I am using docker-compose to
>>>> experiment with the job cluster.
>>>>
>>>> I am trying to set up HA with ZooKeeper, but it does not seem to work.
>>>> I have a docker-compose file which contains the following services:
>>>> - ZooKeeper
>>>> - Flink job manager
>>>> - Flink task manager
>>>>
>>>> The containers are set up as per the documentation for docker-compose,
>>>> and I have also configured the necessary HA settings in the conf file.
>>>> However, when I kill the job manager container and start it again, the
>>>> job being processed does not recover; it always starts from scratch, and
>>>> I get the following error:
>>>>
>>>> > ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler -
>>>> > Could not retrieve the redirect address.
>>>> >
>>>> > java.util.concurrent.CompletionException:
>>>> > org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing
>>>> > token not set: Ignoring message
>>>> > LocalFencedMessage(8c4887f5c13f6d907d82a55d97ac428f,
>>>> > LocalRpcInvocation(requestRestAddress(Time))) sent to
>>>> > akka.tcp://flink@blockprocessor-job-cluster:50000/user/dispatcher
>>>> > because the fencing token is null.
>>>>
>>>> Am I missing something? Is HA implemented for job clusters at all?
>>>>
>>>> Best wishes,
>>>> Tzanko Matev
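
A toy Python model of the problem Till describes may make it concrete: if recovered checkpoints are looked up by job ID, and the entrypoint draws a fresh ID on every container start, the restarted process never finds the checkpoints written before the restart. The `CheckpointStore` class, the `run_then_restart` helper, and the all-zero "fixed" ID are hypothetical illustrations under that single assumption; they are not Flink's API and not the actual FLINK-10291 fix.

```python
import uuid
from typing import Callable, Dict, List, Optional


class CheckpointStore:
    """Hypothetical stand-in for the HA checkpoint store (ZooKeeper + storage dir).

    Only one property is modelled: completed checkpoints are indexed by job ID.
    Everything else about Flink's real store is omitted.
    """

    def __init__(self) -> None:
        self._by_job: Dict[str, List[int]] = {}

    def add(self, job_id: str, checkpoint_id: int) -> None:
        self._by_job.setdefault(job_id, []).append(checkpoint_id)

    def latest(self, job_id: str) -> Optional[int]:
        checkpoints = self._by_job.get(job_id)
        return checkpoints[-1] if checkpoints else None


def run_then_restart(store: CheckpointStore, make_job_id: Callable[[], str]) -> Optional[int]:
    """Start the 'entrypoint', complete one checkpoint, simulate a container
    restart, and return the checkpoint the restarted entrypoint can recover."""
    job_id = make_job_id()        # ID assigned on the first start
    store.add(job_id, 1)          # one completed checkpoint under that ID
    restarted_id = make_job_id()  # ID assigned after the restart
    return store.latest(restarted_id)


def random_job_id() -> str:
    # A new 32-hex-character ID on every start (the 1.6.0 behaviour described above).
    return uuid.uuid4().hex


def fixed_job_id() -> str:
    # Hypothetical fixed ID that is identical on every start.
    return "0" * 32


print(run_then_restart(CheckpointStore(), random_job_id))  # None: nothing recovered
print(run_then_restart(CheckpointStore(), fixed_job_id))   # 1: checkpoint recovered
```

With a random ID the lookup after the restart misses, which matches the "always starts from scratch" symptom; keeping the ID stable across restarts is what lets recovery find the earlier checkpoint.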