Re: HA failing for 1.6.0 job cluster with docker-compose

Tzanko Matev Fri, 21 Sep 2018 04:47:02 -0700

Hi Vino and Till,

That's great news. Thank you!


Cheers,
Tzanko



On Thu, Sep 20, 2018 at 11:43 AM vino yang <yanghua1...@gmail.com> wrote:

> Hi all,
>
> Oh, I took this ticket, will fix it as soon as possible.
>
> Thanks, vino.
>
> On Thu, Sep 20, 2018 at 4:35 PM Till Rohrmann <trohrm...@apache.org> wrote:
>
>> Hi Tzanko,
>>
>> in order to make the container entrypoint work properly with HA, we need
>> to keep the JobID fixed across restarts (see
>> https://issues.apache.org/jira/browse/FLINK-10291). At the moment, we
>> generate a new JobID for every restart of the cluster entrypoint
>> container. Because of that, the system cannot find the existing
>> checkpoints.
>>
>> Fixing the JobID is not a big deal and it should be fixed with the next
>> bug fix release.
>>
>> Cheers,
>> Till
>>
>> On Thu, Sep 20, 2018 at 10:12 AM vino yang <yanghua1...@gmail.com> wrote:
>>
>>> Hi Tzanko,
>>>
>>> Maybe Till is more appropriate to answer this question.
>>>
>>> Thanks, vino.
>>>
>>> On Wed, Sep 19, 2018 at 5:47 PM Tzanko Matev <tsa...@gmail.com> wrote:
>>>
>>>> Dear all,
>>>>
>>>> I am currently experimenting with a Flink 1.6.0 job cluster. The goal
>>>> is to run a streaming job on K8s. Right now I am using docker-compose to
>>>> experiment with the job cluster.
>>>>
>>>> I am trying to set up HA with Zookeeper, but it does not seem to work.
>>>> I have a docker-compose file which contains the following services:
>>>> - Zookeeper
>>>> - Flink job manager
>>>> - Flink task manager
>>>>
>>>> The containers are set up as per the documentation for docker-compose,
>>>> but I have also set up the necessary HA settings in the conf file. However,
>>>> when I kill the job manager container and start it again, the job being
>>>> processed does not recover but always starts from scratch. I also get
>>>> the following error:
>>>>
>>>> > ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  -
>>>> > Could not retrieve the redirect address.
>>>> >
>>>> > java.util.concurrent.CompletionException:
>>>> > org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing
>>>> > token not set: Ignoring message
>>>> > LocalFencedMessage(8c4887f5c13f6d907d82a55d97ac428f,
>>>> > LocalRpcInvocation(requestRestAddress(Time))) sent to
>>>> > akka.tcp://flink@blockprocessor-job-cluster:50000/user/dispatcher
>>>> > because the fencing token is null.
>>>>
>>>> Am I missing something? Is HA implemented for job clusters at all?
>>>>
>>>> Best wishes,
>>>> Tzanko Matev
>>>>
>>>>
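
Till's explanation above (a new JobID on every restart means the existing
checkpoints cannot be found) can be illustrated with a small toy model. This
is only a sketch and not Flink code: the dictionary stands in for the HA
checkpoint store, and start_entrypoint, store_random, store_fixed and
FIXED_JOB_ID are hypothetical names made up for this illustration.

```python
import uuid


def start_entrypoint(job_id, store):
    """Simulate one start of the cluster entrypoint (hypothetical helper).

    Returns the checkpoint number found under job_id (None if there is none),
    then records a newer checkpoint under the same job_id.
    """
    restored = store.get(job_id)
    store[job_id] = (restored or 0) + 1
    return restored


# Case 1: a new random ID on every start, as described for 1.6.0.
store_random = {}
start_entrypoint(uuid.uuid4().hex, store_random)
restored_random = start_entrypoint(uuid.uuid4().hex, store_random)

# Case 2: the same ID on every start (the value is an arbitrary placeholder).
FIXED_JOB_ID = "0" * 32
store_fixed = {}
start_entrypoint(FIXED_JOB_ID, store_fixed)
restored_fixed = start_entrypoint(FIXED_JOB_ID, store_fixed)

print(restored_random)  # nothing is stored under the second random ID
print(restored_fixed)   # the checkpoint written by the first start
```

In the first case the second start looks up an ID that has never been used, so
it begins from scratch. In the second case the lookup key is stable, so the
earlier checkpoint is found.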
