I don't know anything about Consul, but the prospect of having options besides ZooKeeper is very interesting. It's rather surprising how little you had to modify existing classes to get this to work.

It may take a while until someone provides proper feedback, as the community is currently preparing two releases (1.4.1 and 1.5). Please don't be discouraged by this.

I saw that your branch is based on the 1.4 version. In 1.5 we reworked Flink's distributed architecture (an initiative commonly referred to as FLIP-6), which may affect your work.

Two things to note from my side:
It would be helpful if you could explain the differences between ZooKeeper and Consul and how they compare in terms of guarantees, etc. Also, how have you tested your solution so far (for example, how long has a cluster been running, and which failure scenarios have you covered)?

On 13.02.2018 21:38, Krzysztof Białek wrote:
I'd like to get your opinion on this idea. I found the related JIRA issue FLINK-2366, but it seems to be dead. To attract your attention, I'm copying my comment here.

As an experiment, I've implemented Flink HA on top of Consul. The implementation works fine in the "lab" but is not battle-tested yet. The source code is available at https://github.com/kbialek/flink/tree/feature/consul (flink-runtime, package org.apache.flink.runtime.consul).

Why? Generally, I'd like to keep as few moving parts as possible. We do not have ZooKeeper running, but Consul is already in place. And in the end, freedom of choice is a good thing.

It would be great to see built-in Consul support in Flink someday, but if that is not planned, I suggest a small refactoring that makes the HighAvailabilityServicesFactory configurable. As far as I can see, this should be enough to inject any HA implementation.
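To make the suggestion concrete, here is a minimal sketch of what a configurable factory could look like, assuming the HA mode setting may hold either a built-in keyword or the fully qualified name of a factory class that is instantiated reflectively. The interfaces are heavily simplified, hypothetical stand-ins (Flink's real HA interfaces have many more methods), and `HaServicesLoader`, `ConsulHaServicesFactory`, `describe()` and the `consul.agent` key are illustrative names, not existing Flink APIs.

```java
import java.util.Map;

// Hypothetical, heavily simplified stand-in for Flink's HA services interface.
interface HighAvailabilityServices {
    String describe();
}

// Hypothetical factory contract: a Consul module would ship its own implementation.
interface HighAvailabilityServicesFactory {
    HighAvailabilityServices createHAServices(Map<String, String> config);
}

// Illustrative third-party factory; the actual Consul leader election and storage are omitted.
class ConsulHaServicesFactory implements HighAvailabilityServicesFactory {
    @Override
    public HighAvailabilityServices createHAServices(Map<String, String> config) {
        // "consul.agent" is a hypothetical config key
        String agent = config.getOrDefault("consul.agent", "localhost:8500");
        return () -> "consul@" + agent;
    }
}

// Hypothetical loader: built-in modes are handled directly, anything else is a factory class name.
final class HaServicesLoader {
    // Key name assumed for illustration
    static final String HA_MODE_KEY = "high-availability";

    static HighAvailabilityServices load(Map<String, String> config) {
        String mode = config.getOrDefault(HA_MODE_KEY, "NONE");
        switch (mode) {
            case "NONE":
                return () -> "standalone";
            case "ZOOKEEPER":
                // Placeholder for the built-in ZooKeeper services
                return () -> "zookeeper";
            default:
                // Any other value is treated as the fully qualified name of a factory class
                return loadFactory(mode).createHAServices(config);
        }
    }

    private static HighAvailabilityServicesFactory loadFactory(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            if (!HighAvailabilityServicesFactory.class.isAssignableFrom(clazz)) {
                throw new IllegalArgumentException(
                    className + " does not implement HighAvailabilityServicesFactory");
            }
            // Requires a no-argument constructor on the factory class
            return (HighAvailabilityServicesFactory) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate HA factory " + className, e);
        }
    }
}

class HaFactoryDemo {
    public static void main(String[] args) {
        HighAvailabilityServices services = HaServicesLoader.load(Map.of(
            "high-availability", "ConsulHaServicesFactory",
            "consul.agent", "consul.service:8500"));
        System.out.println(services.describe()); // prints "consul@consul.service:8500"
    }
}
```

The design choice here is that Flink would only need to know the factory interface; a Consul (or any other) implementation could then live in a separate jar on the classpath and be selected purely through configuration.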

