Hello,
I don't know anything about Consul, but the prospect of having other
options besides ZooKeeper is very interesting. It's rather surprising how
little you had to modify existing classes to get this to work.
It may take a while until someone provides proper feedback, as the
community is currently preparing 2 releases (1.4.1 and 1.5). Please don't
be discouraged by this.
I saw that your branch was based on the 1.4 version. In 1.5 we reworked
the distributed architecture of Flink (in an initiative commonly
referred to as FLIP-6) which may affect your work.
2 things to note from my side:
It would be helpful if you could explain the differences between ZK
and Consul and how they compare in terms of guarantees etc.
How did you test your solution so far? (For example, how long was a
cluster running, and which failure scenarios did you try?)
On 13.02.2018 21:38, Krzysztof Białek wrote:
I'd like to get your opinion about this idea. I found the related JIRA
issue FLINK-2366, but it seems to be dead. To get your attention, I'm
copying my comment here.
As an experiment I've implemented Flink HA on top of Consul. The
implementation is working fine in the "lab" but is not battle tested
yet. The source code is available at
https://github.com/kbialek/flink/tree/feature/consul
(flink-runtime, package org.apache.flink.runtime.consul)
Why? Generally I'd like to keep as few moving parts as possible. We
do not have ZooKeeper running, but Consul is already in place. And in
the end, freedom of choice is a good thing.
It would be great to see built-in Consul support in Flink someday, but
if that is not planned then I suggest a small refactoring that makes
the HighAvailabilityServicesFactory configurable. As far as I can see,
this should be enough to inject any HA implementation.
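
To illustrate the kind of refactoring I mean, here is a minimal,
self-contained sketch of loading an HA factory by class name from the
configuration. Everything in it is an assumption made for illustration:
the interfaces HaServices and HaServicesFactory, the config keys
"high-availability.factory.class" and "consul.address", and the class
ConsulHaServicesFactory are hypothetical stand-ins, not Flink's actual
API and not the code in my branch.

```java
import java.util.HashMap;
import java.util.Map;

public class PluggableHaSketch {

    /** Hypothetical stand-in for the services an HA backend provides. */
    public interface HaServices {
        String describe();
    }

    /** Hypothetical factory contract; not Flink's actual interface signature. */
    public interface HaServicesFactory {
        HaServices create(Map<String, String> config);
    }

    /** Illustrative Consul-backed factory; it only records the configured address. */
    public static class ConsulHaServicesFactory implements HaServicesFactory {
        @Override
        public HaServices create(Map<String, String> config) {
            // "consul.address" is an assumed key; 8500 is Consul's default HTTP port.
            String address = config.getOrDefault("consul.address", "localhost:8500");
            return () -> "consul@" + address;
        }
    }

    /** Instantiates the factory named in the config and asks it for the HA services. */
    public static HaServices load(Map<String, String> config)
            throws ReflectiveOperationException {
        // Assumed config key naming the factory implementation.
        String className = config.get("high-availability.factory.class");
        if (className == null) {
            throw new IllegalArgumentException("No HA factory class configured");
        }
        // Fails with ClassCastException if the class does not implement the contract.
        Class<? extends HaServicesFactory> clazz =
                Class.forName(className).asSubclass(HaServicesFactory.class);
        HaServicesFactory factory = clazz.getDeclaredConstructor().newInstance();
        return factory.create(config);
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> config = new HashMap<>();
        config.put("high-availability.factory.class",
                ConsulHaServicesFactory.class.getName());
        config.put("consul.address", "consul.example:8500");
        System.out.println(load(config).describe());
    }
}
```

With a hook like this, a Consul (or any other) implementation could live
outside of flink-runtime and be selected purely through configuration.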
Regards,
Krzysztof