Hi,

Is there any reason why you can't register the nodes with Consul from the
code of the IP finder itself? This is what the other cloud-based IP finders
in Ignite (AWS, Google Compute Engine, JDBC, etc.) do. The point here is that
if the IP finder returns a non-null list of addresses, then those nodes
should be reachable.
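
A rough sketch of that idea, in plain Java so it runs without Ignite or
Consul on the classpath. The method names mirror registerAddresses,
unregisterAddresses and getRegisteredAddresses on Ignite's
TcpDiscoveryIpFinder; AddressRegistry, InMemoryRegistry and
SelfRegisteringIpFinder are hypothetical names made up for illustration,
and a real finder would talk to Consul instead of an in-memory set.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a Consul client (not a real Consul or Ignite type). */
interface AddressRegistry {
    void put(InetSocketAddress addr);
    void remove(InetSocketAddress addr);
    Collection<InetSocketAddress> list();
}

/** In-memory registry so the sketch runs without a Consul agent. */
final class InMemoryRegistry implements AddressRegistry {
    private final Set<InetSocketAddress> addrs = ConcurrentHashMap.newKeySet();

    @Override public void put(InetSocketAddress addr) { addrs.add(addr); }
    @Override public void remove(InetSocketAddress addr) { addrs.remove(addr); }
    @Override public Collection<InetSocketAddress> list() { return new ArrayList<>(addrs); }
}

/**
 * Writes discovery addresses to the shared registry directly, so a node is
 * visible to its peers as soon as discovery starts rather than only after
 * an external health check passes.
 */
final class SelfRegisteringIpFinder {
    private final AddressRegistry registry;

    SelfRegisteringIpFinder(AddressRegistry registry) {
        this.registry = registry;
    }

    /** Publishes the given addresses to the shared registry. */
    public void registerAddresses(Collection<InetSocketAddress> addrs) {
        for (InetSocketAddress addr : addrs)
            registry.put(addr);
    }

    /** Removes the given addresses, for example on node shutdown. */
    public void unregisterAddresses(Collection<InetSocketAddress> addrs) {
        for (InetSocketAddress addr : addrs)
            registry.remove(addr);
    }

    /** Returns a snapshot of every address currently in the registry. */
    public Collection<InetSocketAddress> getRegisteredAddresses() {
        return registry.list();
    }
}
```

In a real implementation these three methods would most likely be overrides
on a subclass of TcpDiscoveryIpFinderAdapter, with the registry calls
replaced by whatever Consul client you already use.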

As for the retry logic, I don't have access to the code right now, but
TcpDiscoverySpi.joinTimeout might work or, again, that logic can be
embedded in the IP finder.
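
To illustrate the second option, here is a minimal sketch of retry logic
that an IP finder could run inside its address lookup: poll the registry
until some address other than the local one shows up, or give up after a
timeout. It is plain Java with no Ignite dependency; PeerWait and
awaitPeers are hypothetical names, and the timeout and poll interval are
arbitrary parameters rather than Ignite settings.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper; not part of Ignite. */
final class PeerWait {
    private PeerWait() {}

    /**
     * Calls lookup repeatedly until it returns at least one address other
     * than self, or until timeoutMs has elapsed. Returns the peers found,
     * which is an empty list on timeout or interruption.
     */
    static List<InetSocketAddress> awaitPeers(
            Supplier<Collection<InetSocketAddress>> lookup,
            InetSocketAddress self,
            long timeoutMs,
            long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            // Filter out the local address so a node does not count itself.
            List<InetSocketAddress> peers = new ArrayList<>();
            for (InetSocketAddress addr : lookup.get()) {
                if (!addr.equals(self))
                    peers.add(addr);
            }
            if (!peers.isEmpty() || System.nanoTime() - deadline >= 0)
                return peers;
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and stop waiting.
                Thread.currentThread().interrupt();
                return peers;
            }
        }
    }
}
```

Note that waiting like this only helps if each node publishes its own
address before it finishes starting; otherwise every node just waits out
the timeout. The very first node in an empty grid will also wait for the
full timeout before proceeding alone.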

Denis

On Friday, May 19, 2017, Chris Berry <[email protected]> wrote:

> Hi,
>
> I have a chicken-and-egg problem.
> I am trying to create a ConsulIpFinder – which uses our Consul-based
> service discovery under the covers.
>
> (I asked about this without luck here:
> http://apache-ignite-users.70518.x6.nabble.com/ConsulIpFinder-TcpDiscoveryIpFinder-issue-td12974.html
> )
>
> My problem is this.
> If I start 1 Node, then wait until it is alive, and then start N Nodes, I
> never have any issues getting all of my Nodes to find each other in the
> Grid. 100% success.
> But, if I try to start all N Nodes simultaneously, I get most Nodes
> starting up thinking they are isolated from each other. Almost 100%
> failure.
> (Note: we use Mesos/Marathon to manage our Nodes, and would like to be able
> to start them all simultaneously. We really do not want a special, manual
> process.)
>
> The chicken-and-egg problem is because:
> 1) I must start Ignite as I am starting up the Node, which means that it
> will try to discover the Grid.
> 2) But, until a Node is started, its Consul Health Check will fail, and
> thus, the Node will not appear in Consul, and therefore not in my IpFinder.
> Thus, Ignite cannot discover all of the other Nodes in the Grid because
> they are not yet available to the IpFinder.
> In general, as they all start, they are unaware of each other.
>
> What I need is either
> 1) A way to defer discovery until I can start it explicitly – later in the
> lifecycle. Yet, start Ignite enough that I can create Caches, etc.
> 2) Or a way to force a Node to retry joining the Grid.
>
> Because, even though my IpFinder eventually has all of the Nodes in the
> Grid in its getRegisteredAddresses(), it never attempts to reregister
> itself with the Grid.
>
> I hope this question makes sense.
>
> I would love to manage this in my code.
> Because it appears my only hope right now is to create a nasty hack that
> staggers the start in my startup script (with a random sleep),
> and that seems like a terrible option.
>
> Every IpFinder I have read assumes that somehow, magically, I have a
> predefined list of IP:Ports.
> But that is difficult in the ephemeral world of the Cloud.
>
> Thanks,
> -- Chris
>
>
>
> --
> View this message in context:
> http://apache-ignite-users.70518.x6.nabble.com/Race-Condition-at-Grid-Startup-tp13038.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
>
