Hi, I have a chicken-and-egg problem. I am trying to create a ConsulIpFinder, which uses our Consul-based service discovery under the covers.
(I asked about this here, without luck: http://apache-ignite-users.70518.x6.nabble.com/ConsulIpFinder-TcpDiscoveryIpFinder-issue-td12974.html )

My problem is this. If I start 1 Node, wait until it is alive, and then start N Nodes, I never have any issues getting all of my Nodes to find each other in the Grid: 100% success. But if I try to start all N Nodes simultaneously, most Nodes start up thinking they are isolated from each other: almost 100% failure. (Note: we use Mesos/Marathon to manage our Nodes, and would like to be able to start them all simultaneously. We really do not want a special, manual process.)

The chicken-and-egg problem arises because:
1) I must start Ignite as I am starting up the Node, which means that it will try to discover the Grid.
2) But until a Node is started, its Consul Health Check will fail, so the Node will not appear in Consul, and therefore not in my IpFinder.

Thus, Ignite cannot discover the other Nodes in the Grid, because they are not yet available to the IpFinder. In general, as they all start, they are unaware of each other.

What I need is either:
1) A way to defer discovery until I can start it explicitly, later in the lifecycle, while still starting Ignite far enough that I can create Caches, etc.
2) Or a way to force a Node to retry joining the Grid. Even though my IpFinder eventually returns all of the Nodes in the Grid from its getRegisteredAddresses(), the Node never attempts to reregister itself with the Grid.

I hope this question makes sense. I would love to manage this in my code, because it appears my only hope right now is to create a nasty hack that staggers the start in my startup script (with a random sleep), and that seems like a terrible option. Every IpFinder I have read assumes that somehow, magically, I have a predefined list of IP:Ports. But that is difficult in the ephemeral world of the Cloud.
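
To make the staggered-start hack concrete, here is roughly what I mean, written in Java rather than in the startup script. This is only a minimal sketch: the class name StaggeredStart, the method pickDelayMillis and the 5-second default are made up for illustration, and the actual Ignite start call is left as a comment.

```java
import java.util.concurrent.ThreadLocalRandom;

public class StaggeredStart {

    /**
     * Picks a random delay in [0, maxDelayMillis] so that Nodes launched at
     * the same moment do not all attempt discovery at the same moment.
     * (Hypothetical helper, not part of Ignite.)
     */
    public static long pickDelayMillis(long maxDelayMillis) {
        if (maxDelayMillis < 0 || maxDelayMillis == Long.MAX_VALUE) {
            throw new IllegalArgumentException("maxDelayMillis out of range: " + maxDelayMillis);
        }
        // nextLong(bound) is exclusive of bound, hence the + 1.
        return ThreadLocalRandom.current().nextLong(maxDelayMillis + 1);
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed default of 5 seconds; a real deployment would have to tune this.
        long maxDelayMillis = args.length > 0 ? Long.parseLong(args[0]) : 5_000L;
        long delay = pickDelayMillis(maxDelayMillis);
        System.out.println("Sleeping " + delay + " ms before starting Ignite");
        Thread.sleep(delay);
        // The Node would start Ignite here, using the ConsulIpFinder.
    }
}
```

Even with this in place, nothing guarantees that two Nodes will not pick nearly the same delay, which is part of why it feels like a hack rather than a fix.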
Thanks,
-- Chris
