Hi Yakov,

What you are asking for is difficult.
As I've explained, Ignite Nodes will not show up in Consul until they are capable of responding to their Health Checks. This means they must be initialized and able to answer a Health Check request with "OK". Currently, this means that Ignite must already be started, so that we can configure Caches & DataStreamers and wire them with all the other dependency-injected Beans in the system, and so that the entire system is configured and can respond to requests. So we have a chicken-and-egg situation.

What is needed is a way to either:

1) Postpone a Node's attempt to join the Cluster until it is alive and well. In other words, wait for a lifecycle event saying that the Node is started, and then fire an explicit call to "join()".
2) Allow a Node to explicitly attempt to re-join the Cluster. Again, this would be fired by a lifecycle event.

I have read all the existing implementations, and they all seem to rely on a Node somehow, magically, knowing the other Nodes in the Cluster. I suspect this is because they use some static List. But clearly, if I start 10 Nodes simultaneously in the Cloud, this is difficult. Nodes will not have IPs (and, in Mesos/Marathon, Ports) until they are started.

The point is that Consul is controlling the List of Cluster Nodes, not Ignite. Nodes register during startup (in startup scripts, using Container Pilot), but they are not "seen" until they pass their Health Checks. Conversely, as Nodes come and go, Consul is aware of it and will always return the current known Cluster List.

You can see this reflected in the final form of the ConsulIpFinder I posted above (not the one you quoted). I modeled that Impl after this code:
https://github.com/apache/ignite/blob/master/modules/cloud/src/main/java/org/apache/ignite/spi/discovery/tcp/ipfinder/cloud/TcpDiscoveryCloudIpFinder.java

The problem is simply that we have no programmatic control over when join() is called.

NOTE: if there is no way this can get corrected,
I will have to somehow rewrite my app to defer Ignition.start() until I get a "Server is started" event, which implies that I will also have to "lazy init" all of my caches, etc. This is a pretty large refactoring. But if I must, I will do it. Although, I must say, I suspect that many others will find themselves in the same boat as me...

Thanks much,
-- Chris

--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Race-Condition-at-Grid-Startup-tp13038p13101.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
