On Mon, Jun 29, 2015 at 3:58 AM, Jan Willem Janssen <[email protected]> wrote:

> Hi Robert,
>
> > On 29 Jun 2015, at 11:49, Robert M. Mather <[email protected]> wrote:
> >
> > Our hosting service where the ACE server runs was down for maintenance, so
> > our clients couldn't contact the ACE server for an extended period of time.
> > I've logged in to a few client sites (we have around 100) and I'm seeing
> > that the agents eventually blacklisted the server IP and stopped checking
> > there for updates, even though that's the only one we have. Once the server
> > came back online, they still didn't resume synchronizing with it. Is this
> > the correct behavior? Shouldn't the agent detect when the server is back
> > online and connect to it again?
> >
> > I now see the "agent.discovery.checking" option, which I guess we should
> > set to false in the future.
>
> IMO, this is a bug: it makes no sense to blacklist a server when there is
> only one the agent can talk to. Could you raise an issue for this on JIRA?

Sure, I'll file an issue. Until the bug is fixed, is there some way I can
prevent issues from occurring in the future if the ACE server becomes
unavailable again? Would setting "agent.discovery.checking=false" prevent the
blacklisting?

> (The idea of blacklisting is to create a crude form of failover: suppose
> you’ve multiple ACE servers up and running, a client could try each one of
> them in case one of them is not accessible.)
>
> > The troubling part is that we have a DS component running in our client
> > application that pings the server periodically, but the clients all stopped
> > pinging after the server outage. In every log I checked, the pings stopped
> > immediately after the ACE agent blacklisted the server IP. The ping is just
> > a task running under the standard Java ScheduledExecutorService that POSTs
> > to our server every few minutes using the Apache HttpClient. Is it possible
> > that the ACE agent could interfere with that somehow? The service running
> > the ping task doesn't log that it got stopped or failed in any way. Other
> > services on the client are working normally.
>
> How does your job obtain the server IP? Through the DiscoveryHandler of the
> agent itself? If so, then this might be the culprit as it no longer returns
> the IP of the server since it is blacklisted, and there are no alternative
> server IPs to return...

It's completely independent of the agent service, and I can't think of any
reason why this would happen without knowing more about the internals of the
agent.

> HtH,
>
> --
> Met vriendelijke groeten | Kind regards
>
> Jan Willem Janssen | Software Architect
> +31 631 765 814
>
> My world is revolving around INAETICS and Amdatu
>
> Luminis Technologies B.V.
> Churchillplein 1
> 7314 BZ Apeldoorn
> +31 88 586 46 00
>
> http://www.luminis-technologies.com
> http://www.luminis.eu
>
> KvK (CoC) 09 16 28 93
> BTW (VAT) NL8169.78.566.B.01
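
A note on the ping task that stopped without logging anything: nothing in
this thread establishes the cause, but one documented behavior of
ScheduledExecutorService may be worth ruling out. If a task scheduled with
scheduleAtFixedRate (or scheduleWithFixedDelay) throws an exception, all
subsequent executions are suppressed, and the exception is only visible
through the returned future. A POST that fails with an unchecked exception
during an outage could therefore end the pings for good. The sketch below
illustrates this under stated assumptions: the class name
PingSchedulingSketch, the fakePing method, and the timing values are
hypothetical, and fakePing merely stands in for the real HttpClient POST.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PingSchedulingSketch {

    /** Hypothetical stand-in for the real POST; fails on the 2nd call to simulate the outage. */
    static void fakePing(AtomicInteger calls) {
        if (calls.incrementAndGet() == 2) {
            throw new IllegalStateException("simulated: server unreachable");
        }
    }

    /** Schedules the ping every 10 ms, lets it run for 300 ms, and returns how often it ran. */
    static int countRuns(boolean guarded) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger calls = new AtomicInteger();
        Runnable task = () -> {
            if (!guarded) {
                // An exception escaping here suppresses every later execution.
                fakePing(calls);
                return;
            }
            try {
                fakePing(calls);
            } catch (RuntimeException e) {
                // Log and swallow so the next scheduled run still happens.
                System.err.println("ping failed, will retry: " + e.getMessage());
            }
        };
        scheduler.scheduleAtFixedRate(task, 0, 10, TimeUnit.MILLISECONDS);
        Thread.sleep(300);
        scheduler.shutdownNow();
        scheduler.awaitTermination(1, TimeUnit.SECONDS);
        return calls.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("unguarded runs: " + countRuns(false));
        System.out.println("guarded runs:   " + countRuns(true));
    }
}
```

In this sketch the unguarded variant stops after the run that throws, while
the guarded variant keeps running. If the real ping task lets exceptions
escape its run() method, wrapping the body in a try/catch that logs the
failure would both keep the schedule alive and leave a trace in the log.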
