Bill,

Thanks for your quick reply. That makes a lot of sense. I had very nearly added the disclaimer, "this seems like a fundamental requirement _from the perspective of my current use case_", and your reply neatly summarizes what some other use cases are.

Alternatively I might have asked "why can't announcement be configured to wait for a health check"--but I appreciate there has to be a limit to what is configurable. And, yeah, in our case the upstream service will eventually handle health checks -- we just haven't implemented that quite yet.

Regards,
--Richard

On Tue, Mar 21, 2017 at 11:53 AM, Bill Farner <[email protected]> wrote:

> Announcement is done immediately to announce presence of an instance for
> other services to determine what to do from there. A use case we considered
> was allowing monitoring of a service via HTTP before the service is ready
> for traffic. This is useful, for example, if the application has a long
> burn-in setup phase.
>
> In your case, the expectation is that the load balancer (or other upstream
> service) handles and routes away from unavailable backends; whether it's
> because they are not yet ready or otherwise. This could be using independent
> health checks or retries, depending on what is available.
>
>
> On Mar 21, 2017, 8:28 AM -0700, Richard Klancer <[email protected]>, wrote:
>
> Hi all,
>
> I'm preparing to launch a public-facing Aurora based HTTP service. As
> part of this exercise my team recently attempted to `aurora update`
> the service while it was serving high request volume from an external
> load generator.
>
> We were surprised to find that our ops team was paged due to bursts of
> 502's from our frontend server, which routes external traffic to our
> service using the serverset published by the Aurora announcer. Upon
> investigation, we discovered that the serverset is announced as soon
> as the thermos executor runs, even though the app is not ready to
> serve requests right away. The 502s, of course, were due to the chosen
> server not yet being able to respond to a connection request.
>
> Last night I searched JIRA, the user and dev mailing lists, and the
> thermos code, and I didn't see any conversations about delaying
> announcement until the configured health check passes (thus indicating
> that the server is ready to accept connections)
>
> I'm curious why not? This seems like a fundamental requirement.
>
> A couple notes. First, our frontend server doesn't support explicit
> health checking, yet, though this will be implemented soon. Perhaps it
> is considered the proper task of load balancers and frontend servers
> to validate the health of servers in the serverset before routing
> traffic to them?
>
> Also, to work around this problem, we announced the serverset from the
> app itself. This means we no longer have an 'announce' section in our
> config, and thus no portmap. But http health checking is silently (in
> 0.12, though not 0.17) disabled if there is no thermos port named
> 'health'. We had our "admin" and "health" ports aliased, but with no
> portmap I had to just rename "admin" to "health" everywhere in our job
> definition. It works but it's a little silly. This was previously
> noted in https://issues.apache.org/jira/browse/AURORA-321
>
> Thanks in advance for any comments,
>
> --Richard
