On 12/12/2013 12:53 PM, Kyle Mestery wrote:
On Dec 12, 2013, at 11:44 AM, Jay Pipes <jaypi...@gmail.com> wrote:
On 12/12/2013 12:36 PM, Clint Byrum wrote:
Excerpts from Russell Bryant's message of 2013-12-12 09:09:04 -0800:
On 12/12/2013 12:02 PM, Clint Byrum wrote:
I've been chasing quite a few bugs in the TripleO automated bring-up
lately that have to do with failures because either there are no valid
hosts ready to have servers scheduled, or there are hosts listed and
enabled, but they can't bind to the network because for whatever reason
the L2 agent has not checked in with Neutron yet.

This is only a problem in the first few minutes of a nova-compute host's
life. But it is critical for scaling up rapidly, so it is important for
me to understand how this is supposed to work.

So I'm asking, is there a standard way to determine whether or not a
nova-compute is definitely ready to have things scheduled on it? This
can be via an API, or even by observing something on the nova-compute
host itself. I just need a definitive signal that "the compute host is
ready".

If a nova compute host has registered itself to start having instances
scheduled to it, it *should* be ready.  AFAIK, we're not doing any
network sanity checks on startup, though.

We already do some sanity checks on startup.  For example, nova-compute
requires that it can talk to nova-conductor.  nova-compute will block on
startup until nova-conductor is responding if they happened to be
brought up at the same time.

We could do something like this with a networking sanity check if
someone could define what that check should look like.
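A minimal sketch of the "block on startup until a dependency responds" pattern described above, generalized so a networking check could be plugged in. This is not nova's actual startup code; `wait_until_ready` and all of its parameters are hypothetical names chosen for illustration, and the check itself is left to the caller.

```python
import time


def wait_until_ready(check, interval=1.0, timeout=None,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True; return the number of attempts.

    check    -- zero-argument callable (hypothetical), True once the
                dependency is responding
    interval -- seconds to sleep between attempts
    timeout  -- give up after this many seconds; None blocks indefinitely
    sleep, clock -- injectable so the loop can be exercised without waiting
    """
    deadline = None if timeout is None else clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if check():
            return attempts
        if deadline is not None and clock() >= deadline:
            raise TimeoutError(
                "dependency not ready after %d attempts" % attempts)
        sleep(interval)
```

With `timeout=None` this blocks indefinitely, which corresponds to the nova-conductor behaviour described above; a deployment tool would more likely pass a timeout so that a missing dependency surfaces as an error.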

Could we ask Neutron if our compute host has an L2 agent yet? That seems
like a valid sanity check.

++

This makes sense to me as well. Although, not all Neutron plugins have
an L2 agent, so I think the check needs to be more generic than that.
For example, the OpenDaylight MechanismDriver we have developed
doesn't need an agent. I also believe the Nicira plugin is agent-less;
perhaps there are others as well.
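To make the agent question concrete, here is a rough sketch of a check that covers both cases: plugins with an L2 agent and agent-less ones. It works on plain dictionaries and makes no Neutron calls. The record fields (`host`, `agent_type`, `alive`), the default agent type strings, and the `requires_agent` flag are all assumptions for illustration, loosely modeled on what an agent listing might return.

```python
# Assumed agent_type values for L2 agents; adjust for the plugin in use.
DEFAULT_L2_AGENT_TYPES = frozenset(["Open vSwitch agent", "Linux bridge agent"])


def host_network_ready(agents, host, requires_agent=True,
                       l2_agent_types=DEFAULT_L2_AGENT_TYPES):
    """Return True if `host` looks ready for port binding.

    agents         -- iterable of dicts with assumed keys
                      "host", "agent_type" and "alive"
    requires_agent -- False for agent-less plugins, where checking for
                      an agent would never succeed
    """
    if not requires_agent:
        # Nothing to wait for on the compute host itself.
        return True
    return any(
        agent.get("host") == host
        and agent.get("agent_type") in l2_agent_types
        and bool(agent.get("alive"))
        for agent in agents
    )
```

The `requires_agent` switch is only a stand-in for the "more generic" check asked for here; a real solution would need the plugin or driver itself to report readiness.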

I should also ask: does this sort of integration happen with cinder,
for example, when we're dealing with storage? Are there other services
that have startup requirements around integration with nova as well?

Right, it's more general than "is the L2 agent alive and running". It's more about having each service understand the relative dependencies it has on other supporting services.

For instance, have each service implement a:

GET /healthcheck

that would return either a 200 OK or a 409 Conflict, with the body containing a list of the service types it is still waiting to hear back from before it can return a 200 OK itself.
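As a rough sketch of what such an endpoint could look like, the following builds the response from a table of dependency checks and wraps it in a small WSGI app. Nothing here is an existing OpenStack API: the `healthcheck` and `make_app` names, the `waiting_on` key in the JSON body, and the shape of `dependency_checks` are all assumptions made for illustration.

```python
import json


def healthcheck(dependency_checks):
    """Return (status_line, json_body) for a hypothetical GET /healthcheck.

    dependency_checks maps a service type (e.g. "network") to a
    zero-argument callable that returns True once that service responds.
    """
    waiting_on = sorted(
        service_type
        for service_type, is_ready in dependency_checks.items()
        if not is_ready()
    )
    status = "409 Conflict" if waiting_on else "200 OK"
    return status, json.dumps({"waiting_on": waiting_on})


def make_app(dependency_checks):
    """Build a WSGI app that serves only GET /healthcheck."""
    def app(environ, start_response):
        if (environ.get("REQUEST_METHOD") != "GET"
                or environ.get("PATH_INFO") != "/healthcheck"):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        status, body = healthcheck(dependency_checks)
        payload = body.encode("utf-8")
        start_response(status, [
            ("Content-Type", "application/json"),
            ("Content-Length", str(len(payload))),
        ])
        return [payload]
    return app
```

A deployment tool could then poll this URL on each service and treat a 200 as the "ready" signal, reading the 409 body to see which dependency is still holding things up.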

Anyway, just some thoughts...

-jay



_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev