[openstack-dev] [Nova] [Neutron] How do we know a host is ready to have servers scheduled onto it?
I've been chasing quite a few bugs in the TripleO automated bring-up lately that have to do with failures because either there are no valid hosts ready to have servers scheduled, or there are hosts listed and enabled, but they can't bind to the network because for whatever reason the L2 agent has not checked in with Neutron yet.

This is only a problem in the first few minutes of a nova-compute host's life. But it is critical for scaling up rapidly, so it is important for me to understand how this is supposed to work.

So I'm asking, is there a standard way to determine whether or not a nova-compute is definitely ready to have things scheduled on it? This can be via an API, or even by observing something on the nova-compute host itself. I just need a definitive signal that the compute host is ready.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] [Neutron] How do we know a host is ready to have servers scheduled onto it?
On 12/12/2013 12:02 PM, Clint Byrum wrote:
> So I'm asking, is there a standard way to determine whether or not a nova-compute is definitely ready to have things scheduled on it? This can be via an API, or even by observing something on the nova-compute host itself. I just need a definitive signal that the compute host is ready.

If a nova compute host has registered itself to start having instances scheduled to it, it *should* be ready. AFAIK, we're not doing any network sanity checks on startup, though.

We already do some sanity checks on startup. For example, nova-compute requires that it can talk to nova-conductor. nova-compute will block on startup until nova-conductor is responding if they happened to be brought up at the same time.

We could do something like this with a networking sanity check if someone could define what that check should look like.

--
Russell Bryant
Re: [openstack-dev] [Nova] [Neutron] How do we know a host is ready to have servers scheduled onto it?
On 12/12/2013 11:02 AM, Clint Byrum wrote:
> So I'm asking, is there a standard way to determine whether or not a nova-compute is definitely ready to have things scheduled on it? This can be via an API, or even by observing something on the nova-compute host itself. I just need a definitive signal that the compute host is ready.

Is it not sufficient that nova service-list shows the compute service as up? If not, then maybe we should call that a bug in nova...

Chris
Re: [openstack-dev] [Nova] [Neutron] How do we know a host is ready to have servers scheduled onto it?
On 12/12/13 17:19, Chris Friesen wrote:
> On 12/12/2013 11:02 AM, Clint Byrum wrote:
>> So I'm asking, is there a standard way to determine whether or not a nova-compute is definitely ready to have things scheduled on it? This can be via an API, or even by observing something on the nova-compute host itself. I just need a definitive signal that the compute host is ready.
>
> Is it not sufficient that nova service-list shows the compute service as up? If not, then maybe we should call that a bug in nova...

The nova-compute service does not, currently, know about the health of, say, the neutron openvswitch agent running on the same hardware, although that being in good shape is necessary to be able to start instances and have them be useful. This kind of cross-project state coordination doesn't exist right now, AFAIK.

Cheers,
--
Stephen Gran
Senior Systems Integrator - theguardian.com
Re: [openstack-dev] [Nova] [Neutron] How do we know a host is ready to have servers scheduled onto it?
Excerpts from Chris Friesen's message of 2013-12-12 09:19:42 -0800:
> On 12/12/2013 11:02 AM, Clint Byrum wrote:
>> So I'm asking, is there a standard way to determine whether or not a nova-compute is definitely ready to have things scheduled on it? This can be via an API, or even by observing something on the nova-compute host itself. I just need a definitive signal that the compute host is ready.
>
> Is it not sufficient that nova service-list shows the compute service as up?

I could spin waiting for at least one. Not a bad idea actually. However, I suspect that will only handle the situations I've gotten where the scheduler returns NoValidHost.

I say that because I think if it shows there, it matches the all hosts filter and will have things scheduled on it. With one compute host I get failures after scheduling because neutron has no network segment to bind to. That is because the L2 agent on the host has not yet registered itself with Neutron.
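The "spin waiting for at least one" idea could be sketched roughly as below. This is only an illustration, not anything Nova ships: list_services is a hypothetical callable standing in for whatever returns the rows that nova service-list prints, and the dict keys (binary, host, status, state) are assumed to mirror the columns of that listing. As noted above, it would only cover the NoValidHost case, not the network binding failure.

```python
import time
from typing import Callable, Iterable, List, Mapping


def compute_services_up(services: Iterable[Mapping[str, str]]) -> List[str]:
    """Return hosts whose nova-compute service is enabled and reported up."""
    return [
        svc["host"]
        for svc in services
        if svc.get("binary") == "nova-compute"
        and svc.get("status") == "enabled"
        and svc.get("state") == "up"
    ]


def wait_for_compute(
    list_services: Callable[[], Iterable[Mapping[str, str]]],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until at least one nova-compute is up, or raise TimeoutError.

    list_services is hypothetical: it should return the same rows that
    "nova service-list" shows, as dicts.
    """
    deadline = time.monotonic() + timeout
    while True:
        hosts = compute_services_up(list_services())
        if hosts:
            return hosts
        if time.monotonic() >= deadline:
            raise TimeoutError("no nova-compute service reported up in time")
        sleep(interval)
```

Passing sleep in as a parameter is just a convenience so the loop can be exercised without real delays.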
Re: [openstack-dev] [Nova] [Neutron] How do we know a host is ready to have servers scheduled onto it?
Excerpts from Russell Bryant's message of 2013-12-12 09:09:04 -0800:
> If a nova compute host has registered itself to start having instances scheduled to it, it *should* be ready. AFAIK, we're not doing any network sanity checks on startup, though.
>
> We already do some sanity checks on startup. For example, nova-compute requires that it can talk to nova-conductor. nova-compute will block on startup until nova-conductor is responding if they happened to be brought up at the same time.
>
> We could do something like this with a networking sanity check if someone could define what that check should look like.

Could we ask Neutron if our compute host has an L2 agent yet? That seems like a valid sanity check.
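One possible shape for that sanity check, as a sketch under assumptions: the function takes agent records shaped like the entries in Neutron's agent listing (host, agent_type, alive), and the set of agent_type strings treated as "L2" is an illustrative guess that a real deployment would need to adjust. It also presumes an agent-based plugin.

```python
from typing import Any, Iterable, Mapping

# Assumed agent_type values for L2 agents; adjust for the plugin in use.
L2_AGENT_TYPES = frozenset({"Open vSwitch agent", "Linux bridge agent"})


def host_has_live_l2_agent(
    agents: Iterable[Mapping[str, Any]],
    host: str,
    l2_types: frozenset = L2_AGENT_TYPES,
) -> bool:
    """True if some L2 agent on the given host has checked in and is alive."""
    return any(
        agent.get("host") == host
        and agent.get("agent_type") in l2_types
        and agent.get("alive") is True
        for agent in agents
    )
```

nova-compute (or a deployment tool) could call something like this in a loop before enabling the host, in the same way it already blocks on nova-conductor.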
Re: [openstack-dev] [Nova] [Neutron] How do we know a host is ready to have servers scheduled onto it?
On 12/12/2013 12:36 PM, Clint Byrum wrote:
> Excerpts from Russell Bryant's message of 2013-12-12 09:09:04 -0800:
>> We could do something like this with a networking sanity check if someone could define what that check should look like.
>
> Could we ask Neutron if our compute host has an L2 agent yet? That seems like a valid sanity check.

++

-jay
Re: [openstack-dev] [Nova] [Neutron] How do we know a host is ready to have servers scheduled onto it?
On Dec 12, 2013, at 11:44 AM, Jay Pipes jaypi...@gmail.com wrote:
> On 12/12/2013 12:36 PM, Clint Byrum wrote:
>> Could we ask Neutron if our compute host has an L2 agent yet? That seems like a valid sanity check.
>
> ++

This makes sense to me as well. Although, not all Neutron plugins have an L2 agent, so I think the check needs to be more generic than that. For example, the OpenDaylight MechanismDriver we have developed doesn't need an agent. I also believe the Nicira plugin is agent-less, perhaps there are others as well.

And I should note, does this sort of integration also happen with cinder, for example, when we're dealing with storage? Any other services which have a requirement on startup around integration with nova as well?

Thanks,
Kyle
Re: [openstack-dev] [Nova] [Neutron] How do we know a host is ready to have servers scheduled onto it?
On 12/12/2013 12:53 PM, Kyle Mestery wrote:
> This makes sense to me as well. Although, not all Neutron plugins have an L2 agent, so I think the check needs to be more generic than that. For example, the OpenDaylight MechanismDriver we have developed doesn't need an agent. I also believe the Nicira plugin is agent-less, perhaps there are others as well.
>
> And I should note, does this sort of integration also happen with cinder, for example, when we're dealing with storage? Any other services which have a requirement on startup around integration with nova as well?

Right, it's more general than "is the L2 agent alive and running". It's more about having each service understand the relative dependencies it has on other supporting services.

For instance, have each service implement a:

GET /healthcheck

that would return either a 200 OK or 409 Conflict with the body containing a list of service types that it is waiting to hear back from in order to provide a 200 OK for itself.

Anyway, just some thoughts...

-jay
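A minimal sketch of what such a GET /healthcheck could look like as a plain WSGI application. Nothing here is an existing OpenStack endpoint: the JSON body key waiting_on and the probe callable (which reports, per service type, whether that dependency has been heard from) are hypothetical names chosen for the example.

```python
import json
from typing import Callable, Mapping, Tuple


def healthcheck(dependencies: Mapping[str, bool]) -> Tuple[str, str]:
    """Map dependency states to an HTTP status line and a JSON body.

    dependencies maps a service type to True once it has been heard from.
    """
    waiting = sorted(name for name, ok in dependencies.items() if not ok)
    status = "409 Conflict" if waiting else "200 OK"
    return status, json.dumps({"waiting_on": waiting})


def make_app(probe: Callable[[], Mapping[str, bool]]):
    """Build a WSGI app that serves GET /healthcheck from probe()."""

    def app(environ, start_response):
        is_get = environ.get("REQUEST_METHOD") == "GET"
        if not is_get or environ.get("PATH_INFO") != "/healthcheck":
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        status, body = healthcheck(probe())
        payload = body.encode("utf-8")
        start_response(status, [
            ("Content-Type", "application/json"),
            ("Content-Length", str(len(payload))),
        ])
        return [payload]

    return app
```

Because the 409 body lists the outstanding service types, a caller can tell the difference between "still starting" and "waiting on Neutron" without parsing logs.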
Re: [openstack-dev] [Nova] [Neutron] How do we know a host is ready to have servers scheduled onto it?
Excerpts from Kyle Mestery's message of 2013-12-12 09:53:57 -0800:
> This makes sense to me as well. Although, not all Neutron plugins have an L2 agent, so I think the check needs to be more generic than that. For example, the OpenDaylight MechanismDriver we have developed doesn't need an agent. I also believe the Nicira plugin is agent-less, perhaps there are others as well.
>
> And I should note, does this sort of integration also happen with cinder, for example, when we're dealing with storage? Any other services which have a requirement on startup around integration with nova as well?

Does cinder actually have per-compute-host concerns? I admit to being a bit cinder-stupid here.

Anyway, it seems to me that any service that is compute-host aware should be able to respond to the compute host whether or not it is a) aware of it, and b) ready to serve on it.

For agent-less drivers that is easy, you just always return True. And for drivers with agents, you return false unless you can find an agent for the host. So something like:

GET /host/%(compute-host-name)

And then in the response include a ready attribute that would signal whether all networks that should work there, can work there.

As a first pass, just polling until that is ready before nova-compute enables itself would solve the problems I see (and that I think users would see as a cloud provider scales out compute nodes). Longer term we would also want to aim at having notifications available for this so that nova-compute could subscribe to that notification bus and then disable itself if its agent ever goes away.

I opened this bug to track the issue. I suspect there are duplicates of it already reported, but would like to start clean to make sure it is analyzed fully and then we can use those other bugs as test cases and confirmation:

https://bugs.launchpad.net/nova/+bug/1260440
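The proposal above could be sketched from both sides roughly as follows. This is a guess at the shape, not an existing Neutron API: host_ready stands for what the service might compute for GET /host/%(compute-host-name), wait_until_ready for the first-pass polling in nova-compute, and fetch is a hypothetical callable that performs the request and returns the decoded response.

```python
import time
from typing import Any, Callable, Dict, Iterable, Mapping


def host_ready(
    host: str,
    agents: Iterable[Mapping[str, Any]],
    agentless: bool = False,
) -> Dict[str, Any]:
    """Build the response for a host: agent-less drivers are always ready,
    agent-based drivers are ready only if a live agent exists for the host."""
    if agentless:
        return {"host": host, "ready": True}
    ready = any(a.get("host") == host and a.get("alive") for a in agents)
    return {"host": host, "ready": ready}


def wait_until_ready(
    fetch: Callable[[str], Mapping[str, Any]],
    host: str,
    attempts: int = 60,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll fetch(host) until its "ready" attribute is true.

    Returns False if the host never became ready within the given attempts,
    so the caller can decide whether to stay disabled.
    """
    for _ in range(attempts):
        if fetch(host).get("ready"):
            return True
        sleep(interval)
    return False
```

The longer-term notification approach mentioned above would replace the polling loop with a subscription, which this sketch does not attempt.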
Re: [openstack-dev] [Nova] [Neutron] How do we know a host is ready to have servers scheduled onto it?
On 12/12/2013 12:35 PM, Clint Byrum wrote:
> Excerpts from Chris Friesen's message of 2013-12-12 09:19:42 -0800:
>> Is it not sufficient that nova service-list shows the compute service as up?
>
> I could spin waiting for at least one. Not a bad idea actually. However, I suspect that will only handle the situations I've gotten where the scheduler returns NoValidHost.

Right it solves this case

> I say that because I think if it shows there, it matches the all hosts filter and will have things scheduled on it. With one compute host I get failures after scheduling because neutron has no network segment to bind to. That is because the L2 agent on the host has not yet registered itself with Neutron.

but not this one.

--
Russell Bryant
Re: [openstack-dev] [Nova] [Neutron] How do we know a host is ready to have servers scheduled onto it?
On 12/12/2013 01:36 PM, Clint Byrum wrote:
> Does cinder actually have per-compute-host concerns? I admit to being a bit cinder-stupid here.

No, it doesn't.

> Anyway, it seems to me that any service that is compute-host aware should be able to respond to the compute host whether or not it is a) aware of it, and b) ready to serve on it.
>
> For agent-less drivers that is easy, you just always return True. And for drivers with agents, you return false unless you can find an agent for the host. So something like:
>
> GET /host/%(compute-host-name)
>
> And then in the response include a ready attribute that would signal whether all networks that should work there, can work there.
>
> As a first pass, just polling until that is ready before nova-compute enables itself would solve the problems I see (and that I think users would see as a cloud provider scales out compute nodes). Longer term we would also want to aim at having notifications available for this so that nova-compute could subscribe to that notification bus and then disable itself if its agent ever goes away.
>
> I opened this bug to track the issue. I suspect there are duplicates of it already reported, but would like to start clean to make sure it is analyzed fully and then we can use those other bugs as test cases and confirmation:
>
> https://bugs.launchpad.net/nova/+bug/1260440

Sounds good. I'm happy to do this in Nova, but we'll have to get the Neutron API bit sorted out first.

--
Russell Bryant
Re: [openstack-dev] [Nova] [Neutron] How do we know a host is ready to have servers scheduled onto it?
Maybe time to revive something like:

https://review.openstack.org/#/c/12759/

From experience, all sites (and those internal to yahoo) provide a /status (or equivalent) that is used for all sorts of things, from basic load-balancing up/down to things like actually introspecting the state of the process (or getting basics about what the process is doing). Typically this is not exposed to the public (it's why http://www.yahoo.com/status works for me but not for you). It seems like something like that could help (but of course not completely solve) the type of response jay mentioned.

-Josh

On 12/12/13 10:10 AM, Jay Pipes jaypi...@gmail.com wrote:
> Right, it's more general than is the L2 agent alive and running. It's more about having each service understand the relative dependencies it has on other supporting services.
>
> For instance, have each service implement a:
>
> GET /healthcheck
>
> that would return either a 200 OK or 409 Conflict with the body containing a list of service types that it is waiting to hear back from in order to provide a 200 OK for itself.
>
> Anyway, just some thoughts...
>
> -jay