On 11/06/2015 18:52, Vilobh Meshram wrote:
A few more places which can trigger inconsistent behaviour:
- https://github.com/openstack/nova/blob/stable/kilo/nova/api/openstack/compute/contrib/services.py#L44
- https://github.com/openstack/nova/blob/stable/kilo/nova/api/openstack/compute/contrib/hypervisors.py#L98
- https://github.com/openstack/nova/blob/stable/kilo/nova/availability_zones.py#L130
- https://github.com/openstack/nova/blob/stable/kilo/nova/api/openstack/compute/contrib/availability_zone.py#L68
- https://github.com/openstack/nova/blob/stable/kilo/nova/api/openstack/compute/contrib/hosts.py#L88-L89
- https://github.com/openstack/nova/blob/stable/kilo/nova/compute/api.py#L3399-L3421
Blueprint which plans to fix this:
https://blueprints.launchpad.net/nova/+spec/servicegroup-api-control-plane
Related specs:
1) https://review.openstack.org/#/c/190322/
2) https://review.openstack.org/#/c/138607/
-Vilobh
tl;dr: checking a Service (is_up) should only tell us whether we can
send a message to it, not whether the related hypervisor(s) are up.
Having a reference in the services table mapping 1:1 to a reference in
a separate datastore is fine by me.
So, I'm going to review the specs above and leave my comments there.
That said, I also want to offer a humble opinion about what the
relationship should be between a Service and what could be called the
"ServiceGroup API" (badly named IMHO, since it only checks a service,
not a group ;-) )
From my perspective, the Service object is related to the AMQP service
tied to the queue and... that's it.
That has nothing to do with a hypervisor (since multiple hypervisors
can sit behind a single service). It only represents the single point
of failure for messages sent to a nova-compute service (and not to a
compute node, remember the distributed case), and since it is the only
way to communicate with the related hypervisor(s), we have to know its
status.
Again, that doesn't necessarily imply that if the service (which
listens on the AMQP queue) is up, the hypervisors will be up as well,
but it is strong enough to say that if the service is down, we are
sure the hypervisor(s) won't receive messages.
Whether the hypervisor keeps working while the service is down is a
corner case that the service status should not cover, IMHO.
That's exactly why we need to consider the service as a reference that
can be used as-is for any relationship with a list of hypervisors
(call them ComputeNodes now), and checking its state (with any driver)
should only tell us whether a message can be sent to it, *and not
whether the related hypervisor(s) are running*.
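To sketch that idea in code (hypothetical names only; this is not
actual Nova or oslo.messaging code), an is_up() check would only gate
whether we attempt an RPC cast, never whether we report the hypervisor
as healthy:

```python
# Hypothetical sketch: is_up() only tells us whether a message sent to the
# service is likely to be consumed; it says nothing about the state of the
# hypervisor(s) behind the service. Service and rpc_cast are illustrative
# names, not real Nova/oslo.messaging API.

import time

HEARTBEAT_TIMEOUT = 60  # seconds without a heartbeat before we call it down


class Service:
    def __init__(self, host):
        self.host = host
        self.last_heartbeat = time.time()

    def is_up(self):
        """True if we believe the service is still listening on its queue."""
        return (time.time() - self.last_heartbeat) < HEARTBEAT_TIMEOUT


def rpc_cast(service, message):
    """Send a message only if the service is likely consuming its queue.

    Note: is_up() == True does NOT mean the hypervisor is running; it only
    means the queue listener was recently alive.
    """
    if not service.is_up():
        raise RuntimeError("service %s is down; not casting" % service.host)
    # ... the actual AMQP cast would happen here ...
    return "cast %r to %s" % (message, service.host)
```

The point of the sketch is the asymmetry Sylvain describes: is_up() is
only trustworthy in the negative direction (down means the message will
not arrive), so it belongs at the send path and nowhere else.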
Given that disclaimer (which implies we need to be very clear about
when we ask is_up(service)), I'm fine with considering the references
stored in the DB (i.e. the services table) as just a list of pointers
to separate objects which can live in any datastore
(DB/Memcache/ZK/pick your favorite).
The only thing we need to make sure of is that there is a 1:1 mapping
between the two objects (e.g. the DB "service" row and the
"datastored" object), which can only be enforced logically.
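A minimal illustration of that 1:1 mapping (hypothetical: the dicts
below stand in for the services table and for Memcache/ZooKeeper/etc.;
nothing here is actual Nova code):

```python
# Hypothetical sketch of the 1:1 logical mapping between a DB "service"
# row and a liveness record kept in a separate datastore. Both sides are
# keyed identically, and the only invariant we can enforce is that the
# two key sets stay equal.

services_table = {}   # service_id -> service row (the DB reference)
liveness_store = {}   # service_id -> liveness record (separate datastore)


def register_service(service_id, host, topic):
    """Create both sides of the mapping together, under the same key."""
    services_table[service_id] = {"host": host, "topic": topic}
    liveness_store[service_id] = {"alive": True}


def check_mapping():
    """The invariant: every service row has exactly one liveness record."""
    return set(services_table) == set(liveness_store)
```

Since the two stores are independent, the mapping can drift (e.g. a
liveness record with no service row), which is why it "can only be
enforced logically" rather than by a foreign key.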
My 2 cts,
-Sylvain
On Mon, May 11, 2015 at 8:08 AM, Chris Friesen
<[email protected] <mailto:[email protected]>> wrote:
On 05/11/2015 07:13 AM, Attila Fazekas wrote:
From: "John Garbutt" <[email protected]>
* From the RPC API point of view, do we want to send a cast to
something that we know is dead? Maybe we want to. Should we wait for
calls to time out, or give up quicker?
How to fail sooner:
https://bugs.launchpad.net/oslo.messaging/+bug/1437955
We do not need a dedicated is_up just for this.
Is that really going to help? As I understand it, if nova-compute
dies (or is isolated) then the queue remains present on the server
but nothing will process messages from it.
Chris
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev