On 11/06/2015 18:52, Vilobh Meshram wrote:
A few more places that can trigger inconsistent behaviour:

- https://github.com/openstack/nova/blob/stable/kilo/nova/api/openstack/compute/contrib/services.py#L44

- https://github.com/openstack/nova/blob/stable/kilo/nova/api/openstack/compute/contrib/hypervisors.py#L98

- https://github.com/openstack/nova/blob/stable/kilo/nova/availability_zones.py#L130

- https://github.com/openstack/nova/blob/stable/kilo/nova/api/openstack/compute/contrib/availability_zone.py#L68

- https://github.com/openstack/nova/blob/stable/kilo/nova/api/openstack/compute/contrib/hosts.py#L88-L89

- https://github.com/openstack/nova/blob/stable/kilo/nova/compute/api.py#L3399-L3421


Blueprint which plans to fix this: https://blueprints.launchpad.net/nova/+spec/servicegroup-api-control-plane

Related specs:

1) https://review.openstack.org/#/c/190322/

2) https://review.openstack.org/#/c/138607/

-Vilobh



tl;dr: checking a Service (is_up) should only tell us whether we can send a message to it, not whether the related hypervisor(s) is/are up. Having a reference in the services table mapping 1:1 to a reference in a separate datastore is fine by me.


So, I'm going to review the specs above and leave my comments there.
That said, I also want to offer a humble opinion about what the relationship should be between a Service and what could be called the "ServiceGroup API" (badly named IMHO, since it only checks a service, not a group ;-) )

From my perspective, the Service object is related to the AMQP service tied to the queue and... that's it. It has nothing to do with a hypervisor (since multiple hypervisors can be distributed behind a single service). It only represents the single point of failure for messages sent to a nova-compute service (not a compute node, remember the distributed case), and since this is the only way to communicate with the related hypervisor(s), we have to know its status.

Again, that doesn't necessarily imply that if the service (which listens to the AMQP queue) is up, the hypervisors will be up as well, but it is strong enough to say that if it's down, we are sure the hypervisor(s) won't receive messages. Whether a hypervisor keeps working while the service is down is a corner case that the service status should not try to answer, IMHO.

That's exactly why we should treat the service as a reference which can be used as-is for any relationship with a list of hypervisors (call them ComputeNodes now), and checking its state (with whatever driver) should only be used for knowing whether a message can be sent to it, *and not for checking whether the related hypervisor(s) are running*.
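A minimal sketch of that separation (all names here are illustrative, not Nova's actual servicegroup code): is_up() answers exactly one question, "can this service consume a message from its queue?", and the dispatch helper uses it for nothing else.

```python
import time


class ServiceGroupDriver:
    """Minimal heartbeat-based liveness check.

    Stands in for a pluggable driver (DB/Memcache/ZK in practice).
    A service is "up" if it reported a heartbeat within the timeout.
    """

    def __init__(self, timeout=60):
        self.timeout = timeout
        self._heartbeats = {}  # service name -> last report timestamp

    def report(self, service, now=None):
        self._heartbeats[service] = time.time() if now is None else now

    def is_up(self, service, now=None):
        now = time.time() if now is None else now
        last = self._heartbeats.get(service)
        return last is not None and (now - last) <= self.timeout


def maybe_cast(driver, service, message, send):
    """Dispatch only if the service can consume it; infer nothing more.

    A False return means "the message would sit unconsumed on the queue",
    NOT "the hypervisor behind this service is down".
    """
    if not driver.is_up(service):
        return False
    send(service, message)
    return True
```

The point of keeping is_up() this narrow is that every caller is forced to decide, explicitly and separately, what hypervisor health means for its use case.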

Given that disclaimer (which implies we need to be very clear about when to ask is_up(service)), I'm fine with considering what is stored in the DB (i.e. the services table) as only a list of references pointing to separate objects which can live in any datastore (DB/Memcache/ZK, pick your favorite).

The only thing we need to ensure is that there is a 1:1 mapping between the two objects (e.g. the DB "service" row and the "datastored" object), which can only be enforced logically.
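A hedged sketch of that 1:1 mapping (classes and names are invented for illustration): the DB row carries only a stable reference, the liveness state lives in a separate store keyed by that reference, and registering the same reference twice is rejected so the mapping stays 1:1.

```python
class ServiceRecord:
    """What would live in the services table: identity only, no liveness."""

    def __init__(self, host, binary):
        self.host = host
        self.binary = binary
        self.ref = "%s@%s" % (binary, host)  # the stable 1:1 reference key


class LivenessStore:
    """The 'datastored' side: at most one liveness entry per reference.

    A dict here, standing in for DB/Memcache/ZooKeeper.
    """

    def __init__(self):
        self._entries = {}

    def register(self, record):
        # Logical enforcement of the 1:1 mapping: refuse duplicates.
        if record.ref in self._entries:
            raise ValueError("reference already mapped: %s" % record.ref)
        self._entries[record.ref] = {"alive": False}

    def heartbeat(self, record):
        self._entries[record.ref]["alive"] = True

    def is_up(self, record):
        return self._entries.get(record.ref, {}).get("alive", False)
```

As the text says, nothing at the storage level ties the two sides together; the uniqueness check in register() is exactly the kind of logical-only enforcement meant above.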

My 2 cts,
-Sylvain



On Mon, May 11, 2015 at 8:08 AM, Chris Friesen <[email protected]> wrote:

    On 05/11/2015 07:13 AM, Attila Fazekas wrote:

            From: "John Garbutt" <[email protected]>


            * From the RPC api point of view, do we want to send a cast to
            something that we know is dead? Maybe we want to. Should we
            wait for calls to time out, or give up quicker?


        How to fail sooner:
        https://bugs.launchpad.net/oslo.messaging/+bug/1437955

        We do not need a dedicated is_up just for this.


    Is that really going to help?  As I understand it, if nova-compute
    dies (or is isolated) then the queue remains present on the server,
    but nothing will process messages from it.

    Chris
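Chris's scenario can be sketched with a plain in-process queue standing in for the AMQP broker (purely illustrative, no oslo.messaging involved): the cast "succeeds" from the sender's point of view even though no consumer will ever drain the queue.

```python
import queue

# Illustrative stand-in for a durable AMQP queue that outlives a dead
# nova-compute consumer: the queue object exists whether or not anyone
# is reading from it.
broker_queue = queue.Queue()


def cast(msg):
    """Fire-and-forget publish: the broker accepts it unconditionally."""
    broker_queue.put(msg)
    return True  # "success" says nothing about whether anyone is listening
```

Publishing succeeds and the message simply sits on the queue, which is why faster-failing RPC helps for calls (the reply times out) but cannot make a cast detect a dead consumer.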


    __________________________________________________________________________
    OpenStack Development Mailing List (not for usage questions)
    Unsubscribe: [email protected]?subject:unsubscribe
    http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



