Hi, replies inline below, hope this explains things.
In short, the alternatives:

1./a. Reset server state for all VMs on a host with a single API call, either via
      1./a/1. a host-specific reset server state API, or
      1./a/2. the host force down API.
1./b. No reset server state is done. The host force down API would trigger a new kind of notification for each tenant about affected VMs.
2.    No notification through the Controller. The Inspector would form the notifications to the Notifier itself, which makes it easier to tailor notifications and alarms as we want. Only the host force down API is called by the Inspector, no reset server state.

Br,
Tomi

From: Yujun Zhang [mailto:zhangyujun+...@gmail.com]
Sent: Thursday, September 29, 2016 9:37 AM
To: Juvonen, Tomi (Nokia - FI/Espoo) <tomi.juvo...@nokia.com>; opnfv-tech-discuss@lists.opnfv.org
Subject: Re: [opnfv-tech-discuss] [Doctor] Reset Server State and alarms in general

> Hi, Tomi
>
> Thanks for the summary. I am a bit confused about the difference between "2." and "1./b.". Would you please give an example to explain how it would work?

In "2." the Inspector sends the notification, not the Controller. That means the notification can be tailored exactly to meet the needs. This is not the case with "1./b.".

> "2." would assume we can tailor the notification and alarm(s) the way we want them to be. Suppose we have
>
> - tenant-a
>   - vm-a on host-a
> - tenant-b
>   - vm-b on host-a
>
> When a raw failure occurs on host-a, the existing sequence [1] would be
>
> 1. Monitor sends "host-a failure" event to Inspector
> 2. Inspector finds affected VMs (gets all VMs on host-a)

A properly implemented Inspector should already know the VMs on the host, so it should not need to fetch them at this point. Anyhow you are right, this is what we have currently.

> 3. Inspector resets affected VMs (vm-a and vm-b) to error state

Currently the Inspector resets servers to error state in order to get a notification from the Controller (Nova). In "2." and "1./b." the Inspector should not reset servers to error state, only force down the host.

> 4. Controller requests Notifier to notify all

In "2." the Controller is not the one making the notification that triggers the alarm.

> ...
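The existing sequence quoted above can be sketched with stub components; a minimal sketch, assuming illustrative class and method names (these are Doctor roles, not real Doctor or Nova APIs):

```python
class Notifier:
    """Collects the notifications that would become tenant alarms."""
    def __init__(self):
        self.alarms = []
    def notify(self, payload):
        self.alarms.append(payload)

class Controller:
    """Stands in for Nova: resetting a server's state emits a notification."""
    def __init__(self, notifier):
        self.notifier = notifier
        self.server_state = {}
    def reset_server_state(self, vm, state="error"):
        self.server_state[vm] = state
        # It is this per-server notification from the Controller that the
        # alarm is derived from in the current sequence.
        self.notifier.notify({"event": "instance.update", "vm": vm, "state": state})

class Inspector:
    def __init__(self, controller, topology):
        self.controller = controller
        self.topology = topology  # host -> list of VMs on that host
    def on_host_failure(self, host):
        # Steps 2-3 of the existing sequence: find the affected VMs and
        # reset each one to "error" so the Controller notifies per VM.
        for vm in self.topology.get(host, []):
            self.controller.reset_server_state(vm, "error")

notifier = Notifier()
controller = Controller(notifier)
inspector = Inspector(controller, {"host-a": ["vm-a", "vm-b"]})
inspector.on_host_failure("host-a")   # Monitor reports "host-a failure"
print(len(notifier.alarms))           # prints 2: one notification per affected VM
```

Note how the alarm content is constrained to whatever the Controller's notification happens to carry, which is exactly the limitation the thread is discussing.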
> I think this is how "1./a." works.

Yes.

> For "1./b." it seems to be close to the alternative sequence in the fault management scenario [2]. Instead of waiting for the Controller to send the notification, the Inspector will directly inform the Notifier about it.

In "1./b." the notification is sent from the Controller for all the VMs when the force down host API is called on Nova. In "2." the notification is sent directly from the Inspector to the Notifier.

> Apparently, 5a is mandatory before 5b and 5c. But 5b and 5c (alt) can be triggered simultaneously with async calls. If we deploy Vitrage as the Inspector, the VM error state could be deduced and notified independently from the "5b. Update State" action. Then the time required for updating all VM states would not matter any more.

The "get valid server state" work gave the VM a host_status field, so one gets the proper state when the host is down. This was done exactly because, when a host was down, there was no indication when querying servers through the Nova servers API that anything was wrong. It assumed reset server state was not to be called, and that no existing VM state field was to be changed to indicate the host is down. Anyhow, if we in Doctor still insist that reset server state should be called, it is great that it can be done independently as you say.

> "1./b." looks good to me but I'd like to hear more on "2."

In "2.":

1. Monitor sends "host-a failure" event to Inspector.
2. Inspector knows the topology already, so it internally figures out the VMs on the host by tenant.
3. Inspector forces down the host (and does fencing of the host, for which we currently have nothing implemented).
4. Inspector sends the needed notifications to the Notifier to form exactly the alarms needed by the tenants (and probably also a better alarm for the physical fault than could be made from a notification about the nova-compute service state change through the service.update notification).

> [1] http://artifacts.opnfv.org/doctor/docs/index.html#figure-p1

Yes, the figure shows the flow through the Controller, but it has already been discussed whether this is a good idea.
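The four steps of alternative "2." above can be sketched the same way; again a stub, with illustrative names (only the novaclient call mentioned in a comment is a real API):

```python
from collections import defaultdict

class Inspector2:
    """Alternative "2.": the Inspector owns the topology and the alarms."""
    def __init__(self, topology):
        # topology: host -> list of (tenant, vm) pairs, known in advance
        self.topology = topology
        self.forced_down = set()
        self.sent = []           # stands in for notifications to the Notifier

    def force_down(self, host):
        # In a real deployment this would be the Nova force-down call, e.g.
        # with python-novaclient: client.services.force_down(host,
        # "nova-compute", True) -- and no reset server state at all.
        self.forced_down.add(host)

    def on_host_failure(self, host):
        self.force_down(host)
        # Group affected VMs per tenant from the locally known topology.
        per_tenant = defaultdict(list)
        for tenant, vm in self.topology.get(host, []):
            per_tenant[tenant].append(vm)
        # One tailored notification per tenant, plus a separate physical
        # fault alarm, instead of relying on Controller notifications.
        for tenant, vms in per_tenant.items():
            self.sent.append({"type": "vm_fault", "tenant": tenant, "vms": vms})
        self.sent.append({"type": "host_fault", "host": host})

insp = Inspector2({"host-a": [("tenant-a", "vm-a"), ("tenant-b", "vm-b")]})
insp.on_host_failure("host-a")
print(len(insp.sent))   # prints 3: two tenant alarms + one physical fault alarm
```

The design point is that the alarm payloads are built where all the needed knowledge already lives, so no information has to be squeezed through Nova's notification format.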
> [2] http://artifacts.opnfv.org/doctor/docs/index.html#figure8

On Wed, Sep 28, 2016 at 1:27 PM Juvonen, Tomi (Nokia - FI/Espoo) <tomi.juvo...@nokia.com> wrote:

Hi,

As discussed yesterday in the Doctor meeting, there are several ways to approach the problem and many different aspects. If we try to make a blueprint for OpenStack Nova, there is a window open now for a couple of weeks to get it into the next Ocata release (Danube in OPNFV). Not sure if there is time for that, but here is a summary:

1. The way we use "reset server state" is not the way it is used in OpenStack. Forcing down a host does not require resetting server state. Do we want to state that we still want to use it anyway, because we want the notification in order to have an alarm?

   a. Yes:
      1. Do we want to enhance the functionality to reset server state for all servers on a host?
      2. Do we want the force down API to be able to optionally reset server state for all VMs on the host?

      Note! "Get valid server state" was done because there is no server-specific state change when there is a host-specific fault (as reset server state is not called). This is why a host_status field was added: a user querying his server can see that nothing is wrong with the VM itself, but that it is currently down because the host is down.

   b. No: We could try to make a change so that calling force down host sends a notification about the affected VMs (as many notifications as there are tenants with VMs on the host).

2. Only the Inspector knows everything that is needed for the different alarms, and it is just overhead to push that information through, for example, Nova to get a notification that can be translated to an alarm. Also, we would not get the right content into the alarms anyhow.
This leads to the fact that the only way to get things right is to send the notification from the Inspector to the Notifier, producing the right kinds of alarms: tenant-specific alarms listing their VMs, plus a separate physical fault alarm (with respect to ETSI GS NFV-IFA 005).

IMHO the only right choice is "2.". The next best would be "1./b.". The least feasible would be "1./a.".

Br,
Tomi
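For reference, the Nova request bodies behind the APIs discussed in this thread look roughly as follows. The endpoints and microversions are from memory of the Nova compute API (force-down from 2.11, host_status from 2.16), so please verify against the official API reference before relying on them:

```python
import json

# 1. Force down a compute service:
#    PUT /v2.1/os-services/force-down  (compute API microversion >= 2.11)
force_down_body = {"host": "host-a", "binary": "nova-compute", "forced_down": True}

# 2. Reset server state -- the call that alternatives "1./b." and "2."
#    would avoid making per VM:
#    POST /v2.1/servers/{server_id}/action
reset_state_body = {"os-resetState": {"state": "error"}}

# 3. After the force-down, GET /v2.1/servers/{server_id} with microversion
#    >= 2.16 includes a host_status field (e.g. "DOWN") from the "get valid
#    server state" work, so the tenant can see the host problem without any
#    per-server state reset.

print(json.dumps(reset_state_body))   # prints {"os-resetState": {"state": "error"}}
```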
_______________________________________________ opnfv-tech-discuss mailing list opnfv-tech-discuss@lists.opnfv.org https://lists.opnfv.org/mailman/listinfo/opnfv-tech-discuss