Hi, some comments inline, but be aware this is mostly a mental exercise on my part, a brainstorm if you will :)
On 17 November 2016 at 15:59, Arndt, Jonas <[email protected]> wrote:

> Hi Guilherme,
>
> I don't think there currently is an effort inside the OpenSAF Project to
> accomplish this. There are many different use cases here though and I
> wanted to understand exactly which of these you think would be the first
> thing to go after.
>
> (1) OpenStack Services
> In this scenario OpenSAF could be used to give service availability to the
> actual OpenStack's centralized services (e.g. NOVA, Cinder, Neutron,
> Keystone...).

Here we have the basic scenario: OpenStack is not capable of HA on its own, and right now all upgrades are quite painful and error-prone. Maybe upgrade campaigns could prove useful for people working on getting stable and reliable upgrades? Maybe even make some of the components SA_AWARE so they could be pre-instantiable?
Apart from the obvious fit, most of the conversation in the OpenStack communities is shifting toward containerizing the OpenStack components and relying on Kubernetes and the like for this job. I'm not sure how HA this can be though; I don't have enough information, but maybe it is worth exploring.

> (2) OpenStack computes
> Here we could monitor VMs health and restart as needed. The trick here is
> of course to make sure that NOVA is in sync with this. For quick restarts
> on the same node it might be less of a problem but in scenarios when a
> whole node goes down NOVA needs to be involved. Should we here sit between
> NOVA and libvirt or should we change NOVA itself? Would OpenStack embrace
> such change?
> Then there is Neutron and detecting issues with networking. We could
> obviously plug into OVS and detect issues here. This would be a "No
> Redundancy" type of approach as OVS wouldn't fail over. How to isolate and
> recover? What about DPDK enabled OVS? How to tie into Neutron...

I would think that is a good scenario to explore as well, but I lack insight into how OpenStack implemented that.
But to exemplify, I will talk about a scenario I saw, which initially triggered my idea of asking on this list:
- deployment is done through HEAT templates
- you have no way to guarantee that the resources created will still be there, meaning that if an instance described in the template crashes, you have no way to know.

They started a revamp of HEAT to give it a smarter engine, which they call the convergence engine. A lot of the work is already in place, and they have a better way to control resources while creating them, but once a resource is created there is no monitoring. They have a blueprint for what they are calling continuous-observer, which would restart machines (and associated resources) in case they disappear, but this work was postponed (it was supposed to be delivered in the Newton release), and from the info I got in their community it seems it's not coming in the next release either, MAYBE in the one after that.

> (3) SA_AWARE VNFs
> We could also offer up OpenSAF APIs to the VNFs (or applications inside
> VMs) so that the VNF vendor could use these APIs to develop SA_AWARE
> solutions. Kind of like HA as a service. The big question here is whether
> or not anybody would use these APIs.

This could prove useful; telcos will probably be willing to try it at least.

> I am sure there are many more use cases here but this is what I could
> thinks about initially. Feel free to comment on what you feel is the low
> hanging fruit or what to go after first. Also, the OpenSAF project
> historically have not stepped outside the AIS spec much. That is, any
> solutions on top of OpenSAF is not really addressed by the project. That's
> why you don't see a flashy GUI that shows you the status of the cluster and
> so on.

As you said, there are probably several areas of overlap and integration points.
I understand that the idea is to be a "framework" and not a do-it-all thing; I'm not trying to push any funny idea here :)

In a general rant now, my main point is that, despite all the buzz, OpenStack and cloud in general do not give you HA like they keep saying, not five nines at least, and I keep struggling to understand why they keep feeding this NIH syndrome instead of trying to rely on, or at least leverage, prior work/studies done in the area.
For the example I gave, I already know of some people trying to circumvent the problem using a mix of the available components they have (e.g. https://www.openstack.org/videos/video/building-self-healing-applications-with-aodh-zaqar-and-mistral). It is basically very similar to the AIS approach: a monitoring system with alarming (aodh), a messaging system (zaqar) and a task scheduler (mistral).
So I see that at several levels the AIS specs are still very relevant, and maybe some demos could show that there's a lot of value in trying some sort of integration.

But let's get this conversation going. I just started thinking about possible scenarios, and initially my idea was to implement a very simple demo showing OpenSAF doing the piece of monitoring/restarting they are lacking and trying to get done.

Cheers,
Guilherme

> Cheers,
>
> // Jonas
>
>
> ----Original Message-----
> From: Guilherme Moro [mailto:[email protected]]
> Sent: Tuesday, November 15, 2016 10:52 PM
> To: [email protected]
> Subject: [devel] Openstack
>
> Hi,
>
> Is there anyone working/planning/thinking anything related to Openstack and
> OpenSAF integration?
>
> I know that in the very basic there would be two scenarios, using OpenSAF
> to run Openstack itself (currently they use Pacemaker/corosync to achieve
> HA) and running OpenSAF within the VM's to achieve application HA.
>
> Is there anyone thinking beyond this point?
>
> Regards,
>
> Guilherme Moro
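
P.S. To make the monitoring/restarting demo idea a bit more concrete, here is a rough sketch of a "continuous observer" style loop: compare the instances a template asked for with what is actually alive, raise an alarm for anything missing, and restart it. Everything in it is hypothetical and for illustration only: the `Observer` class and the `is_alive` and `restart` hooks are made-up names, and it calls neither the Heat nor the OpenSAF APIs. A real demo would plug those in where the hooks are.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Observer:
    """Toy observer: checks desired instances and restarts missing ones."""

    # Names of the instances the template asked for.
    desired: List[str]
    # Hypothetical hooks (assumptions): a real demo would query the cloud
    # API or a health monitor here, and ask the platform to recreate the VM.
    is_alive: Callable[[str], bool]
    restart: Callable[[str], None]
    # Alarm messages raised so far (stands in for an alarming service).
    alarms: List[str] = field(default_factory=list)

    def reconcile(self) -> List[str]:
        """Run one pass; return the names of the instances restarted."""
        restarted = []
        for name in self.desired:
            if not self.is_alive(name):
                self.alarms.append(f"instance {name} is down")
                self.restart(name)
                restarted.append(name)
        return restarted


if __name__ == "__main__":
    # Fake "cloud" state: db-1 has crashed.
    running = {"web-1": True, "db-1": False}
    observer = Observer(
        desired=["web-1", "db-1"],
        is_alive=lambda name: running.get(name, False),
        restart=lambda name: running.__setitem__(name, True),
    )
    print(observer.reconcile())  # first pass restarts the crashed instance
    print(observer.reconcile())  # second pass finds nothing to do
```

In a real setup `reconcile()` would run periodically or be triggered by events, and the restart would have to go through NOVA so that it stays in sync, as discussed in scenario (2).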
