Ken Gaillot <[email protected]> wrote:
> On 04/21/2016 06:09 PM, Adam Spiers wrote:
> > Ken Gaillot <[email protected]> wrote:
> >> Hello everybody,
> >>
> >> The release cycle for 1.1.15 will be started soon (hopefully tomorrow)!
> >>
> >> The most prominent feature will be Klaus Wenninger's new implementation
> >> of event-driven alerts -- the ability to call scripts whenever
> >> interesting events occur (nodes joining/leaving, resources
> >> starting/stopping, etc.).
> >
> > Ooh, that sounds cool!  Can it call scripts after fencing has
> > completed?  And how is it determined which node the script runs on,
> > and can that be limited via constraints or similar?
>
> Yes, it is called after all "interesting" events (including fencing), and
> the script can use the provided environment variables to determine what
> type of event it was.

Great.  Does the script run on the DC, or is that configurable somehow?

> We don't notify before events, because at that moment we don't know
> whether the event will really happen or not. We might try but fail.

You lost me here ;-)

> > I'm wondering if it could replace the current fencing_topology hack we
> > use to invoke fence_compute which starts the workflow for recovering
> > VMs off dead OpenStack nova-compute nodes.
>
> Yes, that is one of the reasons we did this!

Haha, at this point can I say great minds think alike? ;-)

> The initial implementation only allowed for one script to be called (the
> "notification-agent" property), but we quickly found out that someone
> might need to email an administrator, notify nova-compute, and do other
> types of handling as well. Making someone write one script that did
> everything would be too complicated and error-prone (and unsupportable).
> So we abandoned "notification-agent" and went with this new approach.
>
> Coordinate with Andrew Beekhof for the nova-compute alert script, as he
> already has some ideas for that.

OK.  I'm sure we'll be able to talk about this more next week in Austin!

> > Although even if that's possible, maybe there are good reasons to stay
> > with the fencing_topology approach?
> >
> > Within the same OpenStack compute node HA scenario, it strikes me that
> > this could be used to invoke "nova service-disable" when the
> > nova-compute service crashes on a compute node and then fails to
> > restart.  This would eliminate the window in between the crash and the
> > nova server timing out the nova-compute service - during which it
> > would otherwise be possible for nova-scheduler to attempt to schedule
> > new VMs on the compute node with the crashed nova-compute service.
> >
> > IIUC, this is one area where masakari is currently more sophisticated
> > than the approach based on OCF RAs:
> >
> >     https://github.com/ntt-sic/masakari/blob/master/docs/evacuation_patterns.md#evacuation-patterns
> >
> > Does that make sense?
>
> Maybe. The script would need to be able to determine based on the
> provided environment variables whether it's in that situation or not.

Yep.
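
For illustration only, here is a rough sketch of how an alert script might
branch on the provided environment variables to tell these situations
apart. The CRM_alert_* variable names follow my reading of the Pacemaker
alerts documentation and are not given in this thread, so treat them as
assumptions to check against the release. The resource name "nova-compute"
and the returned action labels are purely hypothetical.

```python
#!/usr/bin/env python3
"""Sketch of an alert script that dispatches on environment variables."""
import os


def classify_alert(env):
    """Return a short label for the action this alert would trigger.

    The CRM_alert_* names are assumptions (they are not named in this
    thread); the action labels are made up for the example.
    """
    kind = env.get("CRM_alert_kind", "")
    node = env.get("CRM_alert_node", "unknown")

    if kind == "fencing":
        # Alerts fire after the event, so fencing has already been
        # attempted; VM recovery for the node could be started here.
        return "recover-vms:" + node

    if kind == "resource":
        rsc = env.get("CRM_alert_rsc", "")
        task = env.get("CRM_alert_task", "")
        rc = env.get("CRM_alert_rc", "0")
        # Hypothetical resource name. A failed start of nova-compute is
        # the case where "nova service-disable" might be invoked.
        if rsc == "nova-compute" and task == "start" and rc != "0":
            return "disable-service:" + node

    # Everything else (node joins/leaves, successful operations, ...)
    return "ignore"


def main():
    # A real script would act on the decision; this sketch only prints it.
    print(classify_alert(os.environ))


if __name__ == "__main__":
    main()
```

Keeping the decision in a function that takes a plain mapping means it can
be tried out with a hand-built dictionary, without a running cluster.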
