Re: [openstack-dev] [ClusterLabs Developers] [HA] future of OpenStack OCF resource agents (was: resource-agents v4.2.0)
Hi Tim,

Tim Bell wrote:
> Adam, Personally, I would prefer the approach where the OpenStack resource agents are part of the repository in which they are used.

Thanks for chipping in. Just checking - by this you mean resource-agents rather than openstack-resource-agents, right? Obviously the agents aren't usable as standalone components in either context, without a cloud's worth of dependencies including Pacemaker.

> This is also the approach taken in other open source projects such as Kubernetes and avoids the inconsistency where, for example, Azure resource agents are in the Cluster Labs repository but OpenStack ones are not.

Right. I suspect there's no clearly defined scope for the resource-agents repository at the moment, so it's probably hard to say "agent X belongs here but agent Y doesn't". Although, as has been alluded to elsewhere in this thread, that in itself could be problematic in terms of the repository constantly growing.

> This can mean that people miss there is OpenStack integration available.

Yes, discoverability is important, although I think we can make more impact on that via better documentation (another area I am struggling to make time for ...)

> This does not reflect, in any way, the excellent efforts and results made so far. I don't think it would negate the possibility to include testing in the OpenStack gate since there are other examples where code is pulled in from other sources.

There are a number of technical barriers, or at the very least inconveniences, here: the resource-agents repository is hosted on GitHub, so none of the normal processes based around Gerrit apply. I guess it's feasible that since Zuul v3 gained GitHub support, it could orchestrate running OpenStack CI on GitHub pull requests, although it would have to make sure to only run on PRs which affect the OpenStack RAs, and none of the others.
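As a rough sketch of what that filtering could look like: Zuul jobs support a "files" attribute which limits a job to changes touching matching paths. The job name and path patterns below are purely hypothetical illustrations, not an existing configuration:

```yaml
# Hypothetical Zuul job definition, limited to changes touching the
# OpenStack resource agents. Job name and file patterns are assumptions.
- job:
    name: openstack-ra-integration
    description: Run OpenStack CI only when OpenStack RAs are modified
    # Zuul skips the job unless at least one changed file matches
    # one of these regular expressions.
    files:
      - ^heartbeat/openstack-.*
```

Whether Zuul's GitHub driver plus something like this would satisfy the OpenStack gate requirements is exactly the open question raised above.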
Additionally, we'd probably need tags / releases corresponding to each OpenStack release, which means polluting a fundamentally non-OpenStack-specific repository with OpenStack-specific metadata.

I think either way we go, there is ugliness. Personally I'm still leaning towards continued use of openstack-resource-agents, but I'm happy to go with the majority consensus if we can get a semi-respectable number of respondents :-)

__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [ClusterLabs Developers] [HA] future of OpenStack OCF resource agents (was: resource-agents v4.2.0)
[cross-posting to openstack-dev]

Oyvind Albrigtsen wrote:
> ClusterLabs is happy to announce resource-agents v4.2.0. Source code is available at: https://github.com/ClusterLabs/resource-agents/releases/tag/v4.2.0
> The most significant enhancements in this release are:
> - new resource agents:
>   [snipped]
>   - openstack-cinder-volume
>   - openstack-floating-ip
>   - openstack-info

That's an interesting development. By popular demand from the community, in Oct 2015 the canonical location for OpenStack-specific resource agents became:

https://git.openstack.org/cgit/openstack/openstack-resource-agents/

as announced here:

http://lists.openstack.org/pipermail/openstack-dev/2015-October/077601.html

However I have to admit I have done a terrible job of maintaining it since then. Since OpenStack RAs are now beginning to creep into ClusterLabs/resource-agents, now seems a good time to revisit this and decide a coherent strategy. I'm not religious either way, although I do have a fairly strong preference for picking one strategy which both the ClusterLabs and OpenStack communities can align on, so that all OpenStack RAs are in a single place. I'll kick the bikeshedding off:

Pros of hosting OpenStack RAs on ClusterLabs:

- ClusterLabs developers get the GitHub code review and Travis CI experience they expect.
- They receive all the same maintenance attention as other RAs - any changes to coding style, utility libraries, Pacemaker APIs, refactorings etc. which apply to all RAs would automatically get applied to the OpenStack RAs too.
- Documentation gets built in the same way as other RAs.
- Unit tests get run in the same way as other RAs (although does ocf-tester even get run by the CI currently?)
- Doesn't get maintained by me ;-)

Pros of hosting OpenStack RAs on OpenStack infrastructure:

- OpenStack developers get the Gerrit code review and Zuul CI experience they expect.
- Releases and stable/foo branches could be made to align with OpenStack releases (..., Queens, Rocky, Stein, T(rains?) ...)
- Automated testing could in the future spin up a full cloud and do integration tests by simulating failure scenarios, as discussed here: https://storyboard.openstack.org/#!/story/2002129 That said, that is still very much work in progress, so it remains to be seen when that could come to fruition.

No doubt I've missed some pros and cons here. At this point personally I'm slightly leaning towards keeping them in openstack-resource-agents - but that's assuming I can either hand off maintainership to someone with more time, or somehow find the time myself to do a better job. What does everyone else think? All opinions are very welcome, obviously.
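For readers less familiar with what is being hosted here: an OCF resource agent is essentially a shell script implementing a small set of standard actions (start, stop, monitor, meta-data) with well-defined exit codes, which Pacemaker invokes one action at a time. The sketch below shows that contract only; the action names and exit codes come from the OCF standard, but the dummy behaviour is illustrative, and a real agent (e.g. openstack-floating-ip) would source ocf-shellfuncs and talk to the cloud APIs:

```shell
#!/bin/sh
# Minimal illustrative sketch of the OCF resource agent action contract.
# A real agent would source the ocf-shellfuncs helper library instead of
# hard-coding behaviour like this.
ocf_dispatch() {
    case "$1" in
        start)     echo "resource started"; return 0 ;;   # OCF_SUCCESS
        stop)      echo "resource stopped"; return 0 ;;   # stop must be idempotent
        monitor)   return 7 ;;    # OCF_NOT_RUNNING - resource cleanly stopped
        meta-data) echo '<resource-agent name="dummy"/>'; return 0 ;;
        *)         return 3 ;;    # OCF_ERR_UNIMPLEMENTED
    esac
}
```

The ocf-tester utility mentioned above exercises exactly this contract, which is why it matters whether either repository's CI actually runs it.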
[openstack-dev] [infra] Gerrit User Summit, November 2018
Hi all,

The next Gerrit User Summit will take place on Nov 15th-16th 2018 in Palo Alto, hosted by Cloudera. See the Gerrit User Summit page at:

https://gerrit.googlesource.com/summit/2018/+/master/index.md

and the event registration at:

https://gus2018.eventbrite.com

Hopefully some members of the OpenStack community can attend the event, not just so we can keep up to date with Gerrit but also so that our interests can be represented!

Regards, Adam
Re: [openstack-dev] [Openstack-sigs] [self-healing][heat][vitrage][mistral] Self-Healing with Vitrage, Heat, and Mistral
Hi Rico,

Firstly, sorry for the slow reply! I am finally catching up on my backlog.

Rico Lin wrote:
> Dear all, back at the Vancouver Summit, Ifat brought up the idea of integrating Heat, Vitrage, and Mistral to provide a better self-healing scenario. As for previous work, there is already work across Heat, Mistral, and Zaqar for self-healing [1], and there is work across Vitrage and Mistral [2]. Now we plan to start integrating the two efforts (as much as it can/should be done), to make sure the scenario works, and to keep it working. The integrated scenario flow will look something like this: an existing monitor detects a host/network failure and sends an alarm to Vitrage -> Vitrage deduces that the instance is down (based on the topology and based on Vitrage templates [2]) -> Vitrage triggers Mistral to fix the instance -> the application is recovered. We created an Etherpad [3] to document all discussion/feedback/plans (and will add more detail over time), and also created a story in the self-healing SIG to track all tasks. The current plans are:
> - A spec for Vitrage resources in Heat [5]
> - Create Vitrage resources in Heat
> - Write a Heat template and a Vitrage template for this scenario
> - A tempest test for the above scenario
> - Add a periodic job for this scenario (with the above test). The best place to host this job (IMO) is under the self-healing SIG.

This is great! It's a perfect example of the kind of cross-project collaboration which I always hoped the SIG would host. And I really love the idea of Heat making it even easier to deploy Vitrage templates automatically. Originally I thought that this would be too hard and that the SIG would initially need to focus on documenting how to manually deploy self-healing configurations, but supporting automation early on is a very nice bonus :-) So I expect that implementing this can make life a lot easier for operators (and users) who need self-healing :-) And yes, I agree that the SIG would be the best place to host this job.
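To give a feel for the middle step of that flow, here is a rough sketch of the shape of a Vitrage template that reacts to a deduced failure by triggering a Mistral workflow. This is an illustrative assumption written from memory - the alarm name, workflow name, and field details are all hypothetical, so consult the Vitrage template documentation for the real schema:

```yaml
# Hypothetical Vitrage template sketch - names and exact schema are
# assumptions, not a verified working configuration.
metadata:
  version: 2
  name: heal_instance_on_failure
  type: standard
definitions:
  entities:
    - entity:
        category: ALARM
        name: instance_down          # hypothetical alarm name
        template_id: instance_alarm
    - entity:
        category: RESOURCE
        type: nova.instance
        template_id: instance
  relationships:
    - relationship:
        source: instance_alarm
        target: instance
        relationship_type: on
        template_id: alarm_on_instance
scenarios:
  - scenario:
      condition: alarm_on_instance
      actions:
        - action:
            action_type: execute_mistral
            properties:
              workflow: heal_instance    # hypothetical workflow name
              input:
                instance_id: instance.id
```

The Heat resource proposed in [5] would let operators deploy a template like this as part of their stack, rather than loading it into Vitrage by hand.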
> To create a periodic job for the self-healing SIG means we might also need a place to manage those self-healing tempest tests. For this scenario, I think it will make sense if we use heat-tempest-plugin to store that scenario test (since it will be wrapped as a Heat template) or use vitrage-tempest-plugin (since most of the test scenario is actually already there).

Sounds good.

> Not sure what will happen if we create a new tempest plugin for self-healing and there is no manager for it.

Sorry for my ignorance - do you mean manager objects here[0], or some other kind of manager?

[0] https://docs.openstack.org/tempest/latest/write_tests.html#manager-objects

> We still have some uncertainty to clear up while working on it, but the big picture looks like it will all work (if we do well on all of the above tasks). Please provide your feedback or questions if you have any. We do need feedback and reviews on patches and on any of the work. If you're interested in this, please join us (we need users/ops/devs!).
> [1] https://github.com/openstack/heat-templates/tree/master/hot/autohealing
> [2] https://github.com/openstack/self-healing-sig/blob/master/specs/vitrage-mistral-integration.rst
> [3] https://etherpad.openstack.org/p/self-healing-with-vitrage-mistral-heat
> [4] https://storyboard.openstack.org/#!/story/2002684
> [5] https://review.openstack.org/#/c/578786

Thanks a lot for creating the story in Storyboard - this is really helpful :-) I'll try to help with reviews etc. and maybe even testing if I can find some extra time for it over the next few months. I can also try to help "market" this initiative in the community by promoting awareness and trying to get operators more involved. Thanks again! Excited about the direction this is heading in :-)

Adam
Re: [openstack-dev] PTG Denver Horns
Matthew Thode wrote:
> On 18-08-07 23:18:26, David Medberry wrote:
> > Requests have finally been made (today, August 7, 2018) to end the horns on the train from Denver to Denver International Airport (within the city limits of Denver). Prior approval had been given to remove the FLAGGERS that were stationed at each crossing intersection. Of particular note (at the bottom of the article): There’s no estimate for how long it could take the FRA to approve quiet zones. ref: https://www.9news.com/article/news/local/next/denver-officially-asks-fra-for-permission-to-quiet-a-line-horns/73-581499094 I'd recommend bringing your sleeping aids, ear plugs, etc., just in case it's not approved by next month's PTG. (The Renaissance is within Denver proper as near as I can tell, so that nearby intersection should be covered by this ruling/decision if and when it comes down.)
>
> Thanks for the update; if you are up to it, keeping us informed on this would be nice, if only for the hilarity.

Thanks indeed for the warning. If the approval doesn't go through, we may need to resume the design work started last year; see lines 187 onwards of https://etherpad.openstack.org/p/queens-PTG-feedback
Re: [openstack-dev] [sig][upgrades][ansible][charms][tripleo][kolla][airship] reboot or poweroff?
[Adding openstack-sigs list too; apologies for the extreme cross-posting, but I think in this case the discussion deserves wide visibility. Happy to be corrected if there's a better way to handle this.]

Hi James,

James Page wrote:
> Hi All
>
> tl;dr we (the original founders) have not managed to invest the time to get the Upgrades SIG booted - time to hit reboot or time to poweroff?

TL;DR response: reboot, absolutely no question! My full response is below.

> Since Vancouver, two of the original SIG chairs have stepped down, leaving me in the hot seat with minimal participation from either deployment projects or operators in the IRC meetings. In addition I've only been able to make every 3rd IRC meeting, so they have generally not been happening. I think the current timing is not good for a lot of folk, so finding a better slot is probably a must-have if the SIG is going to continue - and maybe moving to a monthly or bi-weekly schedule rather than the weekly slot we have now. In addition I need some willing folk to help with leadership in the SIG. If you have an interest and would like to help please let me know! I'd also like to better engage with all deployment projects - upgrades are something that deployment tools should be looking to encapsulate as features, so it would be good to get deployment projects engaged in the SIG with nominated representatives. Based on the attendance at the upgrades sessions in Vancouver and developer/operator appetite to discuss all things upgrade at said sessions, I'm assuming that there is still interest in having a SIG for Upgrades, but I may be wrong! Thoughts?

As a SIG leader in a similar position (albeit with one other very helpful person on board), let me throw my £0.02 in ...

With both upgrades and self-healing I think there is a big disparity between supply (developers with time to work on the functionality) and demand (operators who need the functionality).
And perhaps also the high demand leads to a lot of developers being interested in the topic whilst not having much spare time to help out. That is probably why we both see high attendance at the summit / PTG events but relatively little activity in between.

I also freely admit that the inevitable conflicts with downstream requirements mean that I have struggled to find time to be as proactive with driving momentum as I had wanted, although I'm hoping to pick this up again over the next weeks leading up to the PTG. It sounds like maybe you have encountered similar challenges.

That said, I strongly believe that both of these SIGs offer a *lot* of value, and even if we aren't yet seeing the level of online activity that we would like, I think it's really important that they both continue. If for no other reason, the offline sessions at the summits and PTGs are hugely useful for helping converge the community on common approaches, and the associated repositories / wikis serve as a great focal point too.

Regarding online collaboration, yes, building momentum for IRC meetings is tough, especially with the timezone challenges. Maybe a monthly cadence is a reasonable starting point, or twice a month in alternating timezones - but maybe with both meetings within ~24 hours of each other, to reduce accidental creation of geographic silos. Another possibility would be to offer "open clinic" office hours, like the TC and other projects have done. If the TC or anyone else has established best practices in this space, it'd be great to hear them.

Either way, I sincerely hope that you decide to continue with the SIG, and that other people step up to help out. These things don't develop overnight but it is a tremendously worthwhile initiative; after all, everyone needs to upgrade OpenStack. Keep the faith!
;-)

Cheers, Adam
Re: [openstack-dev] [ptg] Self-healing SIG meeting moved to Thursday morning
Thierry Carrez wrote:
> Hi! Quick heads-up: following a request[1] from Adam Spiers (SIG lead), we modified the PTG schedule to move the Self-healing SIG meeting from Friday (all day) to Thursday morning (only the morning). You can see the resulting schedule at: https://www.openstack.org/ptg#tab_schedule Sorry for any inconvenience this may cause.

It's me who should be apologising - Thierry only deserves thanks for accommodating my request at late notice ;-)
Re: [openstack-dev] [self-healing] [ptg] [monasca] PTG track schedule published
Hi Witek,

Thanks a lot for the offer! I've suggested to Thierry that Thursday morning probably works best, but if the room logistics don't permit that then we might have to accept your kind offer - I'll let you know. Cheers!

Adam

Bedyk, Witold wrote:
> Hi Adam, if nothing else works, we could probably offer you a half-day of the Monasca slot on Monday or Tuesday afternoon. I'm afraid though that our room might be too small for you. Cheers, Witek
>
> -----Original Message-----
> From: Thierry Carrez
> Sent: Friday, 20 July 2018 18:46
> To: Adam Spiers
> Cc: openstack-dev mailing list
> Subject: Re: [openstack-dev] [self-healing] [ptg] PTG track schedule published
>
> Adam Spiers wrote:
> > Apologies - I have had to change plans and leave on the Thursday evening (an old friend is getting married on Saturday morning). Is there any chance of swapping the self-healing slot with one of the others?
>
> It's tricky, as you asked to avoid conflicts with the API SIG, Watcher, Monasca, Masakari, and Mistral... Which day would be best for you given the current schedule (assuming we don't move anything else, as it's too late for that)?
>
> -- Thierry Carrez (ttx)
Re: [openstack-dev] [self-healing] [ptg] PTG track schedule published
Thierry Carrez wrote:
> Thierry Carrez wrote:
> > Hi everyone, last month we published the tentative schedule layout for the 5 days of PTG. There was no major complaint, so that was confirmed as the PTG event schedule and published on the PTG website: https://www.openstack.org/ptg#tab_schedule
>
> The tab temporarily disappeared; while it is being restored you can access the schedule at: https://docs.google.com/spreadsheets/d/e/2PACX-1vRM2UIbpnL3PumLjRaso_9qpOfnyV9VrPqGbTXiMVNbVgjiR3SIdl8VSBefk339MhrbJO5RficKt2Rr/pubhtml?gid=1156322660=true

Apologies - I have had to change plans and leave on the Thursday evening (an old friend is getting married on Saturday morning). Is there any chance of swapping the self-healing slot with one of the others? Sorry for having to ask!

Adam
Re: [openstack-dev] [tc] campaign question: How can we make contributing to OpenStack easier?
Doug Hellmann wrote:
> Excerpts from Adam Spiers's message of 2018-04-25 18:15:42 +0100:
> > [BTW I hope it's not considered off-bounds for those of us who aren't TC election candidates to reply within these campaign question threads to responses from the candidates - but if so, let me know and I'll shut up ;-) ]
>
> Everyone should feel free to participate!

Jeremy Stanley wrote:
> Not only are responses from everyone in the community welcome (and like many, I think we should be asking questions like this often outside the context of election campaigning), but I wholeheartedly agree with your stance on this topic and also strongly encourage you to consider running for a seat on the TC in the future if you can swing it.

Thanks both for your support!
Re: [openstack-dev] [tc] campaign question: How can we make contributing to OpenStack easier?
[BTW I hope it's not considered off-bounds for those of us who aren't TC election candidates to reply within these campaign question threads to responses from the candidates - but if so, let me know and I'll shut up ;-) ]

Zhipeng Huang wrote:
> Culture-wise, being too IRC-centric is definitely not helping, judging from my own experience of getting new Cyborg developers from China to join our weekly meeting. Well, we could always argue it is part of the open source/hacker culture and preferable to commercial solutions that carry the constant risk of suddenly being shut down someday. But as OpenStack becomes more commercialized and widely adopted, we should be aware that more and more (potential) contributors will come from groups who are used to non-strictly-open-source environments, such as product development teams which rely on a lot of "closed source" but easy-to-use software. The change? Use more video conferences, and more of the commercial tools that are preferred in certain regions. Stop being allergic to non-open-source software and bring more capable but less hacker-culture-inclined contributors into the community.

I respectfully disagree :-)

> I know this is not a super welcomed stance in the open source hacker culture. But if we want OpenStack to be able to sustain more developers and not have a mid-life crisis and then get marginalized, we need to start changing the hacker mindset.

I think that "the hacker mindset" is too ambiguous and generalized a concept to be useful in framing justification for change. From where I'm standing, the hacker mindset is a wonderful and valuable thing which should be preserved. However, if that conflicts with other goals of our community, such as reducing the barrier to entry, then yes, that is a valid concern. In that case we should examine in more detail the specific aspects of hacker culture which are discouraging potential new contributors, and try to fix those, rather than jumping to the assumption that we should instead switch to commercial tools.
Given the community's "Four Opens" philosophy and strong belief in the power of Open Source, it would be inconsistent to abandon our preference for Open Source tools. For example, proprietary tools such as Slack are not popular because they are proprietary; they are popular because they have a very intuitive interface and convenient features which people enjoy. So when examining the specific question "What can we do to make it easier for OpenStack newbies to communicate with the existing community over a public instant messaging system?", the first question should not be "Should we switch to a proprietary tool?", but rather "Is there an open source tool which provides enough of the functionality we need?"

And in fact in the case of instant messaging, I believe the answer is yes, as I previously pointed out: http://lists.openstack.org/pipermail/openstack-sigs/2018-March/000332.html

Similarly, there are plenty of great Open Source solutions for voice and video communications. I'm all for changing with the times and adapting workflows to harness the benefits of more modern tools, but I think it's wrong to automatically assume that this can only be achieved via proprietary solutions.
Re: [openstack-dev] [TripleO][CI][QA][HA][Eris][LCOO] Validating HA on upstream
Raoul Scarazzini wrote:
> On 15/03/2018 01:57, Ghanshyam Mann wrote:
> > Thanks all for starting the collaboration on this, which has been pending a long time and which we all want to make a start on. SamP and I talked about it during the OPS meetup in Tokyo, and we discussed the draft plan below:
> > - Update the spec https://review.openstack.org/#/c/443504/, which is almost ready as per SamP, and his team is working on it.
> > - Start the technical debate on tooling we can use/reuse, like Yardstick etc., which is more what this mailing thread is about.
> > - Accept the new repo for Eris under QA and start at least something in the Rocky cycle.
> > I am in for having a meeting on this, which is a really good idea. A non-IRC meeting is totally fine here. Do we have a meeting place and time set up? -gmann
>
> Hi Ghanshyam, as I wrote earlier in the thread it's no problem for me to offer my Bluejeans channel; let's sort out which timeslot would be good. I've added my timezone to the main etherpad [1] (line 53); let's do all that so that we can create the meeting invite.
> [1] https://etherpad.openstack.org/p/extreme-testing-contacts

Good idea! I've added mine. We're still missing replies from several key stakeholders though (lines 62++) - probably worth getting buy-in from a few more people before we organise anything. I'm pinging a few on IRC with reminders about this.
[openstack-dev] [self-healing] Dublin PTG summary, and request for feedback
Hi all,

I just posted a summary of the Self-healing SIG session at the Dublin PTG:

http://lists.openstack.org/pipermail/openstack-sigs/2018-March/000317.html

If you are interested in the topic of self-healing within OpenStack, you are warmly invited to subscribe to the openstack-sigs mailing list: http://lists.openstack.org/pipermail/openstack-sigs/ and/or join the #openstack-self-healing channel on Freenode IRC. We are actively gathering feedback to help steer the SIG's focus in the right direction, so all thoughts are very welcome, especially from operators, since the primary goal of the SIG is to make life easier for operators.

I have also just created an etherpad for brainstorming topics for the Forum in Vancouver:

https://etherpad.openstack.org/p/YVR-self-healing-brainstorming

Feel free to put braindumps in there :-)

Thanks! Adam
Re: [openstack-dev] [TripleO][CI][QA][HA][Eris][LCOO] Validating HA on upstream
Raoul Scarazzini <ra...@redhat.com> wrote:
> On 08/03/2018 17:03, Adam Spiers wrote:
> > [...] Yes agreed again, this is a strong case for collaboration between the self-healing and QA SIGs. In Dublin we also discussed the idea of the self-healing and API SIGs collaborating on the related topic of health check APIs.
>
> Guys, thanks a ton for your involvement in the topic. I am +1 on any kind of meeting we can have to discuss this (as proposed by Adam), so I'll offer my Bluejeans channel for whatever kind of meeting we want to organize.

Awesome, thanks - Bluejeans would be great.

> About the best practices part Georg was mentioning, I'm 100% in agreement; the testing methodologies are the first thing we need to care about, starting from what we want to achieve. That said, I'll keep studying Yardstick. Hope to hear from you soon, and thanks again!

Yep - let's wait for people to catch up with the thread and hopefully we'll get enough volunteers on https://etherpad.openstack.org/p/extreme-testing-contacts for critical mass, and then we can start discussing! I think it's especially important that we have the Eris folks on board since they have already been working on this for a while.
Re: [openstack-dev] [TripleO][CI][QA][HA][Eris][LCOO] Validating HA on upstream
Georg Kunz wrote:
> Hi Adam,
>
> > Raoul Scarazzini wrote:
> > > In the meantime, I'll check yardstick to see which kind of bridge we can build to avoid reinventing the wheel.
> >
> > Great, thanks! I wish I could immediately help with this, but I haven't had the chance to learn yardstick myself yet. We should probably try to recruit someone from OPNFV to provide advice. I've cc'd Georg who IIRC was the person who originally told me about yardstick :-) He is an NFV expert and is also very interested in automated testing efforts: http://lists.openstack.org/pipermail/openstack-dev/2017-November/124942.html so he may be able to help with this architectural challenge.
>
> Thank you for bringing this up here. Better collaboration and sharing of knowledge, methodologies and tools across the communities is really what I'd like to see and facilitate. Hence, I am happy to help. I have already started to advertise the newly proposed QA SIG in the OPNFV test WG and I'll happily do the same for the self-healing SIG and any HA testing efforts in general. There is certainly some overlapping interest in these testing aspects between the QA SIG and the self-healing SIG, and hence collaboration between both SIGs is crucial.

That's fantastic - thank you so much!

> One remark regarding tools and frameworks: I consider the true value of a SIG to be a place for talking about methodologies and best practices: What do we need to test? What are the challenges? How can we approach this across communities? The tools and frameworks are important and we should investigate which tools are available, how good they are, how much they fit a given purpose, but at the end of the day they are tools meant to enable well-designed testing methodologies.

Agreed 100%.

[snipped]

> > I'm beginning to think that maybe we should organise a video conference call to coordinate efforts between the various interested parties. If there is appetite for that, the first question is: who wants to be involved?
> > To answer that, I have created an etherpad where interested people can sign up: https://etherpad.openstack.org/p/extreme-testing-contacts and I've cc'd people who I think would probably be interested. Does this sound like a good approach?
>
> We discussed a very similar idea in Dublin in the context of the QA SIG. I very much like the idea of a cross-community, cross-team, and apparently even cross-SIG approach.

Yes, agreed again - this is a strong case for collaboration between the self-healing and QA SIGs. In Dublin we also discussed the idea of the self-healing and API SIGs collaborating on the related topic of health check APIs.
Re: [openstack-dev] [kolla] PTG Summary
Paul Bourke wrote:
> Hi all, here's my summary of the various topics we discussed during the PTG. There were one or two I had to step out for, but hopefully this serves as an overall recap. Please refer to the main etherpad[0] for more details and links to the session-specific pads.
> [snipped]
>
> self health check support
> =========================
> * This had some crossover with the monitoring discussion.
> * Kolla has some checks in the form of our 'sanity checks', but these are underutilised and not implemented for every service. Tempest or rally would be a better fit here.
> Actions:
> * Remove the sanity check code from kolla-ansible - it's not fit for purpose and our assumption is no one is using it.
> * Make contact with the self-healing SIG, and see if we can help here. They may have recommendations for us.
> * Make a spec for this.
> [snipped]

Would be great to collaborate! As the SIG is still new we don't have regular meetings set up yet, but please join #openstack-self-healing on IRC, and you can mail the openstack-sigs list with [self-healing] in the subject.

> Implement rolling upgrade for all core projects
> ===============================================
> * Started by defining the 'terms of engagement', i.e. what do we mean by rolling upgrade in Kolla, what we currently have vs. what projects support, etc.
> * There are two efforts under way here: 1) supporting online upgrade for all core projects that support it, 2) supporting FFU (offline) upgrade in Kolla.
> * lujinluo is working on a way to do online FFU in Kolla.
> * Testing - we need gates to test upgrade.
> Actions:
> * Finish implementation of rolling upgrade for all projects that support it in Rocky
> * Improve documentation around this and upgrades in general for Kolla
> * Spec in Rocky for FFU and associated efforts
> * Begin looking at what would be required for upgrade gates in Kolla

Yes, a spec or other docs nailing down exactly what is meant by rolling upgrade and FFU upgrade would be a great help.
I was in the FFU session in Dublin and it felt to me like not everyone was on the same page yet regarding the precise definitions, making it difficult for all projects to move forward together in a coherent fashion.
Re: [openstack-dev] [TripleO][CI][QA][HA][Eris][LCOO] Validating HA on upstream
Raoul Scarazzini <ra...@redhat.com> wrote: On 06/03/2018 13:27, Adam Spiers wrote: Hi Raoul and all, Sorry for joining this discussion late! [...] I do not work on TripleO, but I'm part of the wider OpenStack sub-communities which focus on HA[0] and more recently, self-healing[1]. With that hat on, I'd like to suggest that maybe it's possible to collaborate on this in a manner which is agnostic to the deployment mechanism. There is an open spec on this: https://review.openstack.org/#/c/443504/ which was mentioned in the Denver PTG session on destructive testing which you referenced[2]. [...] https://www.opnfv.org/community/projects/yardstick [...] Currently each sub-community and vendor seems to be reinventing HA testing by itself to some extent, which is easier to accomplish in the short-term, but obviously less efficient in the long-term. It would be awesome if we could break these silos down and join efforts! :-) Hi Adam, First of all thanks for your detailed answer. Then let me be honest while saying that I didn't know yardstick. Neither did I until Sydney, despite being involved with OpenStack HA for many years ;-) I think this shows that either a) there is room for improved communication between the OpenStack and OPNFV communities, or b) I need to take my head out of the sand more often ;-) I need to start from scratch here to understand what this project is. In any case, the exact aim of this thread is to involve people and have a more comprehensive look at what's around. The point here is that, as you can see from the tripleo-ha-utils spec [1] I've created, the project is meant for TripleO specifically. On one side this is a significant limitation, but on the other, due to the pluggable nature of the project, I think that integration with other software like you are proposing is not impossible. Yep. I totally sympathise with the tension between the need to get something working quickly, vs. 
the need to collaborate with the community in the most efficient way. Feel free to add your comments to the review. The spec looks great to me; I don't really have anything to add, and I don't feel comfortable voting in a project which I know very little about. In the meantime, I'll check yardstick to see which kind of bridge we can build to avoid reinventing the wheel. Great, thanks! I wish I could immediately help with this, but I haven't had the chance to learn yardstick myself yet. We should probably try to recruit someone from OPNFV to provide advice. I've cc'd Georg who IIRC was the person who originally told me about yardstick :-) He is an NFV expert and is also very interested in automated testing efforts: http://lists.openstack.org/pipermail/openstack-dev/2017-November/124942.html so he may be able to help with this architectural challenge. Also you should be aware that work has already started on Eris, the extreme testing framework proposed in this user story: http://specs.openstack.org/openstack/openstack-user-stories/user-stories/proposed/openstack_extreme_testing.html and in the spec you already saw: https://review.openstack.org/#/c/443504/ You can see ongoing work here: https://github.com/LCOO/eris https://openstack-lcoo.atlassian.net/wiki/spaces/LCOO/pages/13393034/Eris+-+Extreme+Testing+Framework+for+OpenStack It looks like there is a plan to propose a new SIG for this, although personally I would be very happy to see it adopted by the self-healing SIG, since this framework is exactly what is needed for testing any self-healing mechanism. I'm hoping that Sampath and/or Gautum will chip in here, since I think they're currently the main drivers for Eris. I'm beginning to think that maybe we should organise a video conference call to coordinate efforts between the various interested parties. If there is appetite for that, the first question is: who wants to be involved? 
To answer that, I have created an etherpad where interested people can sign up: https://etherpad.openstack.org/p/extreme-testing-contacts and I've cc'd people who I think would probably be interested. Does this sound like a good approach? Cheers, Adam
Re: [openstack-dev] [TripleO][CI][QA][HA] Validating HA on upstream
Hi Raoul and all, Sorry for joining this discussion late! Raoul Scarazzini wrote:

TL;DR: we would like to change the way HA is tested upstream to avoid being hit by avoidable bugs that the CI process should discover.

Long version: Today HA testing upstream consists only of verifying that a three-controller setup comes up correctly and can spawn an instance. That's something, but it's far from being enough since we continuously see "day two" bugs. We started covering this more than a year ago in internal CI and today also on rdocloud using a project named tripleo-quickstart-utils [1]. Apart from its name, the project is not limited to tripleo-quickstart; it covers three principal roles:

1 - stonith-config: a playbook that can be used to automate the creation of fencing devices in the overcloud;
2 - instance-ha: a playbook that automates the seventeen manual steps needed to configure instance HA in the overcloud, test them via rally and verify that instance HA works;
3 - validate-ha: a playbook that runs a series of disruptive actions in the overcloud and verifies it always behaves correctly by deploying a heat-template that involves all the overcloud components;

Yes, a more rigorous approach to HA testing obviously has huge value, not just for TripleO deployments, but also for any type of OpenStack deployment.

To make this usable upstream, we need to understand where to put this code. Here are some choices: [snipped]

I do not work on TripleO, but I'm part of the wider OpenStack sub-communities which focus on HA[0] and more recently, self-healing[1]. With that hat on, I'd like to suggest that maybe it's possible to collaborate on this in a manner which is agnostic to the deployment mechanism. There is an open spec on this: https://review.openstack.org/#/c/443504/ which was mentioned in the Denver PTG session on destructive testing which you referenced[2]. 
As mentioned in the self-healing SIG's session in Dublin[3], the OPNFV community has already put a lot of effort into testing HA scenarios, and it would be great if this work was shared across the whole OpenStack community. In particular they have a project called Yardstick: https://www.opnfv.org/community/projects/yardstick which contains a bunch of HA test cases: http://docs.opnfv.org/en/latest/submodules/yardstick/docs/testing/user/userguide/15-list-of-tcs.html#h-a Currently each sub-community and vendor seems to be reinventing HA testing by itself to some extent, which is easier to accomplish in the short-term, but obviously less efficient in the long-term. It would be awesome if we could break these silos down and join efforts! :-) Cheers, Adam [0] #openstack-ha on Freenode IRC [1] https://wiki.openstack.org/wiki/Self-healing_SIG [2] https://etherpad.openstack.org/p/qa-queens-ptg-destructive-testing [3] https://etherpad.openstack.org/p/self-healing-ptg-rocky
Re: [openstack-dev] [masakari] Any masakari folks at the PTG this week ?
My claim to being a masakari person is pretty weak, but still I'd like to say hello too :-) Please ping me (aspiers on IRC) if you guys are meeting up! Bhor, Dinesh wrote: Hi Greg, The following of us are present: Tushar Patil (tpatil), Yukinori Sagara (sagara), Abhishek Kekane (abhishekk), Dinesh Bhor (Dinesh_Bhor). Thank you, Dinesh Bhor From: Waines, Greg Sent: 28 February 2018 19:22:26 To: OpenStack Development Mailing List (not for usage questions) Subject: [openstack-dev] [masakari] Any masakari folks at the PTG this week ? Any masakari folks at the PTG this week ? Would be interested in meeting up and chatting, let me know, Greg.
[openstack-dev] [self-healing][PTG] etherpad for PTG session on self-healing
Hi all, Yushiro kindly created an etherpad for the self-healing SIG session at the Dublin PTG on Tuesday afternoon next week, and I've fleshed it out a bit: https://etherpad.openstack.org/p/self-healing-ptg-rocky Anyone with an interest in self-healing is of course very welcome to attend (or keep an eye on it remotely!) This SIG is still very young, so it's a great chance for you to shape the direction it takes :-) If you are able to attend, please add your name, and also feel free to add topics which you would like to see covered. It would be particularly helpful if operators could participate and share their experiences of what is or isn't (yet!) working with self-healing in OpenStack, so that those of us on the development side can aim to solve the right problems :-) Thanks, and see some of you in Dublin! Adam
Re: [openstack-dev] Etherpad for self-healing
Furukawa, Yushiro wrote: Hi everyone, I am seeing Self-healing scheduled on Tuesday afternoon[1], but the etherpad for it is not listed in [2]. I made the following etherpad just in case. Thanks! You beat me to it ;-) Would it be possible to update the Etherpads wiki page? Done. https://etherpad.openstack.org/p/self-healing-ptg-rocky I'm also adding some more ideas for topics to the etherpad, and then I'll (re-)announce it here and also to openstack-{operators,sigs} to promote visibility.
[openstack-dev] Remembering Shawn Pearce (fwd)
Dear Stackers, Since git and Gerrit are at the heart of our development process, I am passing on this very sad news from the git / Gerrit communities that Shawn Pearce has passed away from aggressive lung cancer. Shawn was the founder of Gerrit / JGit / libgit2 / git-gui, and the third most prolific contributor to git itself. https://gitenterprise.me/2018/01/30/shawn-pearce-a-true-leader/ https://sfconservancy.org/blog/2018/jan/30/shawn-pearce/ https://twitter.com/cdibona/status/957822400518696960 https://public-inbox.org/git/CAP8UFD0aKqT5YXJx9-MqeKCKhOVGxninRf8tv30=hkgvmhg...@mail.gmail.com/T/#mf5c158c68565c1c68c80b6543966ef2cad6d151c https://groups.google.com/forum/#!topic/repo-discuss/B4P7G1YirdM/discussion He is survived by his wife and two young sons. A memorial fund has been set up in aid of the boys' education and future: https://gitenterprise.me/2018/01/30/gerrithub-io-donations-to-shawns-family/ Thank you Shawn for enriching our lives with your great contributions to the FLOSS community. - Forwarded message from Adam Spiers <aspi...@suse.com> - Date: Fri, 2 Feb 2018 15:12:35 + From: Adam Spiers <aspi...@suse.com> To: Luca Milanesio <l...@gerritforge.com> Subject: Re: Fwd: Remembering Shawn Pearce Hi Luca, that's such sad news :-( What an incredible contribution Shawn made to the community. In addition to Gerrit, I use git-gui and gitk regularly, and also my git-deps utility is based on libgit2. I had no idea he wrote them all, and many other things. I will certainly donate and also ensure that the OpenStack community is aware of the memorial fund. Thanks a lot for letting me know! Luca Milanesio <l...@gerritforge.com> wrote: Hi Adam, you probably have received this very sad news :-( As GerritForge we are actively supporting, contributing and promoting the donations to Shawn's Memorial Fund (https://www.gofundme.com/shawn-pearce-memorial-fund) and added a donation button to GerritHub.io <http://gerrithub.io/>. 
Feel free to spread the sad news to the OpenStack community you are in touch with. --- Luca Milanesio GerritForge 3rd Fl. 207 Regent Street London W1B 3HH - UK http://www.gerritforge.com <http://www.gerritforge.com/> l...@gerritforge.com <mailto:l...@gerritforge.com> Tel: +44 (0)20 3292 0677 Mob: +44 (0)792 861 7383 Skype: lucamilanesio http://www.linkedin.com/in/lucamilanesio <http://www.linkedin.com/in/lucamilanesio> > Begin forwarded message: > > From: "'Dave Borowitz' via Repo and Gerrit Discussion" <repo-disc...@googlegroups.com> > Subject: Remembering Shawn Pearce > Date: 29 January 2018 at 15:15:05 GMT > To: repo-discuss <repo-disc...@googlegroups.com> > Reply-To: Dave Borowitz <dborow...@google.com> > > Dear Gerrit community, > > I am very saddened to report that Shawn Pearce, long-time Git contributor and founder of the Gerrit Code Review project, passed away over the weekend after being diagnosed with lung cancer last year. He spent his final days comfortably in his home, surrounded by family, friends, and colleagues. > > Shawn was an exceptional software engineer and it is impossible to overstate his contributions to the Git ecosystem. He had everything from the driving high-level vision to the coding skills to solve any complex problem and bring his vision to reality. If you had the pleasure of collaborating with him on code reviews, as I know many of you did, you've seen first-hand his dedication and commitment to quality. You can read more about his contributions in this recent interview <https://git.github.io/rev_news/2017/08/16/edition-30/#developer-spotlight-shawn-pearce>. > > In addition to his technical contributions, Shawn truly loved the open-source communities he was a part of, and the Gerrit community in particular. Growing the Gerrit project from nothing to a global community with hundreds of contributors used by some of the world's most prominent tech companies is something he was extremely proud of. 
> > Please join me in remembering Shawn Pearce and continuing his legacy. Feel free to use this thread to share your memories with the community Shawn loved. > > If you are interested, his family has set up GoFundMe page <https://www.gofundme.com/shawn-pearce-memorial-fund> to put towards his children's future. > > Best wishes, > Dave Borowitz > > > -- > -- > To unsubscribe, email repo-discuss+unsubscr...@googlegroups.com > More info at http://groups.google.com/group/repo-discuss?hl=en <http://groups.google.com/group/repo-discuss?hl=en> > > --- > You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group. > To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss+unsubscr...@
[openstack-dev] ANNOUNCE: Self-healing SIG officially formed (fwd)
As per below, I'm happy to announce that the Self-healing SIG is now officially formed. For now, all discussions will happen on <openstack-s...@lists.openstack.org>, so please subscribe to that list if you are interested in this topic! Cheers, Adam - Forwarded message from Adam Spiers <aspi...@suse.com> - Date: Mon, 27 Nov 2017 14:19:25 +0000 From: Adam Spiers <aspi...@suse.com> To: OpenStack SIGs list <openstack-s...@lists.openstack.org> Subject: [Openstack-sigs] [meta] [self-healing] ANNOUNCE: Self-healing SIG officially formed Reply-To: openstack-s...@lists.openstack.org

TL;DR: the self-healing SIG is officially formed! Watch the openstack-sigs mailing list for future developments. A longer version of this announcement can be found at https://blog.adamspiers.org/2017/11/24/announcing-openstacks-self-healing-sig/

A SIG is born!
--------------

After an unofficial kick-off meeting at the last PTG in Denver, I proposed the creation of a new self-healing SIG: http://lists.openstack.org/pipermail/openstack-sigs/2017-September/54.html At the recent Summit in Sydney, we had a Forum session which around 30 people attended, which was extremely encouraging! You can read more details in the etherpad, but here is the quick summary ... Most importantly, we resolved the naming and scoping issues, concluding that to avoid biting off too much in one go, it was better to be pragmatic and start small:

- Initially focus on cloud infrastructure, and not worry too much about the user-facing impact of failures yet; we can add that concern whenever it makes sense (which is particularly relevant for telcos / NFV).
- Not worry too much about optimization initially; Watcher is possibly the only project focusing on this right now, and again we can expand to include optimization any time we want.

Now that the naming and scoping issues are resolved, I am excited to announce that the Self-healing SIG is officially formed! 
Discussion went beyond mere administrivia, however:

- We collected a few initial use cases.
- We informally decided the governance of the SIG. I asked if anyone else would like to assume leadership, but no one seemed keen, dashing my hopes of avoiding extra work ;-) But Eric Kao, PTL of Congress, generously offered to act as co-chair.
- We discussed health check APIs, which were mentioned in at least 2 or 3 other Forum sessions this time round.
- We agreed that we wanted an IRC channel, and that it could host bi-weekly meetings. However as usual there was no clean solution to choosing a time which would suit everyone ;-/ I'll try to figure out what to do about this!

Get involved
------------

You are warmly invited to join, if this topic interests you:

- Ensure you are subscribed to the openstack-sigs mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-sigs and watch out for posts tagged =[self-healing]=.
- Bookmark https://wiki.openstack.org/wiki/Self-healing_SIG which is the SIG's official home.

Next steps
----------

I will set up the IRC channel, and see if we can make progress on agreeing times for regular IRC meetings. Other than this administrivia, it is of course up to the community to decide in which direction the SIG should go, but my suggestions are:

- Continue to collect use cases. It makes sense to have a very lightweight process for this (at least, initially), so Eric has created a Google Doc and populated it with a suggested template and a first example: https://docs.google.com/document/d/13N36g2RlUYs8mw7hbfRXw6y2Jc-V2XGrXgfPXPpUvuU/edit?usp=sharing Feel free to add your own based on this template.
- Collect links to any existing documentation or other resources which describe how existing services can be combined. 
This awesome talk on Advanced Fault Management with Vitrage and Mistral is a perfect example: https://www.openstack.org/videos/sydney-2017/advanced-fault-management-with-vitrage-and-mistral and here is another: https://www.openstack.org/videos/barcelona-2016/building-self-healing-applications-with-aodh-zaqar-and-mistral but we need to make it easier for operators to understand which combinations like this are possible, and easier for them to be set up.

- Finish the architecture diagram drafted in Denver: https://docs.google.com/drawings/d/1kEFtVpQ4c8HipSp34EVAkcSGmwyg1MzWf_H5oGTtl-Y/edit?usp=sharing
- At a higher level, we could document reference stacks which address multiple self-healing cases.
- Talk more with the OPNFV community to find out what capabilities they have which could be reused within non-NFV OpenStack clouds.
- Perform gaps analysis on the use cases, and liaise with specific projects to drive development in directions which can address those gaps.
Re: [openstack-dev] [all][deployment][kolla][tripleo][osa] Service diagnostics task force
Michał Jastrzębski wrote: Hello my dearest of communities, During the deployment tools session at the PTG we discussed the need for deep health checking and metering of running services. It's very relevant in the context of containers (especially k8s) and HA - things like watchdogs, heartbeats or exposing relevant metrics (I don't want to get into design here; suffice to say it's a non-trivial problem to solve). We would like to start a "task force" of a few volunteers from both the deployment tool side (ops, HA) and project dev teams. We would like to design together a proper health check mechanism for one of the projects to create best practices and a design that later could be implemented in all other services. We would like to ask for a volunteer project team to join us and spearhead this effort. Sorry for the late reply - I only just found this thread. But I would certainly like to be involved too :-)
[openstack-dev] [Openstack-sigs] [meta] Proposal for self-healing SIG (fwd)
Hi everyone, As per below, I've just proposed the creation of a new SIG. Feedback is very welcome - ideally it would all be collected in the same thread I started over on the openstack-sigs list, but feedback in two places is more useful than nowhere, so I'll keep an eye out here too ;-) Thanks! Adam - Forwarded message from Adam Spiers <aspi...@suse.com> - Date: Sun, 17 Sep 2017 23:35:02 +0100 From: Adam Spiers <aspi...@suse.com> To: OpenStack SIGs list <openstack-s...@lists.openstack.org> Subject: [Openstack-sigs] [meta] Proposal for self-healing SIG

Hi all, [TL;DR: we want to set up a "self-healing infrastructure" SIG.] One of the biggest promises of the cloud vision was the idea that all the infrastructure could be managed in a policy-driven fashion, reacting to failures and other events by automatically healing and optimising services. Most of the components required to implement such an architecture already exist, e.g.

- Monasca: Monitoring
- Aodh: Alarming
- Congress: Policy-based governance
- Mistral: Workflow
- Senlin: Clustering
- Vitrage: Root Cause Analysis
- Watcher: Optimization
- Masakari: Compute plane HA
- Freezer-dr: DR and compute plane HA

However, there is not yet a clear strategy within the community for how these should all tie together. So at the PTG last week in Denver, we held an initial cross-project meeting to discuss this topic.[0] It was well-attended, with representation from almost all of the relevant projects, and it felt like a very productive session to me. I shall do my best to summarise whilst trying to avoid any misrepresentation ... There was general agreement that the following actions would be worthwhile:

- Document reference stacks describing what use cases can already be addressed with the existing projects. (Even better if some of these stacks have already been tested in the wild.)
- Document what integrations between the projects already exist at a technical level. 
(We actually began this during the meeting, by placing the projects into phases of a high-level flow, and then collaboratively building a Google Drawing to show that.[1])

- Collect real-world use cases from operators, including ones which they would like to accomplish but cannot yet.
- From the above, perform gaps analysis to help shape the future direction of these projects, e.g. through specs targeting those gaps.
- Perform overlap analysis to help ensure that the projects are correctly scoped and integrate well without duplicating any significant effort.[2]
- Set up a SIG[3] to promote further discussion across the projects and with operators. I talked to Thierry afterwards, and consequently this email is the first step on that path :-)
- Allocate the SIG a mailing list prefix - "[self-healing]" or similar.
- Set up a bi-weekly IRC meeting for the SIG.
- Continue the discussion at the Sydney Forum, since it's an ideal opportunity to get developers and operators together and decide what the next steps should be.
- Continue the discussion at the next Ops meetup in Tokyo.

I got coerced^Wvolunteered to drive the next steps ;-) So far I have created an etherpad proposing the Forum session[4], and added it to the Forum wiki page[5]. I'll also add it to the SIG wiki page[6]. There were things we did not reach a concrete conclusion on:

- What should the SIG be called? We felt that "self-healing" was pretty darn close to capturing the intent of the topic. However as a natural pedant, I couldn't help but notice that technically speaking, that would most undesirably exclude Watcher, because the optimization it provides isn't *quite* "healing" - the word "healing" implies that something is sick, and optimization can be applied even when the cloud is perfectly healthy. Any suggestions for a name with a marginally wider scope would be gratefully received. 
- Should the SIG be scoped to only focus on self-healing (and self-optimization) of OpenStack infrastructure, or should it also include self-healing of workloads? My feeling is that we should keep it scoped to the infrastructure which falls under the responsibility of the cloud operators; anything user-facing would be very different from a process perspective.
- How should the SIG's governance be set up? Unfortunately it didn't occur to me to raise this question during the discussion, but I've since seen that the k8s SIG managed to make some decisions in this regard[7], and stealing their idea of a PTL-type model with a minimum of 2 chairs sounds good to me.
- Which timezone should the IRC meeting be in? As usual, there were interested parties from all the usual continents, so no one time would suit everyone. I guess I can just submit a review
Re: [openstack-dev] [oslo][barbican][sahara] start RPC service before launcher wait?
Hi Ken, Thanks a lot for the analysis, and sorry for the slow reply! Comments inline... Ken Giusti <kgiu...@gmail.com> wrote: > Hi Adam, > > I think there's a couple of problems here. > > Regardless of worker count, the service.wait() is called before > service.start(). And from looking at the oslo.service code, the 'wait()' > method is called after start(), then again after stop(). This doesn't match > up with the intended use of oslo.messaging.server.wait(), which should only > be called after .stop(). Hmm, so are you saying that there might be a bug in oslo.service's usage of oslo.messaging, and that this Sahara bugfix was the wrong approach too? https://review.openstack.org/#/c/280741/1/sahara/cli/sahara_engine.py > Perhaps a bigger issue is that in the multi threaded case all threads > appear to be calling start, wait, and stop on the same instance of the > service (oslo.messaging rpc server). At least that's what I'm seeing in my > muchly reduced test code: > > https://paste.fedoraproject.org/paste/-73zskccaQvpSVwRJD11cA > > The log trace shows multiple calls to start, wait, stop via different > threads to the same TaskServer instance: > > https://paste.fedoraproject.org/paste/dyPq~lr26sQZtMzHn5w~Vg > > Is that expected? Unfortunately in the interim, your pastes seem to have vanished - any chance you could repaste them? 
Thanks, Adam > On Mon, Jul 31, 2017 at 9:32 PM, Adam Spiers <aspi...@suse.com> wrote: > > Ken Giusti <kgiu...@gmail.com> wrote: > >> On Mon, Jul 31, 2017 at 10:01 AM, Adam Spiers <aspi...@suse.com> wrote: > >>> I recently discovered a bug where barbican-worker would hang on > >>> shutdown if queue.asynchronous_workers was changed from 1 to 2: > >>> > >>>https://bugs.launchpad.net/barbican/+bug/1705543 > >>> > >>> resulting in a warning like this: > >>> > >>>WARNING oslo_messaging.server [-] Possible hang: stop is waiting for > >>> start to complete > >>> > >>> I found a similar bug in Sahara: > >>> > >>>https://bugs.launchpad.net/sahara/+bug/1546119 > >>> > >>> where the fix was to call start() on the RPC service before making the > >>> launcher wait() on it, so I ported the fix to Barbican, and it seems > >>> to work fine: > >>> > >>>https://review.openstack.org/#/c/485755 > >>> > >>> I noticed that both projects use ProcessLauncher; barbican uses > >>> oslo_service.service.launch() which has: > >>> > >>>if workers is None or workers == 1: > >>>launcher = ServiceLauncher(conf, restart_method=restart_method) > >>>else: > >>>launcher = ProcessLauncher(conf, restart_method=restart_method) > >>> > >>> However, I'm not an expert in oslo.service or oslo.messaging, and one > >>> of Barbican's core reviewers (thanks Kaitlin!) noted that not many > >>> other projects start the task before calling wait() on the launcher, > >>> so I thought I'd check here whether that is the correct fix, or > >>> whether there's something else odd going on. > >>> > >>> Any oslo gurus able to shed light on this? > >>> > >> > >> As far as an oslo.messaging server is concerned, the order of operations > >> is: > >> > >> server.start() > >> # do stuff until ready to stop the server... > >> server.stop() > >> server.wait() > >> > >> The final wait blocks until all requests that are in progress when stop() > >> is called finish and cleanup. > > > > Thanks - that makes sense. 
So the question is, why would > > barbican-worker only hang on shutdown when there are multiple workers? > > Maybe the real bug is somewhere in oslo_service.service.ProcessLauncher > > and it's not calling start() correctly?
Re: [openstack-dev] [oslo][barbican][sahara] start RPC service before launcher wait?
Ken Giusti <kgiu...@gmail.com> wrote: On Mon, Jul 31, 2017 at 10:01 AM, Adam Spiers <aspi...@suse.com> wrote: I recently discovered a bug where barbican-worker would hang on shutdown if queue.asynchronous_workers was changed from 1 to 2: https://bugs.launchpad.net/barbican/+bug/1705543 resulting in a warning like this: WARNING oslo_messaging.server [-] Possible hang: stop is waiting for start to complete I found a similar bug in Sahara: https://bugs.launchpad.net/sahara/+bug/1546119 where the fix was to call start() on the RPC service before making the launcher wait() on it, so I ported the fix to Barbican, and it seems to work fine: https://review.openstack.org/#/c/485755 I noticed that both projects use ProcessLauncher; barbican uses oslo_service.service.launch() which has: if workers is None or workers == 1: launcher = ServiceLauncher(conf, restart_method=restart_method) else: launcher = ProcessLauncher(conf, restart_method=restart_method) However, I'm not an expert in oslo.service or oslo.messaging, and one of Barbican's core reviewers (thanks Kaitlin!) noted that not many other projects start the task before calling wait() on the launcher, so I thought I'd check here whether that is the correct fix, or whether there's something else odd going on. Any oslo gurus able to shed light on this? As far as an oslo.messaging server is concerned, the order of operations is: server.start() # do stuff until ready to stop the server... server.stop() server.wait() The final wait blocks until all requests that are in progress when stop() is called finish and cleanup. Thanks - that makes sense. So the question is, why would barbican-worker only hang on shutdown when there are multiple workers? Maybe the real bug is somewhere in oslo_service.service.ProcessLauncher and it's not calling start() correctly? 
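The start/stop/wait ordering discussed above can be modelled with a small self-contained sketch. `FakeRpcServer` below is a hypothetical stand-in for an oslo.messaging RPC server (not the real API); it uses a `threading.Event` to capture the one constraint that matters here, namely that wait() only completes once start() has run:

```python
import threading


class FakeRpcServer:
    """Hypothetical stand-in for an oslo.messaging RPC server:
    wait() only succeeds once start() has completed."""

    def __init__(self):
        self._started = threading.Event()

    def start(self):
        self._started.set()

    def stop(self):
        pass  # a real server would stop accepting new requests here

    def wait(self, timeout=0.1):
        # Block until start() has run; a False return here models the
        # "Possible hang: stop is waiting for start to complete" warning.
        return self._started.wait(timeout)


server = FakeRpcServer()
assert not server.wait()   # wait() before start() would hang forever
server.start()
server.stop()
assert server.wait()       # start() -> stop() -> wait() completes cleanly
```

This also illustrates why calling start, wait and stop on the *same* server instance from multiple threads (as Ken's trace showed) is fragile: the ordering guarantee only holds within a single caller's sequence of operations.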
[openstack-dev] [oslo][barbican][sahara] start RPC service before launcher wait?
Hi all, I recently discovered a bug where barbican-worker would hang on shutdown if queue.asynchronous_workers was changed from 1 to 2: https://bugs.launchpad.net/barbican/+bug/1705543 resulting in a warning like this:

    WARNING oslo_messaging.server [-] Possible hang: stop is waiting for start to complete

I found a similar bug in Sahara: https://bugs.launchpad.net/sahara/+bug/1546119 where the fix was to call start() on the RPC service before making the launcher wait() on it, so I ported the fix to Barbican, and it seems to work fine: https://review.openstack.org/#/c/485755 I noticed that both projects use ProcessLauncher; barbican uses oslo_service.service.launch() which has:

    if workers is None or workers == 1:
        launcher = ServiceLauncher(conf, restart_method=restart_method)
    else:
        launcher = ProcessLauncher(conf, restart_method=restart_method)

However, I'm not an expert in oslo.service or oslo.messaging, and one of Barbican's core reviewers (thanks Kaitlin!) noted that not many other projects start the task before calling wait() on the launcher, so I thought I'd check here whether that is the correct fix, or whether there's something else odd going on. Any oslo gurus able to shed light on this? Thanks! Adam
Re: [openstack-dev] [vitrage] [nova] [HA] [masakari] VM Heartbeat / Healthcheck Monitoring
I don't see any reason why masakari couldn't handle that, but you'd have to ask Sampath and the masakari team whether they would consider that in scope for their roadmap. Waines, Greg <greg.wai...@windriver.com> wrote: Sure. I can propose a new user story. And then are you thinking of including this user story in the scope of what masakari would be looking at ? Greg. From: Adam Spiers <aspi...@suse.com> Reply-To: "openstack-dev@lists.openstack.org" <openstack-dev@lists.openstack.org> Date: Wednesday, May 17, 2017 at 10:08 AM To: "openstack-dev@lists.openstack.org" <openstack-dev@lists.openstack.org> Subject: Re: [openstack-dev] [vitrage] [nova] [HA] VM Heartbeat / Healthcheck Monitoring Thanks for the clarification Greg. This sounds like it has the potential to be a very useful capability. May I suggest that you propose a new user story for it, along similar lines to this existing one? http://specs.openstack.org/openstack/openstack-user-stories/user-stories/proposed/ha_vm.html Waines, Greg <greg.wai...@windriver.com<mailto:greg.wai...@windriver.com>> wrote: Yes that’s correct. VM Heartbeating / Health-check Monitoring would introduce intrusive / white-box type monitoring of VMs / Instances. I realize this is somewhat in the gray-zone of what a cloud should be monitoring or not, but I believe it provides an alternative for Applications deployed in VMs that do not have an external monitoring/management entity like a VNF Manager in the MANO architecture. And even for VMs with VNF Managers, it provides a highly reliable alternate monitoring path that does not rely on Tenant Networking. You’re correct, that VM HB/HC Monitoring would leverage https://wiki.libvirt.org/page/Qemu_guest_agent that would require the agent to be installed in the images for talking back to the compute host. ( there are other examples of similar approaches in openstack ... 
the murano-agent for installation, the swift-agent for object store management ) Although here, in the case of VM HB/HC Monitoring, via the QEMU Guest Agent, the messaging path is internal thru a QEMU virtual serial device, i.e. a very simple interface with very few dependencies ... it’s up and available very early in VM lifecycle and virtually always up. Wrt failure modes / use-cases:

- a VM’s response to a Heartbeat Challenge Request can be as simple as just ACK-ing; this alone allows for detection of:
  - a failed or hung QEMU/KVM instance, or
  - a failed or hung VM’s OS, or
  - a failure of the VM’s OS to schedule the QEMU Guest Agent daemon, or
  - a failure of the VM to route basic IO via linux sockets.
- I have had feedback that this is similar to the virtual hardware watchdog of QEMU/KVM ( https://libvirt.org/formatdomain.html#elementsWatchdog )
- However, the VM Heartbeat / Health-check Monitoring:
  - provides a higher-level (i.e. application-level) heartbeating, i.e. whether the Heartbeat requests are being answered by the Application running within the VM,
  - provides more than just heartbeating, as the Application can use it to trigger a variety of audits,
  - provides a mechanism for the Application within the VM to report a Health Status / Info back to the Host / Cloud,
  - provides notification of the Heartbeat / Health-check status to higher-level cloud entities thru Vitrage, e.g. VM-Heartbeat-Monitor - to - Vitrage - (EventAlarm) - Aodh - ... - VNF-Manager - (StateChange) - Nova - ... - VNF Manager

Greg.
From: Adam Spiers <aspi...@suse.com<mailto:aspi...@suse.com>> Reply-To: "openstack-dev@lists.openstack.org<mailto:openstack-dev@lists.openstack.org>" <openstack-dev@lists.openstack.org<mailto:openstack-dev@lists.openstack.org>> Date: Tuesday, May 16, 2017 at 7:29 PM To: "openstack-dev@lists.openstack.org<mailto:openstack-dev@lists.openstack.org>" <openstack-dev@lists.openstack.org<mailto:openstack-dev@lists.openstack.org>> Subject: Re: [openstack-dev] [vitrage] [nova] [HA] VM Heartbeat / Healthcheck Monitoring Waines, Greg <greg.wai...@windriver.com<mailto:greg.wai...@windriver.com>> wrote: thanks for the pointers Sam. I took a quick look. I agree that the VM Heartbeat / Health-check looks like a good fit into Masakari. Currently your instance monitoring looks like it is strictly black-box type monitoring thru libvirt events. Is that correct ? i.e. you do not do any intrusive type monitoring of the instance thru the QEMU Guest Agent facility correct ? That is correct: https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/instancemonitor/instance.py I think this is what VM Heartbeat / Health-check would add to Masakari. Let me know if you agree.
Re: [openstack-dev] [vitrage] [nova] [HA] VM Heartbeat / Healthcheck Monitoring
Thanks for the clarification Greg. This sounds like it has the potential to be a very useful capability. May I suggest that you propose a new user story for it, along similar lines to this existing one? http://specs.openstack.org/openstack/openstack-user-stories/user-stories/proposed/ha_vm.html Waines, Greg <greg.wai...@windriver.com> wrote: Yes that’s correct. VM Heartbeating / Health-check Monitoring would introduce intrusive / white-box type monitoring of VMs / Instances. I realize this is somewhat in the gray-zone of what a cloud should be monitoring or not, but I believe it provides an alternative for Applications deployed in VMs that do not have an external monitoring/management entity like a VNF Manager in the MANO architecture. And even for VMs with VNF Managers, it provides a highly reliable alternate monitoring path that does not rely on Tenant Networking. You’re correct, that VM HB/HC Monitoring would leverage https://wiki.libvirt.org/page/Qemu_guest_agent that would require the agent to be installed in the images for talking back to the compute host. ( there are other examples of similar approaches in openstack ... the murano-agent for installation, the swift-agent for object store management ) Although here, in the case of VM HB/HC Monitoring, via the QEMU Guest Agent, the messaging path is internal thru a QEMU virtual serial device, i.e. a very simple interface with very few dependencies ... it’s up and available very early in VM lifecycle and virtually always up. Wrt failure modes / use-cases:

- a VM’s response to a Heartbeat Challenge Request can be as simple as just ACK-ing; this alone allows for detection of:
  - a failed or hung QEMU/KVM instance, or
  - a failed or hung VM’s OS, or
  - a failure of the VM’s OS to schedule the QEMU Guest Agent daemon, or
  - a failure of the VM to route basic IO via linux sockets.
- I have had feedback that this is similar to the virtual hardware watchdog of QEMU/KVM ( https://libvirt.org/formatdomain.html#elementsWatchdog )
- However, the VM Heartbeat / Health-check Monitoring:
  - provides a higher-level (i.e. application-level) heartbeating, i.e. whether the Heartbeat requests are being answered by the Application running within the VM,
  - provides more than just heartbeating, as the Application can use it to trigger a variety of audits,
  - provides a mechanism for the Application within the VM to report a Health Status / Info back to the Host / Cloud,
  - provides notification of the Heartbeat / Health-check status to higher-level cloud entities thru Vitrage, e.g. VM-Heartbeat-Monitor - to - Vitrage - (EventAlarm) - Aodh - ... - VNF-Manager - (StateChange) - Nova - ... - VNF Manager

Greg. From: Adam Spiers <aspi...@suse.com> Reply-To: "openstack-dev@lists.openstack.org" <openstack-dev@lists.openstack.org> Date: Tuesday, May 16, 2017 at 7:29 PM To: "openstack-dev@lists.openstack.org" <openstack-dev@lists.openstack.org> Subject: Re: [openstack-dev] [vitrage] [nova] [HA] VM Heartbeat / Healthcheck Monitoring Waines, Greg <greg.wai...@windriver.com<mailto:greg.wai...@windriver.com>> wrote: thanks for the pointers Sam. I took a quick look. I agree that the VM Heartbeat / Health-check looks like a good fit into Masakari. Currently your instance monitoring looks like it is strictly black-box type monitoring thru libvirt events. Is that correct ? i.e. you do not do any intrusive type monitoring of the instance thru the QEMU Guest Agent facility correct ? That is correct: https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/instancemonitor/instance.py I think this is what VM Heartbeat / Health-check would add to Masakari. Let me know if you agree. OK, so you are looking for something slightly different I guess, based on this QEMU guest agent?
https://wiki.libvirt.org/page/Qemu_guest_agent That would require the agent to be installed in the images, which is extra work but I imagine quite easily justifiable in some scenarios. What failure modes do you have in mind for covering with this approach - things like the guest kernel freezing, for instance?
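As a concrete illustration of the mechanism under discussion: the QEMU Guest Agent answers a guest-ping command over the virtio serial channel, and the absence of a reply within a timeout is what exposes the failure modes Greg lists (hung guest OS, hung QEMU/KVM process, unscheduled agent daemon). A minimal sketch in plain Python - the payload matches what `virsh qemu-agent-command <domain> '{"execute":"guest-ping"}'` sends, but the helper functions themselves are illustrative and not part of any OpenStack project:

```python
import json

def guest_ping_request():
    """Build the guest-ping command understood by the QEMU Guest Agent
    (the same JSON payload `virsh qemu-agent-command` would send)."""
    return json.dumps({"execute": "guest-ping"})

def is_alive(reply_json, timeout_expired=False):
    """Interpret the agent's reply.

    A healthy agent answers guest-ping with an empty result; no reply
    before the timeout implies one of the failure modes listed above.
    """
    if timeout_expired or reply_json is None:
        return False
    try:
        # A well-formed reply carries a "return" member.
        return "return" in json.loads(reply_json)
    except ValueError:
        return False
```

A monitor built on this would loop over the domains on a compute host, send the challenge via the guest-agent channel, and report any VM whose replies stop arriving.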
Re: [openstack-dev] [vitrage] [nova] [HA] VM Heartbeat / Healthcheck Monitoring
Yep :-) That's pretty much exactly what I was suggesting elsewhere in this thread: http://lists.openstack.org/pipermail/openstack-dev/2017-May/116748.html Waines, Greg <greg.wai...@windriver.com> wrote: Excellent. Yeah I just watched your Boston Summit presentation and noticed, at least when you were talking about host-monitoring, you were looking at having alternative backends for reporting e.g. to masakari-api or to mistral or ... to Vitrage :) Greg. From: Adam Spiers <aspi...@suse.com> Reply-To: "openstack-dev@lists.openstack.org" <openstack-dev@lists.openstack.org> Date: Tuesday, May 16, 2017 at 7:42 PM To: "openstack-dev@lists.openstack.org" <openstack-dev@lists.openstack.org> Subject: Re: [openstack-dev] [vitrage] [nova] [HA] VM Heartbeat / Healthcheck Monitoring Waines, Greg <greg.wai...@windriver.com<mailto:greg.wai...@windriver.com>> wrote: Sam, Two other higher-level points I wanted to discuss with you about Masakari. First, I notice that you are doing monitoring, auto-recovery and even host maintenance type functionality as part of the Masakari architecture. Are you open to some configurability (enabling/disabling) of these capabilities ? I can't speak for Sampath or the Masakari developers, but the monitors are standalone components. Currently they can only send notifications in a format which the masakari-api service can understand, but I guess it wouldn't be hard to extend them to send notifications in other formats if that made sense. e.g. OPNFV guys would NOT want auto-recovery, they would prefer that fault events get reported to Vitrage ... and eventually filter up to Aodh Alarms that get received by VNFManagers which would be responsible for the recovery. e.g. some deployers of openstack might want to disable parts or all of your monitoring, if using other mechanisms such as Zabbix or Nagios for the host monitoring (say) Yes, exactly!
This kind of configurability and flexibility, which would allow each cloud architect to choose which monitoring / alerting / recovery components suit their requirements best in a "mix'n'match" fashion, is exactly what we are aiming for with our modular approach to the design of compute plane HA. If the various monitoring components adopt a driver-based approach to alerting and/or the ability to alert via a lowest common denominator format such as simple HTTP POST of JSON blobs, then it should be possible for each cloud deployer to integrate the monitors with whichever reporting dashboards / recovery workflow controllers best satisfy their requirements. Second, are you open to configurably having fault events reported to Vitrage ? Again I can't speak on behalf of the Masakari project, but this sounds like a great idea to me :)
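A lowest-common-denominator alert of the kind mentioned above could be as simple as the following sketch. The field names are hypothetical - the thread only proposes "simple HTTP POST of JSON blobs" as a format, not a concrete schema:

```python
import json
from urllib import request

def build_alert(host, event, payload=None):
    """Serialise a monitor alert as a JSON blob.

    Field names are illustrative only; any receiver (masakari-api,
    Vitrage, a Mistral webhook, ...) would define its own schema.
    """
    return json.dumps({
        "hostname": host,
        "event": event,
        "payload": payload or {},
    }).encode("utf-8")

def post_alert(url, body):
    """POST the alert to a receiver. Shown for completeness; the network
    call is not exercised in this sketch."""
    req = request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    return request.urlopen(req)
```

Because the payload is plain JSON over HTTP, any monitoring component that can emit it - and any dashboard or workflow controller that can consume it - slots into the mix'n'match architecture without bespoke integration code.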
Re: [openstack-dev] [vitrage] [nova] [HA] VM Heartbeat / Healthcheck Monitoring
Waines, Greg wrote: Sam, Two other higher-level points I wanted to discuss with you about Masakari. First, I notice that you are doing monitoring, auto-recovery and even host maintenance type functionality as part of the Masakari architecture. Are you open to some configurability (enabling/disabling) of these capabilities ? I can't speak for Sampath or the Masakari developers, but the monitors are standalone components. Currently they can only send notifications in a format which the masakari-api service can understand, but I guess it wouldn't be hard to extend them to send notifications in other formats if that made sense. e.g. OPNFV guys would NOT want auto-recovery, they would prefer that fault events get reported to Vitrage ... and eventually filter up to Aodh Alarms that get received by VNFManagers which would be responsible for the recovery. e.g. some deployers of openstack might want to disable parts or all of your monitoring, if using other mechanisms such as Zabbix or Nagios for the host monitoring (say) Yes, exactly! This kind of configurability and flexibility, which would allow each cloud architect to choose which monitoring / alerting / recovery components suit their requirements best in a "mix'n'match" fashion, is exactly what we are aiming for with our modular approach to the design of compute plane HA. If the various monitoring components adopt a driver-based approach to alerting and/or the ability to alert via a lowest common denominator format such as simple HTTP POST of JSON blobs, then it should be possible for each cloud deployer to integrate the monitors with whichever reporting dashboards / recovery workflow controllers best satisfy their requirements. Second, are you open to configurably having fault events reported to Vitrage ?
Again I can't speak on behalf of the Masakari project, but this sounds like a great idea to me :)
Re: [openstack-dev] [vitrage] [nova] [HA] VM Heartbeat / Healthcheck Monitoring
Waines, Greg wrote: thanks for the pointers Sam. I took a quick look. I agree that the VM Heartbeat / Health-check looks like a good fit into Masakari. Currently your instance monitoring looks like it is strictly black-box type monitoring thru libvirt events. Is that correct ? i.e. you do not do any intrusive type monitoring of the instance thru the QEMU Guest Agent facility correct ? That is correct: https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/instancemonitor/instance.py I think this is what VM Heartbeat / Health-check would add to Masakari. Let me know if you agree. OK, so you are looking for something slightly different I guess, based on this QEMU guest agent? https://wiki.libvirt.org/page/Qemu_guest_agent That would require the agent to be installed in the images, which is extra work but I imagine quite easily justifiable in some scenarios. What failure modes do you have in mind for covering with this approach - things like the guest kernel freezing, for instance?
Re: [openstack-dev] [vitrage] [nova] [HA] VM Heartbeat / Healthcheck Monitoring
Afek, Ifat (Nokia - IL/Kfar Sava) wrote: On 16/05/2017, 4:36, "Sam P" wrote: Hi Greg, In Masakari [0] for VMHA, we have already implemented a somewhat similar function in masakari-monitors. Masakari-monitors runs on the nova-compute node, and monitors the host, process or instance failures. The Masakari instance monitor has similar functionality to what you have described. Please see [1] for more details on instance monitoring. [0] https://wiki.openstack.org/wiki/Masakari [1] https://github.com/openstack/masakari-monitors/tree/master/masakarimonitors/instancemonitor Once masakari-monitors detect failures, it will send notifications to masakari-api to take appropriate recovery actions to recover that VM from failures. You can also find out more about our architectural plans by watching this talk which Sampath and I gave in Boston: https://www.openstack.org/videos/boston-2017/high-availability-for-instances-moving-to-a-converged-upstream-solution The slides are here: https://aspiers.github.io/openstack-summit-2017-boston-compute-ha/ We didn't go into much depth on monitoring and recovery of individual VMs, but as Sampath explained, Masakari already handles both of these. Hi Greg, Sam, As Vitrage is about correlating alarms that come from different sources, and is not a monitor by itself – I think that it can benefit from information retrieved by both Masakari and Zabbix monitors. Zabbix is already integrated into Vitrage. I don’t know if there are specific tests for VM heartbeat, but I think it is very likely that there are. Regarding Masakari – looking at your documents, I believe that integrating your monitoring information into Vitrage could be quite straightforward. Yes, this makes sense. Masakari already cleanly decouples monitoring/alerting from automated recovery, so it could support this quite nicely.
And the modular converged architecture we explained in the presentation will maintain that clean separation of responsibilities whilst integrating Masakari together with other components such as Pacemaker, Mistral, and maybe Vitrage too. For example whilst so far this thread has been about VM instance monitoring, another area where Vitrage could integrate with Masakari is compute host monitoring. If you watch this part of our presentation where we explained the next generation architecture, you'll see that we propose a new "nova-host-alerter" component which has a driver-based mechanism for alerting different services when a compute host experiences a failure: https://youtu.be/YPKE1guti8E?t=32m43s So one obvious possibility would be to add a driver for Vitrage, so that Vitrage can be alerted when Pacemaker spots a host failure. Similarly, we could extend Pacemaker configurations to alert Vitrage when individual processes such as nova-compute or libvirtd fail. If you would like to discuss any of this further or have any more questions, in addition to this mailing list we are also available to talk on the #openstack-ha IRC channel! Cheers, Adam P.S. I've added the [HA] badge to this thread since this discussion is definitely related to high availability.
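The driver-based fan-out behind the proposed "nova-host-alerter" could look roughly like this. All class and method names here are illustrative - the component only exists as a proposal in the linked presentation - but the sketch shows how one host-failure event reaches several backends:

```python
class AlertDriver:
    """Hypothetical driver interface for the proposed nova-host-alerter."""

    def alert(self, hostname, event):
        raise NotImplementedError

class VitrageDriver(AlertDriver):
    def __init__(self, sink):
        self.sink = sink  # stand-in for a Vitrage datasource client

    def alert(self, hostname, event):
        self.sink.append(("vitrage", hostname, event))

class MasakariDriver(AlertDriver):
    def __init__(self, sink):
        self.sink = sink  # stand-in for masakari-api notifications

    def alert(self, hostname, event):
        self.sink.append(("masakari", hostname, event))

class HostAlerter:
    """Fans a host-failure event out to every configured driver, so each
    deployer can mix'n'match reporting / recovery backends."""

    def __init__(self, drivers):
        self.drivers = drivers

    def host_failed(self, hostname):
        for driver in self.drivers:
            driver.alert(hostname, "HOST_DOWN")
```

Adding Vitrage support would then just mean writing one more driver, without touching the Pacemaker-side failure detection at all.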
[openstack-dev] [HA] follow-up from HA discussion at Boston Forum
Hi all, Sam P wrote: This is a quick reminder for the HA Forum session at Boston Summit. Thank you all for your comments and effort to make this happen in Boston Summit. Time: Thu 11, 11:00am-11:40am Location: Hynes Convention Center - Level One - MR 103 Etherpad: https://etherpad.openstack.org/p/BOS-forum-HA-in-openstack Please join and let's discuss the HA issues in OpenStack... --- Regards, Sampath Thanks to everyone who came to the High Availability Forum session in Boston last week! To me, the great turn-out proved that there is enough general interest in HA within OpenStack to justify allocating space for discussion on those topics not only at each summit, but in between the summits too. To that end, I'd like to a) remind everyone of the weekly HA IRC meetings: https://wiki.openstack.org/wiki/Meetings/HATeamMeeting and also b) highlight an issue that we most likely need to solve: currently these weekly IRC meetings are held at 0900 UTC on Wednesday: http://eavesdrop.openstack.org/#High_Availability_Meeting which is pretty much useless for anyone in the Americas. This time was previously chosen because the most regular attendees were based in Europe or Asia, but I'm now looking for suggestions on how to make this fairer for all continents. Some options:

- Split the 60 minutes in half, and hold two 30 minute meetings each week at different times, so that every timezone has convenient access to at least one of them.
- Alternate the timezone every other week. This might make it hard to build any kind of momentum.
- Hold two meetings each week. I'm not sure we'd have enough traffic to justify this, but we could try.

Any opinions, or better ideas? Thanks! Adam P.S.
Big thanks to Sampath for organising the Boston Forum session and managing to attract such a healthy audience :-)
Re: [openstack-dev] [Openstack] OpenStack-Ansible HA solution
Wei Hui wrote: Liyuenan (Maxwell Li) wrote: Hi, all I have some questions about the OSA project. [snipped] 2. Could OSA support compute node high availability? If my compute node goes down, could the instances on this node move to other nodes? 2. As far as I know, OSA doesn't support compute node HA. nova has a feature called evacuate, but some mechanism is needed to detect whether nova-compute was down and trigger evacuate. If you want to find out more about compute node HA you might be interested in our upcoming presentation in Boston: https://www.openstack.org/summit/boston-2017/summit-schedule/events/17971/high-availability-for-instances-moving-to-a-converged-upstream-solution Cheers, Adam
Re: [openstack-dev] [Neutron] Alternative approaches for L3 HA
Anil Venkata <anilvenk...@redhat.com> wrote: > On Thu, Feb 23, 2017 at 12:10 AM, Miguel Angel Ajo Pelayo > <majop...@redhat.com> wrote: > > On Wed, Feb 22, 2017 at 11:28 AM, Adam Spiers <aspi...@suse.com> wrote: > >> With help from others, I have started an analysis of some of the > >> different approaches to L3 HA: > >> > >> https://ethercalc.openstack.org/Pike-Neutron-L3-HA > >> > >> (although I take responsibility for all mistakes ;-) > > Did you test with this patch https://review.openstack.org/#/c/255237/ ? It > was merged in newton cycle. > With this patch, HA+L2pop doesn't depend on control plane during fail over, > hence failover should be faster(same as without l2pop). Thanks Anil! I've updated the spreadsheet to take this into account. > >> It would be great if someone from RH or RDO could provide information > >> on how this RDO (and/or RH OSP?) solution based on Pacemaker + > >> keepalived works - if so, I volunteer to: > >> > >> - help populate column E of the above sheet so that we can > >> understand if there are still remaining gaps in the solution, and > >> > >> - document it (e.g. in the HA guide). Even if this only ended up > >> being considered as a shorter-term solution, I think it's still > >> worth documenting so that it's another option available to > >> everyone. > >> > >> Thanks! > > > I have updated the spreadsheet. Thanks a lot Miguel and everyone else who contributed to the spreadsheet so far! After a very productive meeting this morning at the PTG, I think it is quite close to completion now, and I am already working with the docs team on moving it into official documentation, either in the HA Guide (which I am trying to help maintain) or the Networking Guide. I don't have strong opinions on where it should live - if anyone does then please let us know now. 
I also attempted to write up a mini-report summarising this morning's meeting for future reference; it's (currently) at line 279 onwards of: https://etherpad.openstack.org/p/neutron-ptg-pike-final but I'll reproduce it here for convenience. The conclusion, at least as I understand it, is as follows:

- The l3_ha solution is already working pretty well in many deployments, especially when coupled with a few extra benefits from Pacemaker (although https://bugs.launchpad.net/neutron/+bugs?field.tag=l3-ha might suggest otherwise ...)
- Some more refinements to this solution could be made to reduce the remaining corner cases where failures are not handled well.
- I (and hopefully others) will work towards documenting this solution in more detail.
- In the meantime, Ann Taraday and anyone else interested may continue out-of-tree experiments with different architectures such as tooz/etcd. It is expected that these would be invasive changes, possibly taking at least 1-2 release cycles to stabilise, but they might still be worth it.
- If a PoC is submitted for review and looks promising, we can decide whether it makes sense to aim to replace the existing keepalived solution, or instead offer it as an alternative by introducing pluggable L3 drivers. However, adding a driver abstraction layer would also be costly and expand the test matrix, at a time where developer resources are scarce. So there would need to be a compelling reason to do this.

I hope that's a reasonably accurate representation of the outcome from this morning - obviously feel free to submit comments if I missed or mistook anything. Thanks for a great meeting!
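For reference, the keepalived-based l3_ha solution discussed above is enabled through a couple of options on the neutron-server side. The values shown are illustrative, not recommendations - check the configuration reference for your release:

```ini
# neutron.conf on the neutron-server node(s)
[DEFAULT]
# Create new routers as HA (VRRP/keepalived) routers by default
l3_ha = True
# How many L3 agents each HA router is scheduled onto
max_l3_agents_per_router = 3
```

With these set, each new router gets a keepalived instance on several L3 agents, and VRRP elects which agent actively hosts the router's gateway.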
Re: [openstack-dev] [Neutron] Alternative approaches for L3 HA
Kosnik, Lubosz wrote: > About success of RDO we need to remember that this deployment utilizes > Pacemaker and when I was working on this feature and even I spoke with Assaf > this external application was doing everything to make this solution working. > Pacemaker was responsible for checking external and internal connectivity. > To detect split brain. Elect master, even keepalived was running but > Pacemaker was automatically killing all services and moving FIP. > Assaf - is there any change in this implementation in RDO? Or you’re still > doing everything outside of Neutron? > > Because if RDO success is built on Pacemaker it means that yes, Neutron > needs some solution which will be available for more than RH deployments. Agreed. With help from others, I have started an analysis of some of the different approaches to L3 HA: https://ethercalc.openstack.org/Pike-Neutron-L3-HA (although I take responsibility for all mistakes ;-) It would be great if someone from RH or RDO could provide information on how this RDO (and/or RH OSP?) solution based on Pacemaker + keepalived works - if so, I volunteer to: - help populate column E of the above sheet so that we can understand if there are still remaining gaps in the solution, and - document it (e.g. in the HA guide). Even if this only ended up being considered as a shorter-term solution, I think it's still worth documenting so that it's another option available to everyone. Thanks!
Re: [openstack-dev] [Openstack-operators] Destructive / HA / fail-over scenarios
Timur Nurlygayanov wrote: > Hi OpenStack developers and operators, > > we are going to create the test suite for destructive testing of > OpenStack clouds. We want to hear your feedback and ideas > about possible destructive and failover scenarios which we need > to check. > > Which scenarios we need to check if we want to make sure that > some OpenStack cluster is configured in High Availability mode > and can be published as a "production/enterprise" cluster. > > Your ideas are welcome, let's discuss the ideas of test scenarios in > this email thread. I applaud the effort to boost automated testing of failure scenarios! And thanks a lot for polling the list before starting any work on this. Regarding the implementation, did you consider reusing Cloud99, and if not, please could you? :-) Obviously it would be good to avoid reinventing wheels where possible. https://www.openstack.org/summit/vancouver-2015/summit-videos/presentation/high-availability-and-resiliency-testing-strategies-for-openstack-clouds https://github.com/cisco-oss-eng/Cloud99 If there are some gaps between Cloud99 and what is needed then it would be worth evaluating them in order to determine whether it makes sense to start from scratch versus simply develop Cloud99 further. Also it would be great if you could join the #openstack-ha IRC channel where you will find friendly folks from the broader OpenStack HA sub-community who I'm sure will be happy to discuss this further. You are also very welcome to join our weekly IRC meetings: https://wiki.openstack.org/wiki/Meetings/HATeamMeeting Thanks! Adam
Re: [openstack-dev] [Openstack] The possible ways of high availability for non-cloud-ready apps running on openstack
Hi Hossein, hossein zabolzadeh wrote: > Hi there. > I am dealing with a large number of legacy applications (MediaWiki, Joomla, > ...) running on openstack. I am looking for the best way to improve high > availability of my instances. All applications are not designed for > failure (Non-Cloud-Ready Apps). So, what is the best way of improving HA on my > non-clustered instances (Stateful Instances)? > Thanks in advance. Sorry for the slow reply - I only just noticed this. Please see this talk I gave in Austin with Dawid Deja: https://www.openstack.org/videos/video/high-availability-for-pets-and-hypervisors-state-of-the-nation I believe it should answer your question in a lot of detail, but please let me know if you have follow-up questions. Regards, Adam
Re: [openstack-dev] [HA] RFC: High Availability track at future Design Summits
Hi Thierry, Thierry Carrez <thie...@openstack.org> wrote: > Adam Spiers wrote: > > I doubt anyone would dispute that High Availability is a really > > important topic within OpenStack, yet none of the OpenStack > > conferences or Design Summits so far have provided an "official" track > > or similar dedicated space for discussion on HA topics. > > [...] > > We do not provide a specific track at the "Design Summit" for HA (or for > hot upgrades, for that matter) but we have space for "cross-project > workshops" in which HA topics would be discussed. I suspect what you > mean here is that the one or two sessions that the current setup allows > are far from enough to tackle that topic efficiently? Yes, I think that's probably true. I get the impression cross-project workshops are intended more for coordination of common topics between many official big tent projects, whereas our topics typically involve a small handful of projects, some of which are currently unofficial. > IMHO there is dedicated space -- just not enough of it. It's one of the > issues with the current Design Summit setup -- just not enough time and > space to tackle everything in one week. With the new event format I > expect we'll be able to free up more time to discuss such horizontal > issues Right. I'm looking forward to the new format :-) > but as far as Barcelona goes (where we have less space and less > time than in Austin), I'd encourage you to still propose cross-project > workshops (and engage on the Ops side of the Design Summit to get > feedback from there as well). OK thanks - I'll try to figure out the best way of following up on these two points. I see that https://wiki.openstack.org/wiki/Design_Summit/Ocata/Etherpads is still empty, so I guess we're still on the early side of planning for design summit tracks, which hopefully means there's still time to propose a fishbowl session for Ops feedback on HA. Thanks a lot for the advice! 
Adam __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [HA] RFC: High Availability track at future Design Summits
Hi all, I doubt anyone would dispute that High Availability is a really important topic within OpenStack, yet none of the OpenStack conferences or Design Summits so far have provided an "official" track or similar dedicated space for discussion on HA topics. This is becoming increasingly problematic as the number of HA topics increases. For example, in Austin a group of us spent something like 15 hours together over 3-4 days for design sessions around the future of HA for the compute plane. This is not by any means the only HA topic which needs discussing. Other possible topics: - Input from operators on their experiences of deployment, maintenance, and effectiveness of highly available OpenStack infrastructure - Adding or improving HA support in existing projects, e.g. - cinder-volume active/active work is currently ongoing - neutron always has ongoing HA topics - the hot one in Austin seemed to be HA+DVR+SNAT. - We had some great discussions with the Congress team in Austin, which may need follow-up. - mistral is involved in ongoing HA work. - The various projects playing on the HA scene (Senlin is another example) need the opportunity to sync up with each other to become aware of any opportunities for integration or potential overlap. - Documentation (the official HA guide) - Different / new approaches to HA of the control plane (e.g. Pacemaker vs. systemd vs. other clustering technologies) - Testing and hardening of existing HA architectures (e.g. via projects such as Cloud99) Whilst we do have the #openstack-ha IRC channel, weekly IRC meetings, and of course the mailing lists, I think it would be helpful to have an official space in the design summits for continuation of those technical discussions face-to-face. Granted, some of the above topics could be discussed in the related project track (cinder, neutron, congress, documentation etc.). 
But this does not provide a forum for detailed technical discussion on cross-project initiatives such as compute HA, or architectural debates which don't relate to a single project, or work on HA projects which don't have their own dedicated track in the Design Summit. Therefore I would like to propose that future Design Summits adopt an official HA "mini-track" (I guess one day might be sufficient), and I'd really appreciate hearing opinions on this proposal. Also, if the idea meets enough favour, it would be useful to find out whether it's already too late to arrange this for Barcelona :-) Thanks a lot! Adam P.S. Maybe a similar proposal on a smaller scale would be valid for some of the operator and regional meetups too? __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [HA] weekly High Availability meetings on IRC: change of time
Hi everyone, I have proposed moving the weekly High Availability IRC meetings one hour later, back to the original time of 0900 UTC every Monday. https://review.openstack.org/#/c/349601/ Everyone is welcome to attend these meetings, so if you think you are likely to regularly attend, feel free to vote on that review. Thanks! Adam __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [HA] RFC: user story including hypervisor reservation / host maintenance / storage AZs / event history
[Cc'ing product-wg@ - when replying, first please consider whether cross-posting is appropriate] Hi all, Currently the OpenStack HA community is putting a lot of effort into converging on a single upstream solution for high availability of VMs and hypervisors[0], and we had a lot of very productive discussions in Austin on this topic[1]. One of the first areas of focus is the high level user story: http://specs.openstack.org/openstack/openstack-user-stories/user-stories/proposed/ha_vm.html In particular, there is an open review on which we could use some advice from the wider community. The review proposes adding four extra usage scenarios to the existing user story. All of these scenarios are to some degree related to HA of VMs and hypervisors, however none of them exclusively - they all have scope extending to other areas beyond HA. Here's a very brief summary of all four, as they relate to HA: 1. "Sticky" shared storage zones Scenario: all compute hosts have access to exactly one shared storage "availability zone" (potentially independent of the normal availability zones). For example, there could be multiple NFS servers, and every compute host has /var/lib/nova/instances mounted to one of them. On first boot, each VM is *implicitly* assigned to a zone, depending on which compute host nova-scheduler picks for it (so this could be more or less random). Subsequent operations such as "nova evacuate" would need to ensure the VM only ever moves to other hosts in the same zone. 2. Hypervisor reservation The operator wants a mechanism for reserving some compute hosts exclusively for use as failover hosts on which to automatically resurrect VMs from other failed compute nodes. 3. Host maintenance The operator wants a mechanism for flagging hosts as undergoing maintenance, so that the HA mechanisms for automatic recovery are temporarily disabled during the maintenance window. 4. 
Event history The operator wants a way to retrieve a history of what the HA automatic recovery mechanism performed, and when, where and how. And here's the review in question: https://review.openstack.org/#/c/318431/ My first instinct was that all of these scenarios are sufficiently independent and complex, and extend far enough outside the scope of HA, that they deserve to live in four separate user stories, rather than adding them to our existing "HA for VMs" user story. This could also maximise the chances of converging on a single upstream solution for each which works both inside and outside HA contexts. (Please read the review's comments for much more detail on these arguments.) However, others made the very valid point that since there are elements of all these stories which are indisputably related to HA for VMs, we still need the existing user story for HA VMs to cover them, so that it can provide "the big picture" which will tie together all the different strands of work it requires. So we are currently proposing to take the following steps: - Propose four new user stories, one for each of the above scenarios. - Link to the new stories from the "Related User Stories" section of the existing HA VMs story. - Extend the existing story so that it covers the HA-specific aspects of the four cases, leaving any non-HA aspects to be covered by the newly linked stories. Then each story would go through the standard workflow defined by the PWG: https://wiki.openstack.org/wiki/ProductTeam/User_Stories Does this sound reasonable, or is there a better way? BTW, whilst this email is primarily asking for advice on the process, feedback on each story is also welcome, whether it's "good idea", "you can already do that", or "terrible idea!" ;-) However please first read the comments on the above review, as the obvious points have probably already been covered :-) Thanks a lot! 
Adam [0] A complete description of the problem area and existing solutions was given in this talk: https://www.openstack.org/videos/video/high-availability-for-pets-and-hypervisors-state-of-the-nation [1] https://etherpad.openstack.org/p/newton-instance-ha __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
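As an aside, the core constraint in scenario 1 ("sticky" shared storage zones) can be reduced to a few lines of code. This is only an illustrative sketch: the host names, zone names, and the `eligible_targets` helper are all invented for the example, and a real implementation would live inside a nova scheduler filter.

```python
# Which shared-storage "zone" (e.g. NFS server backing
# /var/lib/nova/instances) each compute host is mounted to.
HOST_ZONES = {
    "compute1": "nfs-a",
    "compute2": "nfs-a",
    "compute3": "nfs-b",
}

def eligible_targets(vm_host, host_zones=HOST_ZONES):
    """Hosts sharing the failed host's storage zone, excluding itself.

    Once a VM is implicitly pinned to the zone of its first host, any
    move (e.g. "nova evacuate") may only consider hosts in that zone.
    """
    zone = host_zones[vm_host]
    return [h for h, z in host_zones.items() if z == zone and h != vm_host]

print(eligible_targets("compute1"))  # only other hosts in zone "nfs-a"
```

Note that a host which is alone in its zone (like compute3 above) has no eligible evacuation targets at all, which is exactly the operational hazard the user story needs to surface.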
Re: [openstack-dev] [HA] About next HA Team meeting (VoIP)
Sam P wrote: > Hi All, > > In today's (9th May 2016) meeting we agreed to skip the next IRC > meeting (which is 16th May 2016) in favour of a gotomeeting VoIP call on > 18th May 2016 (Wednesday). > Today's meeting logs and summary can be found here. > http://eavesdrop.openstack.org/meetings/ha/2016/ha.2016-05-09-08.04.html > > About the meeting time: > Everyone was fine with 8:00am UTC. > However due to some resource allocation issues, I would like to shift > this VoIP meeting to > 9am UTC 18th May 2016 > > Please let me know whether or not this time slot works for you. That later time is fine for me :) Thanks! __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [HA] weekly High Availability meetings on IRC start next Monday
Sergii Golovatiuk wrote: > > > [2] declares meetings at 9am UTC which might be tough for US-based > > folks. I > > > might be wrong here as I don't know the location of HA experts. > > > > > > [2] http://eavesdrop.openstack.org/#High_Availability_Meeting > > > > Yes, I was aware of this :-/ The problem is that the agenda for the > > first meeting will focus on hypervisor HA, and the interested parties > > who met in Tokyo are all based in either Europe or Asia (Japan and > > Australia). It's hard but possible to find a time which accommodates > > two continents, but almost impossible to find a time which > > accommodates three :-/ > > > > > I ran into issues setting an event in UTC. Use Ghana/Accra in Google Calendar > as it doesn't have a UTC time zone Speaking on behalf of my home town, United Kingdom/London would also work ;-) But even easier, just add this URL to your Google Calendar: http://eavesdrop.openstack.org/calendars/high-availability-meeting.ics or if you want to really spam your calendar, you can add all OpenStack meetings in one go :-) http://eavesdrop.openstack.org/irc-meetings.ical __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [HA] weekly High Availability meetings on IRC start next Monday
Hi Sergii, Thanks a lot for the feedback! Sergii Golovatiuk wrote: > Hi Adam, > > It's great we are moving forward with the HA community. Thank you so much for > bringing HA to the next level. However, I have a couple of comments. > > [1] contains the agenda. I guess we should move it to > https://etherpad.openstack.org. That will allow people to add their own topics to > discuss. Action items can be put there as well. > > [1] https://wiki.openstack.org/wiki/Meetings/HATeamMeeting It's a wiki, so anyone can already add their own topics, and in fact the page already encourages people to do that :-) I'd prefer to keep it as a wiki because that is consistent with the approach of all the other OpenStack meetings, as recommended by https://wiki.openstack.org/wiki/Meetings/CreateaMeeting#Add_a_Meeting It also results in a better audit trail than etherpad (where changes can be made anonymously). Action items will be captured by the MeetBot: https://git.openstack.org/cgit/openstack-infra/meetbot/tree/doc/Manual.txt > [2] declares meetings at 9am UTC which might be tough for US-based folks. I > might be wrong here as I don't know the location of HA experts. > > [2] http://eavesdrop.openstack.org/#High_Availability_Meeting Yes, I was aware of this :-/ The problem is that the agenda for the first meeting will focus on hypervisor HA, and the interested parties who met in Tokyo are all based in either Europe or Asia (Japan and Australia). It's hard but possible to find a time which accommodates two continents, but almost impossible to find a time which accommodates three :-/ If it's any consolation, the meeting logs will be available afterwards, and also the meeting is expected to be short (around 30 minutes) since the majority of work will continue via email and IRC outside the meeting. This first meeting is mainly to set a direction for future collaboration. 
However suggestions for how to handle this better are always welcome, and if the geographical distribution of attendees of future meetings changes then of course we can consider changing the time in order to accommodate. I want it to be an inclusive sub-community. Cheers, Adam __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [HA] ANNOUNCE: new "[HA]" topic category in mailman configuration
Hi all, As you may know, Mailman allows server-side filtering of mailing list traffic by topic categories: http://lists.openstack.org/cgi-bin/mailman/options/openstack-dev (N.B. needs authentication) Thierry has kindly added "[HA]" as a new topic category in the mailman configuration for this list, so please tag all mails related to High Availability with this prefix so that it can be detected by both server-side and client-side mail filters belonging to people interested in HA discussions. Thanks! Adam __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
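For anyone who prefers client-side filtering, matching on the new tag is trivial with any mail filter. As one entirely optional illustration, a procmail recipe delivering tagged mail to a hypothetical maildir named `openstack-ha/` might look like:

```
:0:
* ^Subject:.*\[HA\]
openstack-ha/
```

An equivalent Sieve `fileinto` rule or a Gmail subject filter on "[HA]" would achieve the same thing; the server-side topic mechanism above just saves you from downloading unwanted traffic in the first place.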
[openstack-dev] [HA] weekly High Availability meetings on IRC start next Monday
Hi all, After some discussion in Tokyo by stakeholders in OpenStack High Availability, I'm pleased to announce that from next Monday we're starting a series of weekly meetings on IRC. Details are here: https://wiki.openstack.org/wiki/Meetings/HATeamMeeting http://eavesdrop.openstack.org/#High_Availability_Meeting The agenda for the first meeting is set and will focus on 1. the pros and cons of the existing approaches to hypervisor HA which rely on automatic resurrection[0] of VMs, and 2. how we might be able to converge on a best-of-breed solution. All are welcome to join! On a related note, even if you can't attend the meeting, you can still use the new FreeNode IRC channel #openstack-ha for all HA-related discussion. Cheers, Adam [0] In the OpenStack community resurrection is commonly referred to as "evacuation", which is a slightly unfortunate misnomer. __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [ANNOUNCE] [HA] new #openstack-ha IRC channel on FreeNode
Sorry! It would have helped if I'd used the right address for the openstack list in the To: and Reply-To: headers :-/ Hopefully second time lucky ... Adam Spiers <aspi...@suse.com> wrote: > [cross-posting to several lists; please trim the recipients list > before replying!] > > Hi all, > > After discussion with members of the openstack-infra team, I > registered a new FreeNode IRC channel, #openstack-ha. Discussion on all > aspects of OpenStack High Availability is welcome in this channel. > Hopefully it will help promote cross-pollination of ideas and maybe > even more convergence on upstream solutions. > > I added it to https://wiki.openstack.org/wiki/IRC and also set up the > gerritbot to auto-announce activity for the "new" > openstack-resource-agents repository which I announced separately > yesterday: > > http://lists.openstack.org/pipermail/openstack-dev/2015-October/077601.html > > Still TODO: set up eavesdrop to record channel logs: > > https://review.openstack.org/#/c/237341/ > > Cheers, > Adam __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [ANNOUNCE] [HA] new #openstack-ha IRC channel on FreeNode
[cross-posting to several lists; please trim the recipients list before replying!] Hi all, After discussion with members of the openstack-infra team, I registered a new FreeNode IRC channel, #openstack-ha. Discussion on all aspects of OpenStack High Availability is welcome in this channel. Hopefully it will help promote cross-pollination of ideas and maybe even more convergence on upstream solutions. I added it to https://wiki.openstack.org/wiki/IRC and also set up the gerritbot to auto-announce activity for the "new" openstack-resource-agents repository which I announced separately yesterday: http://lists.openstack.org/pipermail/openstack-dev/2015-October/077601.html Still TODO: set up eavesdrop to record channel logs: https://review.openstack.org/#/c/237341/ Cheers, Adam __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [ANNOUNCE] [HA] [Pacemaker] new, maintained openstack-resource-agents repository
[cross-posting to openstack-dev and pacemaker user lists; please consider trimming the recipients list if your reply is not relevant to both communities] Hi all, Back in June I proposed moving the well-used but no longer maintained https://github.com/madkiss/openstack-resource-agents/ repository to Stackforge: http://lists.openstack.org/pipermail/openstack-dev/2015-June/067763.html https://github.com/madkiss/openstack-resource-agents/issues/22 The responses I got were more or less unanimously in favour, so I'm simultaneously pleased and slightly embarrassed to announce that 4 months later, I've finally followed up on my proposal: https://launchpad.net/openstack-resource-agents https://git.openstack.org/cgit/openstack/openstack-resource-agents/ https://review.openstack.org/#/admin/projects/openstack/openstack-resource-agents https://review.openstack.org/#/q/status:open+project:openstack/openstack-resource-agents,n,z Since June, Stackforge has been retired, so as you can see above, this repository lives under the 'openstack' namespace. I volunteered to be a maintainer and there were no objections. I sent out an initial call for co-maintainers but no one expressed an interest, which is probably fine because the workload is likely to be quite light. However, if you'd like to be involved please drop me a line. I've also taken care of outstanding pull requests and bug reports against the old repository, and provided a redirect from the old repository's README to the new one. Still TODO: adding this repository to the Big Tent. I've had some discussions with the openstack-infra team about that, since there is not currently a suitable project team to create it under. We might create a new project team called "OpenStack Pacemaker" or similar, and place it under that. ("OpenStack HA" would be far too broad to be able to find a single PTL.) 
However there is no rush for this, and it has been suggested that it would not be a bad thing to wait for the "new" project to stabilise and prove its longevity before making it official. Cheers, Adam P.S. I'll be in Tokyo if anyone wants to meet there and discuss further. __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [HA] RFC: moving Pacemaker openstack-resource-agents to stackforge
[cross-posting to openstack-dev and pacemaker lists; please consider trimming the recipients list if your reply is not relevant to both communities] Hi all, https://github.com/madkiss/openstack-resource-agents/ is a nice repository of Pacemaker High Availability resource agents (RAs) for OpenStack, usage of which has been officially recommended in the OpenStack High Availability guide. Here is one of several examples: http://docs.openstack.org/high-availability-guide/content/_add_openstack_identity_resource_to_pacemaker.html Martin Loschwitz, who owns this repository, has since moved away from OpenStack, and no longer maintains it. I recently proposed moving the repository to StackForge, and he gave his consent and in fact said that he had the same intention but hadn't got round to it: https://github.com/madkiss/openstack-resource-agents/issues/22#issuecomment-113386505 You can see from that same github issue that several key members of the OpenStack Pacemaker sub-community are all in favour. Therefore I am volunteering to do the move to StackForge. Another possibility would be to move each RA to its corresponding OpenStack project, although this makes a lot less sense to me, since it would require the core members of every OpenStack project to care enough about Pacemaker to agree to maintain an RA for it. This raises the question of maintainership. SUSE has a vested interest in these resource agents, so we would be happy to help maintain them. I believe Red Hat is also using these, so any volunteers from there or indeed anywhere else to co-maintain would be welcome. They are already fairly complete, and I don't expect there will be a huge amount of change. I'm probably getting ahead of myself, but the other big question is regarding CI. Currently there are no tests at all. Of course we could add bashate, and maybe even some functional tests, but ultimately some integration tests would be really nice. 
However for now I propose we focus on the move and defer CI work till later. Thoughts? Thanks! Adam __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
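For readers unfamiliar with how these resource agents are consumed, the HA guide's keystone example linked above boils down to a Pacemaker primitive defined via the crm shell, along these lines (the parameter values here are obviously site-specific placeholders, not working credentials):

```
crm configure primitive p_keystone ocf:openstack:keystone \
    params config="/etc/keystone/keystone.conf" \
           os_username="admin" \
           os_password="secret_password" \
           os_tenant_name="admin" \
           os_auth_url="http://192.0.2.10:5000/v2.0/" \
    op monitor interval="30s" timeout="30s"
```

Pacemaker then starts, stops, and monitors the service through the agent's standard OCF actions, which is also roughly the surface any future CI for this repository would need to exercise.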
Re: [openstack-dev] [ClusterLabs] [HA] RFC: moving Pacemaker openstack-resource-agents to stackforge
Digimer li...@alteeve.ca wrote: Resending to the Cluster Labs mailing list, this list is deprecated Thanks, I only realised that after getting a deprecation warning :-( On 23/06/15 06:27 AM, Adam Spiers wrote: [cross-posting to openstack-dev and pacemaker lists; please consider trimming the recipients list if your reply is not relevant to both communities] Hi all, https://github.com/madkiss/openstack-resource-agents/ is a nice repository of Pacemaker High Availability resource agents (RAs) for OpenStack, usage of which has been officially recommended in the OpenStack High Availability guide. Here is one of several examples: http://docs.openstack.org/high-availability-guide/content/_add_openstack_identity_resource_to_pacemaker.html Martin Loschwitz, who owns this repository, has since moved away from OpenStack, and no longer maintains it. I recently proposed moving the repository to StackForge, and he gave his consent and in fact said that he had the same intention but hadn't got round to it: https://github.com/madkiss/openstack-resource-agents/issues/22#issuecomment-113386505 You can see from that same github issue that several key members of the OpenStack Pacemaker sub-community are all in favour. Therefore I am volunteering to do the move to StackForge. There is a ClusterLabs group on github that most of the HA cluster projects have or are moving under. Why not use that? This question was asked and answered in the github issue: https://github.com/madkiss/openstack-resource-agents/issues/22#issuecomment-114147300 Cheers, Adam __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] tools for making upstreaming / backporting easier in git
Hi all, Back in April, I created some wrapper scripts around git-cherry(1) and git-notes(1), which can help when you have more than a trivial number of commits to upstream or backport from one branch to another. Since then I've improved these tools, and also written a higher-level CLI which should make the whole process pretty easy. Last week I finally finished a blog post with all the details: http://blog.adamspiers.org/2013/09/19/easier-upstreaming-with-git/ in which I demonstrate how to use the tools via an artificial example involving backporting some commits from Nova's master branch to its stable/grizzly release branch. These tools worked pretty well for me and my team on code outside OpenStack, but no doubt some people will have ideas how to improve them, or have different techniques for tackling the problem. Either way, I hope this is of interest, and I'd love to hear what people think! Cheers, Adam ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
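The underlying primitive those wrapper scripts build on is git-cherry(1), which compares branches by patch-id rather than by SHA. A quick self-contained demonstration in a throwaway repository (file names and commit messages are invented for the example):

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.name demo
git config user.email demo@example.com
echo one > file.txt
git add file.txt
git commit -q -m "initial"
git branch stable                 # stable forks off here
echo two >> file.txt
git commit -q -am "fix: something worth backporting"
# '+' marks commits whose patch is not yet on stable; '-' would mark
# equivalent patches that already landed there, even if they were
# cherry-picked under a different SHA.
out=$(git cherry -v stable)
echo "$out"
```

Because the comparison is by patch-id, a commit stops showing up as `+` once its patch reaches the other branch by any route, which is what makes git-cherry a better backporting bookkeeper than eyeballing `git log` output.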