Re: [openstack-dev] [ironic] PTG Summary
Inline.

On 03/12/2018 01:00 PM, Tim Bell wrote:
> My worry with re-running the burn-in every time we do cleaning is resource
> utilisation. When the machines are running the burn-in, they're not doing
> useful physics, so I would want to minimise the number of times this is run
> over the lifetime of a machine.

You only have to run it every time if you put the step into automated cleaning.
However, we also have manual cleaning, which is run explicitly.

> It may be possible to do something like the burn-in with a dedicated set of
> steps but still use the cleaning state machine.

Yep, this is what manual cleaning is about: an operator explicitly requests it
with a given set of steps. See
https://docs.openstack.org/ironic/latest/admin/cleaning.html#manual-cleaning

> Having a cleaning step set (i.e. burn-in means cpuburn,memtest,badblocks,benchmark)
> would make it more friendly for the administrator. Similarly, retirement could
> be done with additional steps such as reset2factory.

++

We may even add a reference set of clean steps to IPA, but we'll need your help
implementing them. I am personally not familiar with how to do burn-in right
(though IIRC Julia is).

> Tim
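For illustration, here is a rough sketch of what "an operator explicitly
requests it with a given set of steps" could look like against the bare metal
API. The "burn-in"/"retirement" groupings and the step names burnin_cpu,
burnin_memory, burnin_disk and reset_to_factory are hypothetical placeholders
(no such clean steps exist today); only erase_devices and the provision-state
request itself follow the documented manual cleaning workflow.

# Hypothetical helper: expand a friendly "step set" name into explicit
# clean steps and request manual cleaning for a node.
import requests

STEP_SETS = {
    'burn-in': [
        {'interface': 'deploy', 'step': 'burnin_cpu'},      # placeholder
        {'interface': 'deploy', 'step': 'burnin_memory'},   # placeholder
        {'interface': 'deploy', 'step': 'burnin_disk'},     # placeholder
    ],
    'retirement': [
        {'interface': 'deploy', 'step': 'erase_devices'},
        {'interface': 'management', 'step': 'reset_to_factory'},  # placeholder
    ],
}


def request_manual_cleaning(ironic_url, token, node, step_set):
    """Move a node into cleaning with an explicit list of clean steps."""
    resp = requests.put(
        f'{ironic_url}/v1/nodes/{node}/states/provision',
        json={'target': 'clean', 'clean_steps': STEP_SETS[step_set]},
        headers={
            'X-Auth-Token': token,
            # Manual cleaning needs a new enough API microversion; 1.15 assumed.
            'X-OpenStack-Ironic-API-Version': '1.15',
        },
    )
    resp.raise_for_status()


# Example: run the burn-in set on one node.
# request_manual_cleaning('http://ironic-api:6385', TOKEN, 'node-0001', 'burn-in')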
Re: [openstack-dev] [ironic] PTG Summary
My worry with re-running the burn-in every time we do cleaning is resource
utilisation. When the machines are running the burn-in, they're not doing
useful physics, so I would want to minimise the number of times this is run
over the lifetime of a machine.

It may be possible to do something like the burn-in with a dedicated set of
steps but still use the cleaning state machine.

Having a cleaning step set (i.e. burn-in means cpuburn,memtest,badblocks,benchmark)
would make it more friendly for the administrator. Similarly, retirement could
be done with additional steps such as reset2factory.

Tim
Re: [openstack-dev] [ironic] PTG Summary
Hi Tim,

Thanks for the information. I personally don't see problems with cleaning
running for weeks, when needed. What I'd avoid is replicating the same
cleaning machinery but with a different name. I think we should try to make
cleaning work for this case instead.

Dmitry
Re: [openstack-dev] [ironic] PTG Summary
Julia,

A basic summary of how CERN does burn-in is at
http://openstack-in-production.blogspot.ch/2018/03/hardware-burn-in-in-cern-datacenter.html

Given that the burn-in takes weeks to run, we'd see it as a different step
from cleaning (with some parts in common, such as firmware upgrades to the
latest levels).

Tim

-----Original Message-----
From: Julia Kreger
Reply-To: "OpenStack Development Mailing List (not for usage questions)"
Date: Thursday, 8 March 2018 at 22:10
To: "OpenStack Development Mailing List (not for usage questions)"
Subject: [openstack-dev] [ironic] PTG Summary

...

Cleaning - Burn-in

As part of discussing cleaning changes, we discussed supporting a "burn-in"
mode where hardware could be left to run load, memory, or other tests for a
period of time. We did not have consensus on a generic solution, other than
that this should likely involve clean-steps that we already have, and maybe
another entry point into cleaning. Since we didn't really have consensus on
use cases, we decided the logical thing was to write them down, and then go
from there.

Action Items:
* Community members to document varying burn-in use cases for hardware, as
  they may vary based upon industry.
* Community to try and come up with a couple of example clean-steps.
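As a possible starting point for the "example clean-steps" action item above,
here is a rough sketch of how burn-in steps might be exposed through an
ironic-python-agent hardware manager. The class, the step names, the
priorities and the use of stress-ng are illustrative assumptions, not an
agreed design; only the get_clean_steps/step-method structure follows the
existing IPA HardwareManager plugin interface.

# Sketch of a hypothetical out-of-tree hardware manager exposing burn-in
# style clean steps. It would be registered via an
# 'ironic_python_agent.hardware_managers' entry point in a custom image.
from ironic_python_agent import hardware
from ironic_python_agent import utils


class BurnInHardwareManager(hardware.HardwareManager):

    HARDWARE_MANAGER_NAME = 'BurnInHardwareManager'
    HARDWARE_MANAGER_VERSION = '1.0'

    def evaluate_hardware_support(self):
        # Claim support everywhere; a real manager might probe the
        # hardware before offering these steps.
        return hardware.HardwareSupport.SERVICE_PROVIDER

    def get_clean_steps(self, node, ports):
        # Priority 0: the steps never run during automated cleaning and
        # only execute when explicitly requested via manual cleaning.
        return [
            {'step': 'burnin_cpu', 'priority': 0, 'interface': 'deploy',
             'reboot_requested': False, 'abortable': True},
            {'step': 'burnin_memory', 'priority': 0, 'interface': 'deploy',
             'reboot_requested': False, 'abortable': True},
        ]

    def burnin_cpu(self, node, ports):
        # Placeholder: stress all CPUs for a fixed period.
        utils.execute('stress-ng', '--cpu', '0', '--timeout', '24h')

    def burnin_memory(self, node, ports):
        # Placeholder: exercise memory for a fixed period.
        utils.execute('stress-ng', '--vm', '0', '--timeout', '24h')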
[openstack-dev] [ironic] PTG Summary
The Ironic PTG Summary - The blur(b) from the East

In an effort to provide visibility and awareness of all the things related to
Ironic, I've typed up a summary below. I've tried to keep this fairly
generalized with enough context, and to convey action items or instances of
consensus where applicable. It goes without saying that the week went by as a
complete blur; we had to abruptly change our schedule around, and some finer
detailed topics were missed. A special thanks to Ruby Loo for taking some time
to proofread this for me.

-Julia

-----

From our retrospective:

As seems to be the norm with retrospectives, we brought up a number of issues
that slowed us down, hindered us, or hindered our ability to move faster. A
great deal of this revolved around specifications and the perceptions that
tend to build up around them.

Action Items:
* Jroll will bring up for discussion whether we can update the theme for the
  rendered specs documentation to highlight that specs are point-in-time
  references for design, and are not final documentation.
* TheJulia will revise our specification template to be clearer about *why* we
  are asking the questions, and to suggest, but not require, proof-of-concept
  code.

After our retrospective, we spoke about things that can improve our velocity.
This sort of discussion tends to always come up, and it focused on community
cultural aspects of revising/helping land code. The conclusion we quickly came
to was that communication, or context from the contributor, is required. One
of the points raised, which we did not get to, was that we should listen to
contributors' perceptions, which really goes back to communication.

As time went on, we shifted gears to a high-level status of ironic, and there
are some items to take away:

* Inspector, at a high level, could use some additional work and contributors.
  Virtual media boot support would be helpful, and we may look at breaking
  some portions out and moving them into ironic. Additional High Availability
  work may be needed; at the same time, it may not be. Entirely to be
  determined.
* Ironic-ui presently has no active contributors, but is stable. The major
  risk right now is a breaking change coming from Horizon, which was also
  discussed earlier in the week with the Horizon team. We will add testing
  such that Horizon's gate triggers ironic-ui testing and raises the
  visibility of breaking changes.
* Ironic itself got a lot completed this past cycle, and we should expect
  quite a bit this coming cycle in terms of clean-up from deprecations.
* Networking-baremetal received a good portion of work this cycle due to
  routed networks support. \o/
* Networking-generic-switch seems to be in a fairly stable state at this
  point. Some trunk awareness has been added, as well as some new switches
  and bug fixes.
* Bifrost has low activity, but at the same time we're seeing new contributors
  fix issues or improve things, which is a good sign.
* Sushy got authentication and introspection support added this cycle. We
  discussed that we may want to consider supporting RAID (in terms of client
  actions), as well as composable hardware.

After statuses, we shifted into discussing the future. We started the entire
discussion of the future with a visioning exercise to help frame it, so that
we were all using the same words and had the same scope in mind when
discussing the future of Ironic. One thing worth noting is that up front there
was a lot of alignment, but we were sometimes just using slightly different
words or concepts.
Taking a little more time to reconcile those differences allowed us to relate
additional words to the same meaning. Truly this set the stage for all of the
other topics, and gave us a common reference point for judging whether what we
were talking about made sense. Expect Jroll to send out an email to the
mailing list to summarize this further; from this initial discussion we will
likely draft a formal vision document that will allow us to keep the same
reference point for discussions. Maybe one day your light bulb will be
provisioned with Ironic!

Deploy Steps

In terms of the future, we again returned to the concept of breaking up
deployments into a series of steps. Without going deep into detail, this is a
very large piece of functionality that would help solve many problems and
desires that exist today, especially where some operators wish for things like
deploy-time RAID, or to flash firmware as part of the baremetal node
provisioning process. This work is also influenced by traits, because traits
can map to actions that need to be performed automatically. In the end, we
agreed to take a small step and iterate from there: specifically, adding a
deploy steps framework and splitting our current deploy process into two
logical steps.

Location Awareness

"Location awareness", as we are calling it, or possibly better stated as
"conductor to node affinity", is a topic that we again revisited. This is
important as many operators desire a single pane of glass for their enti