Re: [openstack-dev] [ironic] PTG Summary

2018-03-12 Thread Dmitry Tantsur

Inline.

On 03/12/2018 01:00 PM, Tim Bell wrote:

My worry with re-running the burn-in every time we do cleaning is resource 
utilisation. When the machines are running the burn-in, they're not doing 
useful physics, so I would want to minimise the number of times this is run 
over the lifetime of a machine.


You only have to run it every time if you put the step into automated cleaning. 
However, we also have manual cleaning, which is run explicitly.




It may be possible to do something like the burn-in with a dedicated set of 
steps but still use the cleaning state machine.


Yep, this is what manual cleaning is about: an operator explicitly requests it 
with a given set of steps. See 
https://docs.openstack.org/ironic/latest/admin/cleaning.html#manual-cleaning
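
For illustration only, a manual cleaning request carrying a hypothetical 
burn-in step set could look roughly like the sketch below. This is a raw REST 
call with python-requests; the endpoint and node UUID are placeholders, and 
the burn-in step names do not exist in IPA today:

    # Sketch: trigger manual cleaning with an explicit list of steps via
    # PUT /v1/nodes/<node>/states/provision. The node has to be in the
    # 'manageable' state first; the step names below are hypothetical.
    import json
    import requests

    IRONIC_URL = 'http://ironic.example.com:6385'        # placeholder endpoint
    NODE_UUID = '1be26c0b-03f2-4d2e-ae87-c02d7f33c123'   # placeholder node
    HEADERS = {
        'Content-Type': 'application/json',
        'X-Auth-Token': 'REDACTED',                      # or use a keystoneauth session
        'X-OpenStack-Ironic-API-Version': '1.31',        # manual cleaning needs >= 1.15
    }

    body = {
        'target': 'clean',
        'clean_steps': [
            {'interface': 'deploy', 'step': 'burnin_cpu'},     # hypothetical
            {'interface': 'deploy', 'step': 'burnin_memory'},  # hypothetical
            {'interface': 'deploy', 'step': 'burnin_disk',     # hypothetical
             'args': {'tool': 'badblocks'}},
        ],
    }

    resp = requests.put('%s/v1/nodes/%s/states/provision' % (IRONIC_URL, NODE_UUID),
                        headers=HEADERS, data=json.dumps(body))
    resp.raise_for_status()

The CLI equivalent should be roughly "openstack baremetal node clean <node> 
--clean-steps <steps.json>".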




Having a cleaning step set (i.e. burn-in means cpuburn, memtest, badblocks, 
benchmark) would make it friendlier for the administrator. Similarly, 
retirement could be done with additional steps such as reset2factory.


++

We may even add a reference set of clean steps to IPA, but we'll need your help 
implementing them. I am personally not familiar with how to do burn-in right 
(though IIRC Julia is).
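
To make that concrete, here is a rough sketch, assuming a custom IPA hardware 
manager registered through the usual stevedore entry point; the step name, 
priority and stress command are illustrative only, not an agreed design:

    # Sketch only: a custom ironic-python-agent hardware manager exposing a
    # hypothetical burn-in clean step. Nothing here exists in IPA today.
    from ironic_python_agent import hardware
    from ironic_python_agent import utils


    class BurninHardwareManager(hardware.HardwareManager):
        """Illustrative hardware manager adding a CPU burn-in clean step."""

        HARDWARE_MANAGER_NAME = 'BurninHardwareManager'
        HARDWARE_MANAGER_VERSION = '1.0'

        def evaluate_hardware_support(self):
            # Run alongside the generic manager instead of replacing it.
            return hardware.HardwareSupport.SERVICE_PROVIDER

        def get_clean_steps(self, node, ports):
            # priority=0 keeps the step out of automated cleaning; it only
            # runs when explicitly requested through manual cleaning.
            return [{'step': 'burnin_cpu',
                     'priority': 0,
                     'interface': 'deploy',
                     'reboot_requested': False,
                     'abortable': True}]

        def burnin_cpu(self, node, ports):
            # Placeholder: stress all CPUs for a configurable duration taken
            # from driver_info (default: 24 hours).
            seconds = node.get('driver_info', {}).get('burnin_cpu_seconds', 86400)
            utils.execute('stress-ng', '--cpu', '0',
                          '--timeout', '%ds' % int(seconds))

With something along these lines in place, a burn-in set like Tim's would just 
be a documented list of steps to pass to manual cleaning.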




Tim

-Original Message-
From: Dmitry Tantsur <dtant...@redhat.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" 
<openstack-dev@lists.openstack.org>
Date: Monday, 12 March 2018 at 12:47
To: "openstack-dev@lists.openstack.org" <openstack-dev@lists.openstack.org>
Subject: Re: [openstack-dev] [ironic] PTG Summary

 Hi Tim,
 
 Thanks for the information.
 
 I personally don't see problems with cleaning running for weeks, when needed. 
 What I'd avoid is replicating the same cleaning machinery but with a different 
 name. I think we should try to make cleaning work for this case instead.
 
 Dmitry
 
 On 03/12/2018 12:33 PM, Tim Bell wrote:

 > Julia,
 >
 > A basic summary of how CERN does burn-in is at 
http://openstack-in-production.blogspot.ch/2018/03/hardware-burn-in-in-cern-datacenter.html
 >
 > Given that the burn-in takes weeks to run, we'd see it as a different step 
from cleaning (with some parts in common, such as firmware upgrades to the 
latest levels).
 >
 > Tim
 >
 > -Original Message-
 > From: Julia Kreger <juliaashleykre...@gmail.com>
 > Reply-To: "OpenStack Development Mailing List (not for usage questions)" 
<openstack-dev@lists.openstack.org>
 > Date: Thursday, 8 March 2018 at 22:10
 > To: "OpenStack Development Mailing List (not for usage questions)" 
<openstack-dev@lists.openstack.org>
 > Subject: [openstack-dev] [ironic] PTG Summary
 >
 > ...
 >  Cleaning - Burn-in
 >
 >  As part of discussing cleaning changes, we discussed supporting a
 >  "burn-in" mode where hardware could be left to run load, memory, or
 >  other tests for a period of time. We did not have consensus on a
 >  generic solution, other than that this should likely involve
 >  clean-steps that we already have, and maybe another entry point into
 >  cleaning. Since we didn't really have consensus on use cases, we
 >  decided the logical thing was to write them down, and then go from
 >  there.
 >
 >  Action Items:
 >  * Community members to document varying burn-in use cases for
 >  hardware, as they may vary based upon industry.
 >  * Community to try and come up with a couple example clean-steps.
 >

[openstack-dev] [ironic] PTG Summary

2018-03-08 Thread Julia Kreger
The Ironic PTG Summary - The blur(b) from the East

In an effort to provide visibility and awareness of all the things
related to Ironic, I've typed up a summary below. I've tried to keep
this fairly generalized, with enough context to convey action items and
the instances of consensus where applicable. It goes without saying
that the week went by as a complete blur; because we had to abruptly
change our schedule around, some of the finer-detailed topics were
missed. A special thanks to Ruby Loo for taking some time to proofread
this for me.

-Julia

-

From our retrospective:

As seems to be the norm with retrospectives, we brought up a number of
issues that slowed us down or hindered our ability to move faster. A
great deal of this revolved around specifications, and the perceptions
that tend to form around them.

Action Items:

* Jroll will bring up for discussion whether we can update the theme
for the rendered specs documentation, to highlight that specs are
point-in-time references for design and are not final documentation.
* TheJulia will revise our specification template to be clearer about
*why* we are asking the questions, and to suggest, but not require,
proof-of-concept code.

After our retrospective, we spoke about things that can improve our
velocity. This sort of discussion always tends to come up, and it
focused on the community's cultural aspects of revising and helping
land code. The conclusion we quickly came to was that communication
with, or context from, the contributor is required. One of the points
raised, which we did not get to, was that we should listen to
contributors' perceptions, which really goes back to communication.

As time went on, we shifted gears to a high level status of ironic,
and there are some items to take away:

* Inspector, at a high level, could use some additional work and
contributors. Virtual media boot support would be helpful, and we may
look at breaking some portions out and moving them into ironic.
Additional high-availability work may or may not be needed; that is
entirely to be determined.

* Ironic-ui presently has no active contributors, but it is stable.
The major risk right now is a breaking change coming from Horizon,
which was also discussed with the Horizon team earlier in the week. We
will add testing such that Horizon's gate triggers ironic-ui tests and
raises the visibility of breaking changes.

* Ironic itself got a lot completed this past cycle, and we should
expect quite a bit this cycle in terms of clean-up from deprecations.

* Networking-baremetal received a good portion of work this cycle due
to routed networks support. \o/

* Networking-generic-switch seems to be in a fairly stable state at
this point. Some trunk awareness has been added, as well as some new
switches and bug fixes.

* Bifrost has low activity, but at the same time we're seeing new
contributors fix issues or improve things, which is a good sign.

* Sushy got authentication and introspection support added this cycle.
We discussed that we may want to consider supporting RAID (in terms of
client actions), as well as composable hardware.

After statuses, we shifted into discussing the future.

We started the entire discussion of the future with a visioning
exercise to help frame it, so that we were all using the same words
and had the same scope in mind when discussing the future of Ironic.
One thing worth noting is that, up front, there was a lot of
alignment; we sometimes were just using slightly different words or
concepts. Taking a little more time to reconcile those differences
allowed us to relate the different words to the same meaning. This
truly set the stage for all of the other topics, and gave us a common
reference point for judging whether what we were talking about made
sense. Expect Jroll to send out an email to the mailing list to
summarize this further; from this initial discussion we will likely
draft a formal vision document that will allow us to keep using the
same reference point for discussions. Maybe one day your light bulb
will be provisioned with Ironic!

Deploy Steps

In terms of the future, we again returned to the concept of breaking
up deployments into a series of steps. Without going deep into detail,
this is a very large piece of functionality and would help solve many
problems and desires that exist today, especially where operators wish
for things like deploy-time RAID, or to flash firmware as part of the
bare metal node provisioning process. This work is also influenced by
traits, because traits can map to actions that need to be performed
automatically. In the end, we agreed to take a small step and iterate
from there: specifically, adding a deploy-steps framework and
splitting our current deploy process into two logical steps.
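
Purely as an illustration of the direction, and not an agreed design, the two 
logical steps might eventually be declared along the lines of the existing 
clean-step pattern; the deploy_step decorator, priorities and method names 
below are hypothetical:

    # Hypothetical sketch of a deploy-steps framework, mirroring how clean
    # steps are declared today. None of this exists in ironic at the time
    # of writing.
    from ironic.drivers import base


    class ExampleDeployInterface(base.DeployInterface):
        """Illustrative deploy interface split into two logical steps."""

        @base.deploy_step(priority=100)
        def deploy_image(self, task):
            """Step 1: write the image to the node's disk."""

        @base.deploy_step(priority=50)
        def prepare_instance_boot(self, task):
            """Step 2: configure boot (bootloader / boot device), leaving
            room for trait-driven extras such as firmware flashing or
            deploy-time RAID."""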

Location Awareness

"Location awareness" as we are calling it, or possibly better stated
as "conductor to node affinity" is a topic that we again revisited.
This is important as many operators desire a single pane of glass for
their