Re: [openstack-dev] [tripleo][ironic] introspection and CI

2016-10-18 Thread Joe Talerico
With large sets of nodes to introspect, we typically avoid using bulk
introspection. I have written a quick script that introspects a
couple of nodes at a time:
https://gist.github.com/jtaleric/fcca3811cd4d8f37336f9532e5b9c9ff

Maybe we can add this sort of logic to bulk introspection, with some retries?
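
For readers without access to the gist, the batching idea can be sketched
roughly as follows. This is a minimal illustration, not the linked script:
`introspect_batch` is a hypothetical callable standing in for whatever starts
introspection on a list of node IDs and reports which ones failed, and the
batch size of 2 is only an assumed default.

```python
from typing import Callable, Iterator, List, Sequence


def batches(nodes: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `nodes`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(nodes), size):
        yield list(nodes[start:start + size])


def introspect_in_batches(
    nodes: Sequence[str],
    introspect_batch: Callable[[List[str]], List[str]],
    size: int = 2,
) -> List[str]:
    """Introspect `nodes` a few at a time and return the ones that failed.

    `introspect_batch` is a hypothetical callable: it starts introspection
    for the given node IDs, waits for completion, and returns the IDs that
    did not finish successfully.
    """
    failed: List[str] = []
    for batch in batches(nodes, size):
        failed.extend(introspect_batch(batch))
    return failed
```

The failed IDs returned at the end could then be fed into whatever retry
logic is chosen, which keeps batching and retrying as separate concerns.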

On Tue, Oct 18, 2016 at 8:29 AM, John Trowbridge  wrote:
>
>
> On 10/18/2016 07:20 AM, Wesley Hayutin wrote:
>> See my response inline.
>>
>> On Tue, Oct 18, 2016 at 6:07 AM, Dmitry Tantsur  wrote:
>>
>>> On 10/17/2016 11:10 PM, Wesley Hayutin wrote:
>>>
>>>> Greetings,
>>>>
>>>> The RDO CI team is considering adding retries to our calls to
>>>> introspection
>>>> again [1].
>>>> This is very handy for bare metal environments where retries may be
>>>> needed due
>>>> to random chaos in the environment itself.
>>>>
>>>> We're trying to balance two things here:
>>>> 1. reduce the number of false negatives in CI
>>>> 2. try not to overstep what CI should do vs. what the product should do.
>>>>
>>>> We would like to hear your comments if you think this is acceptable for
>>>> CI or if
>>>> this may be overstepping.
>>>>
>>>> Thank you
>>>>
>>>>
>>>> [1] http://paste.openstack.org/show/586035/
>>>>
>>>
>>> Hi!
>>>
>>> I probably lack some context on exactly what problems you face. I don't
>>> have any disagreement with retrying it; I just want to make sure we're not
>>> missing actual bugs.
>>>
>>
>> I agree, we have to be careful not to paper over bugs while we try to
>> overcome typical environmental delays that come w/ booting, rebooting $x
>> number of random hardware nodes.
>> To make this crystal clear, what I'm trying to determine is where
>> progressive delays and retries should be injected into the workflow of
>> deploying an overcloud.
>> Should we add options in the product itself that allow for $x number of
>> retries w/ a configurable set of delays for introspection? [2]  Is the
>> expectation that this works the first time, every time?
>> Are we overstepping what CI should do by implementing [1]?
>
> IMO, yes, we are overstepping what CI should be doing with [1]. Mostly
> because we are providing a better UX in CI than an actual user will get.
>>
>> Additionally would it be appropriate to implement [1], while [2] is
>> developed for the next release and is it OK to use [1] with older releases?
>>
>
> However, I think it is ok to implement [1] in CI, if the following are true:
>
> 1) There is an in-progress bug to make this UX better for non-CI users.
> 2) For older releases, only if said bug is deemed inappropriate for backport.
>
>> Thanks for your time and responses.
>>
>>
>> [1] http://paste.openstack.org/show/586035/
>> [2]
>> https://github.com/openstack/tripleo-common/blob/master/workbooks/baremetal.yaml#L169
>>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



Re: [openstack-dev] [tripleo][ironic] introspection and CI

2016-10-18 Thread John Trowbridge


On 10/18/2016 07:20 AM, Wesley Hayutin wrote:
> See my response inline.
> 
> On Tue, Oct 18, 2016 at 6:07 AM, Dmitry Tantsur  wrote:
> 
>> On 10/17/2016 11:10 PM, Wesley Hayutin wrote:
>>
>>> Greetings,
>>>
>>> The RDO CI team is considering adding retries to our calls to
>>> introspection
>>> again [1].
>>> This is very handy for bare metal environments where retries may be
>>> needed due
>>> to random chaos in the environment itself.
>>>
>>> We're trying to balance two things here:
>>> 1. reduce the number of false negatives in CI
>>> 2. try not to overstep what CI should do vs. what the product should do.
>>>
>>> We would like to hear your comments if you think this is acceptable for
>>> CI or if
>>> this may be overstepping.
>>>
>>> Thank you
>>>
>>>
>>> [1] http://paste.openstack.org/show/586035/
>>>
>>
>> Hi!
>>
>> I probably lack some context on exactly what problems you face. I don't
>> have any disagreement with retrying it; I just want to make sure we're not
>> missing actual bugs.
>>
> 
> I agree, we have to be careful not to paper over bugs while we try to
> overcome typical environmental delays that come w/ booting, rebooting $x
> number of random hardware nodes.
> To make this crystal clear, what I'm trying to determine is where
> progressive delays and retries should be injected into the workflow of
> deploying an overcloud.
> Should we add options in the product itself that allow for $x number of
> retries w/ a configurable set of delays for introspection? [2]  Is the
> expectation that this works the first time, every time?
> Are we overstepping what CI should do by implementing [1]?

IMO, yes, we are overstepping what CI should be doing with [1]. Mostly
because we are providing a better UX in CI than an actual user will get.
> 
> Additionally would it be appropriate to implement [1], while [2] is
> developed for the next release and is it OK to use [1] with older releases?
> 

However, I think it is ok to implement [1] in CI, if the following are true:

1) There is an in-progress bug to make this UX better for non-CI users.
2) For older releases, only if said bug is deemed inappropriate for backport.

> Thanks for your time and responses.
> 
> 
> [1] http://paste.openstack.org/show/586035/
> [2]
> https://github.com/openstack/tripleo-common/blob/master/workbooks/baremetal.yaml#L169
> 



Re: [openstack-dev] [tripleo][ironic] introspection and CI

2016-10-18 Thread Wesley Hayutin
See my response inline.

On Tue, Oct 18, 2016 at 6:07 AM, Dmitry Tantsur  wrote:

> On 10/17/2016 11:10 PM, Wesley Hayutin wrote:
>
>> Greetings,
>>
>> The RDO CI team is considering adding retries to our calls to
>> introspection
>> again [1].
>> This is very handy for bare metal environments where retries may be
>> needed due
>> to random chaos in the environment itself.
>>
>> We're trying to balance two things here:
>> 1. reduce the number of false negatives in CI
>> 2. try not to overstep what CI should do vs. what the product should do.
>>
>> We would like to hear your comments if you think this is acceptable for
>> CI or if
>> this may be overstepping.
>>
>> Thank you
>>
>>
>> [1] http://paste.openstack.org/show/586035/
>>
>
> Hi!
>
> I probably lack some context on exactly what problems you face. I don't
> have any disagreement with retrying it; I just want to make sure we're not
> missing actual bugs.
>

I agree, we have to be careful not to paper over bugs while we try to
overcome typical environmental delays that come w/ booting, rebooting $x
number of random hardware nodes.
To make this crystal clear, what I'm trying to determine is where
progressive delays and retries should be injected into the workflow of
deploying an overcloud.
Should we add options in the product itself that allow for $x number of
retries w/ a configurable set of delays for introspection? [2]  Is the
expectation that this works the first time, every time?
Are we overstepping what CI should do by implementing [1]?

Additionally would it be appropriate to implement [1], while [2] is
developed for the next release and is it OK to use [1] with older releases?
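
To illustrate what "$x number of retries w/ a configurable set of delays"
could look like, here is a minimal sketch. It is not the code in [1] or [2];
`operation` is a hypothetical callable that returns True on success, and the
default delays of 30, 60, and 120 seconds are assumed values chosen only for
the example.

```python
import time
from typing import Callable, Sequence


def retry_with_delays(
    operation: Callable[[], bool],
    delays: Sequence[float] = (30, 60, 120),
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `operation` until it succeeds or the delays are exhausted.

    `operation` is a hypothetical callable that returns True on success.
    It is attempted once immediately, then once more after each delay,
    so at most len(delays) + 1 attempts are made.
    """
    if operation():
        return True
    for delay in delays:
        # Wait progressively longer before each further attempt.
        sleep(delay)
        if operation():
            return True
    return False
```

The `sleep` parameter exists only so the waiting can be replaced in tests;
the length of `delays` doubles as the retry limit.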

Thanks for your time and responses.


[1] http://paste.openstack.org/show/586035/
[2]
https://github.com/openstack/tripleo-common/blob/master/workbooks/baremetal.yaml#L169


Re: [openstack-dev] [tripleo][ironic] introspection and CI

2016-10-18 Thread Dmitry Tantsur

On 10/17/2016 11:10 PM, Wesley Hayutin wrote:

> Greetings,
>
> The RDO CI team is considering adding retries to our calls to introspection
> again [1].
> This is very handy for bare metal environments where retries may be needed due
> to random chaos in the environment itself.
>
> We're trying to balance two things here:
> 1. reduce the number of false negatives in CI
> 2. try not to overstep what CI should do vs. what the product should do.
>
> We would like to hear your comments if you think this is acceptable for CI or if
> this may be overstepping.
>
> Thank you
>
>
> [1] http://paste.openstack.org/show/586035/


Hi!

I probably lack some context on exactly what problems you face. I don't have any
disagreement with retrying it; I just want to make sure we're not missing actual bugs.




Re: [openstack-dev] [tripleo][ironic] introspection and CI

2016-10-18 Thread Dougal Matthews
On 17 October 2016 at 22:10, Wesley Hayutin  wrote:

> Greetings,
>
> The RDO CI team is considering adding retries to our calls to
> introspection again [1].
> This is very handy for bare metal environments where retries may be needed
> due to random chaos in the environment itself.
>
> We're trying to balance two things here:
> 1. reduce the number of false negatives in CI
> 2. try not to overstep what CI should do vs. what the product should do.
>
> We would like to hear your comments if you think this is acceptable for CI
> or if this may be overstepping.
>

I don't have a strong opinion about this; I'd be curious to hear what the
Ironic folk think.

However, if it is considered an okay idea, I wonder if this is something we
should add as a feature in Ocata to the Introspection workflows in Mistral.
It would presumably be more efficient to try failing nodes twice rather
than re-running the full process. We could then have various retry limits
to make it configurable for users.
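
As a rough illustration of retrying only the failing nodes, the logic could
look like the sketch below. It is plain Python rather than a Mistral
workflow, and `introspect_node` and `retry_limit` are hypothetical names used
only for this example, not existing TripleO or Ironic options.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def introspect_retrying_failures(
    nodes: Iterable[str],
    introspect_node: Callable[[str], bool],
    retry_limit: int = 1,
) -> Tuple[List[str], Dict[str, int]]:
    """Introspect every node once, then re-run only the nodes that failed.

    `introspect_node` is a hypothetical callable returning True on success.
    Returns the nodes still failing after `retry_limit` extra rounds, plus
    the number of attempts made per node.
    """
    attempts: Dict[str, int] = {}
    pending = list(nodes)
    for _ in range(retry_limit + 1):
        still_failing = []
        for node in pending:
            attempts[node] = attempts.get(node, 0) + 1
            if not introspect_node(node):
                still_failing.append(node)
        # Only the failures carry over; successful nodes are never re-run.
        pending = still_failing
        if not pending:
            break
    return pending, attempts
```

With `retry_limit=1`, a node that fails once and then succeeds is attempted
twice, while nodes that pass the first time are attempted only once.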


> Thank you
>
>
> [1] http://paste.openstack.org/show/586035/
>


[openstack-dev] [tripleo][ironic] introspection and CI

2016-10-17 Thread Wesley Hayutin
Greetings,

The RDO CI team is considering adding retries to our calls to introspection
again [1].
This is very handy for bare metal environments where retries may be needed
due to random chaos in the environment itself.

We're trying to balance two things here:
1. reduce the number of false negatives in CI
2. try not to overstep what CI should do vs. what the product should do.

We would like to hear your comments if you think this is acceptable for CI
or if this may be overstepping.

Thank you


[1] http://paste.openstack.org/show/586035/