When there are many nodes to introspect, we typically avoid bulk
introspection. I have written a quick script that introspects a
couple of nodes at a time:
https://gist.github.com/jtaleric/fcca3811cd4d8f37336f9532e5b9c9ff

Maybe we can add this sort of logic to bulk introspection, with some retries?
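For illustration only, here is a minimal Python sketch of "a couple of nodes at a time, with some retries". It is not the contents of the gist above, and it does not call any real ironic or tripleo API. `introspect_batch` is a hypothetical callable that introspects the given nodes and returns the IDs that failed. `batch_size`, `max_retries` and `base_delay` are assumed knobs showing how progressive (doubling) delays could be layered on top:

```python
import time
from typing import Callable, Iterable, List, Sequence


def chunks(items: Sequence[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive batches of at most `size` node IDs."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def introspect_in_batches(
    nodes: Sequence[str],
    introspect_batch: Callable[[List[str]], List[str]],  # hypothetical hook
    batch_size: int = 2,
    max_retries: int = 3,
    base_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Introspect nodes a few at a time, retrying failures with growing delays.

    Returns the node IDs that still failed after all retries, so a real
    failure is still reported instead of being silently retried away.
    """
    still_failed: List[str] = []
    for batch in chunks(nodes, batch_size):
        pending = batch
        for attempt in range(max_retries + 1):
            pending = introspect_batch(pending)  # only retry the failures
            if not pending:
                break
            if attempt < max_retries:
                # Progressive delay: base_delay, 2*base_delay, 4*base_delay, ...
                sleep(base_delay * (2 ** attempt))
        still_failed.extend(pending)
    return still_failed


if __name__ == "__main__":
    # Fake introspection where node-3 fails on its first attempt only.
    attempts = {}

    def fake_introspect(batch: List[str]) -> List[str]:
        failed = []
        for node in batch:
            attempts[node] = attempts.get(node, 0) + 1
            if node == "node-3" and attempts[node] < 2:
                failed.append(node)
        return failed

    delays: List[float] = []
    print(introspect_in_batches(["node-1", "node-2", "node-3"],
                                fake_introspect, sleep=delays.append))  # []
    print(delays)  # [30.0]
```

Returning the nodes that are still failing, rather than swallowing them, is one way to keep retries from papering over genuine bugs.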

On Tue, Oct 18, 2016 at 8:29 AM, John Trowbridge <tr...@redhat.com> wrote:
>
>
> On 10/18/2016 07:20 AM, Wesley Hayutin wrote:
>> See my response inline.
>>
>> On Tue, Oct 18, 2016 at 6:07 AM, Dmitry Tantsur <dtant...@redhat.com> wrote:
>>
>>> On 10/17/2016 11:10 PM, Wesley Hayutin wrote:
>>>
>>>> Greetings,
>>>>
>>>> The RDO CI team is considering adding retries to our calls to
>>>> introspection again [1].
>>>> This is very handy for bare metal environments where retries may be
>>>> needed due to random chaos in the environment itself.
>>>>
>>>> We're trying to balance two things here:
>>>> 1. reduce the number of false negatives in CI
>>>> 2. avoid overstepping the line between what CI should do and what the product should do.
>>>>
>>>> We would like to hear your comments on whether this is acceptable for
>>>> CI or whether it may be overstepping.
>>>>
>>>> Thank you
>>>>
>>>>
>>>> [1] http://paste.openstack.org/show/586035/
>>>>
>>>
>>> Hi!
>>>
>>> I probably lack some context about exactly what problems you face. I don't
>>> have any objection to retrying it; I just want to make sure we're not
>>> missing actual bugs.
>>>
>>
>> I agree; we have to be careful not to paper over bugs while we try to
>> overcome the typical environmental delays that come with booting and
>> rebooting $x number of random hardware nodes.
>> To make this a little clearer, what I'm trying to determine is where
>> progressive delays and retries should be injected into the workflow of
>> deploying an overcloud.
>> Should we add options in the product itself that allow for $x number of
>> retries with a configurable set of delays for introspection [2]? Is the
>> expectation that this works the first time, every time?
>> Are we overstepping what CI should do by implementing [1]?
>
> IMO, yes, we are overstepping what CI should be doing with [1]. Mostly
> because we are providing a better UX in CI than an actual user will get.
>>
>> Additionally, would it be appropriate to implement [1] while [2] is
>> developed for the next release, and is it OK to use [1] with older releases?
>>
>
> However, I think it is OK to implement [1] in CI if the following are true:
>
>     1) There is an in-progress bug to make this UX better for non-CI users.
>     2) For older releases, the fix for said bug is deemed inappropriate for backport.
>
>> Thanks for your time and responses.
>>
>>
>> [1] http://paste.openstack.org/show/586035/
>> [2]
>> https://github.com/openstack/tripleo-common/blob/master/workbooks/baremetal.yaml#L169
>>
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
