Re: [openstack-dev] [tripleo][ironic] introspection and CI
With large sets of nodes to introspect, we typically avoid bulk introspection. I have written a quick script that introspects a couple of nodes at a time:

https://gist.github.com/jtaleric/fcca3811cd4d8f37336f9532e5b9c9ff

Maybe we can add this sort of logic to bulk introspection, with some retries?

On Tue, Oct 18, 2016 at 8:29 AM, John Trowbridge wrote:
>
> On 10/18/2016 07:20 AM, Wesley Hayutin wrote:
>> See my response inline.
>>
>> On Tue, Oct 18, 2016 at 6:07 AM, Dmitry Tantsur wrote:
>>
>>> On 10/17/2016 11:10 PM, Wesley Hayutin wrote:
>>>> Greetings,
>>>>
>>>> The RDO CI team is considering adding retries to our calls to
>>>> introspection again [1].
>>>> This is very handy for bare metal environments where retries may be
>>>> needed due to random chaos in the environment itself.
>>>>
>>>> We're trying to balance two things here:
>>>> 1. reduce the number of false negatives in CI
>>>> 2. try not to overstep what CI should do vs. what the product should do.
>>>>
>>>> We would like to hear your comments if you think this is acceptable
>>>> for CI or if this may be overstepping.
>>>>
>>>> Thank you
>>>>
>>>> [1] http://paste.openstack.org/show/586035/
>>>
>>> Hi!
>>>
>>> I probably lack some context of what exactly problems you face. I don't
>>> have any disagreement with retrying it, just want to make sure we're not
>>> missing actual bugs.
>>
>> I agree, we have to be careful not to paper over bugs while we try to
>> overcome typical environmental delays that come w/ booting, rebooting $x
>> number of random hardware nodes.
>> To make this a little more crystal clear, what I'm trying to determine is
>> where progressive delays and retries should be injected into the workflow
>> of deploying an overcloud.
>> Should we add options in the product itself that allow for $x number of
>> retries w/ a configurable set of delays for introspection? [2] Is the
>> expectation this works the first time every time?
>> Are we overstepping what CI should do by implementing [1]?
>
> IMO, yes, we are overstepping what CI should be doing with [1]. Mostly
> because we are providing a better UX in CI than an actual user will get.
>
>> Additionally would it be appropriate to implement [1], while [2] is
>> developed for the next release and is it OK to use [1] with older releases?
>
> However, I think it is ok to implement [1] in CI, if the following are true:
>
> 1) There is an in-progress bug to make this UX better for non-CI users.
> 2) For older releases, said bug is deemed inappropriate for backport.
>
>> Thanks for your time and responses.
>>
>> [1] http://paste.openstack.org/show/586035/
>> [2] https://github.com/openstack/tripleo-common/blob/master/workbooks/baremetal.yaml#L169

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
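
As a rough illustration of the "a couple of nodes at a time, with some retries" idea in this message, here is a minimal Python sketch. It is not the linked gist and it does not call Ironic or ironic-inspector. `introspect_batch` is a hypothetical callable, assumed to take a list of node names and return the ones that failed; `batch_size` and `max_attempts` are illustrative names as well.

```python
from typing import Callable, Iterator, List


def batches(nodes: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` nodes."""
    for start in range(0, len(nodes), size):
        yield nodes[start:start + size]


def introspect_in_batches(
    nodes: List[str],
    introspect_batch: Callable[[List[str]], List[str]],  # hypothetical hook
    batch_size: int = 2,
    max_attempts: int = 3,
) -> List[str]:
    """Introspect a few nodes at a time and return the nodes that never passed.

    Within a batch, only the nodes that failed are tried again, for at most
    `max_attempts` attempts in total.
    """
    still_failed: List[str] = []
    for batch in batches(nodes, batch_size):
        pending = batch
        for _ in range(max_attempts):
            pending = introspect_batch(pending)
            if not pending:
                break
        still_failed.extend(pending)
    return still_failed
```

Returning the nodes that never passed, instead of hiding them, keeps a persistent failure visible, which matters for the concern raised in the quoted thread about papering over real bugs.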
Re: [openstack-dev] [tripleo][ironic] introspection and CI
On 10/18/2016 07:20 AM, Wesley Hayutin wrote:
> See my response inline.
>
> On Tue, Oct 18, 2016 at 6:07 AM, Dmitry Tantsur wrote:
>
>> On 10/17/2016 11:10 PM, Wesley Hayutin wrote:
>>
>>> Greetings,
>>>
>>> The RDO CI team is considering adding retries to our calls to
>>> introspection again [1].
>>> This is very handy for bare metal environments where retries may be
>>> needed due to random chaos in the environment itself.
>>>
>>> We're trying to balance two things here:
>>> 1. reduce the number of false negatives in CI
>>> 2. try not to overstep what CI should do vs. what the product should do.
>>>
>>> We would like to hear your comments if you think this is acceptable for
>>> CI or if this may be overstepping.
>>>
>>> Thank you
>>>
>>> [1] http://paste.openstack.org/show/586035/
>>
>> Hi!
>>
>> I probably lack some context of what exactly problems you face. I don't
>> have any disagreement with retrying it, just want to make sure we're not
>> missing actual bugs.
>
> I agree, we have to be careful not to paper over bugs while we try to
> overcome typical environmental delays that come w/ booting, rebooting $x
> number of random hardware nodes.
> To make this a little more crystal clear, what I'm trying to determine is
> where progressive delays and retries should be injected into the workflow
> of deploying an overcloud.
> Should we add options in the product itself that allow for $x number of
> retries w/ a configurable set of delays for introspection? [2] Is the
> expectation this works the first time every time?
> Are we overstepping what CI should do by implementing [1]?

IMO, yes, we are overstepping what CI should be doing with [1]. Mostly because we are providing a better UX in CI than an actual user will get.

> Additionally would it be appropriate to implement [1], while [2] is
> developed for the next release and is it OK to use [1] with older releases?

However, I think it is ok to implement [1] in CI, if the following are true:

1) There is an in-progress bug to make this UX better for non-CI users.
2) For older releases, said bug is deemed inappropriate for backport.

> Thanks for your time and responses.
>
> [1] http://paste.openstack.org/show/586035/
> [2] https://github.com/openstack/tripleo-common/blob/master/workbooks/baremetal.yaml#L169
Re: [openstack-dev] [tripleo][ironic] introspection and CI
See my response inline.

On Tue, Oct 18, 2016 at 6:07 AM, Dmitry Tantsur wrote:

> On 10/17/2016 11:10 PM, Wesley Hayutin wrote:
>
>> Greetings,
>>
>> The RDO CI team is considering adding retries to our calls to
>> introspection again [1].
>> This is very handy for bare metal environments where retries may be
>> needed due to random chaos in the environment itself.
>>
>> We're trying to balance two things here:
>> 1. reduce the number of false negatives in CI
>> 2. try not to overstep what CI should do vs. what the product should do.
>>
>> We would like to hear your comments if you think this is acceptable for
>> CI or if this may be overstepping.
>>
>> Thank you
>>
>> [1] http://paste.openstack.org/show/586035/
>
> Hi!
>
> I probably lack some context of what exactly problems you face. I don't
> have any disagreement with retrying it, just want to make sure we're not
> missing actual bugs.

I agree, we have to be careful not to paper over bugs while we try to overcome typical environmental delays that come w/ booting, rebooting $x number of random hardware nodes.

To make this a little more crystal clear, what I'm trying to determine is where progressive delays and retries should be injected into the workflow of deploying an overcloud.

Should we add options in the product itself that allow for $x number of retries w/ a configurable set of delays for introspection? [2] Is the expectation this works the first time every time?

Are we overstepping what CI should do by implementing [1]?

Additionally would it be appropriate to implement [1], while [2] is developed for the next release and is it OK to use [1] with older releases?

Thanks for your time and responses.

[1] http://paste.openstack.org/show/586035/
[2] https://github.com/openstack/tripleo-common/blob/master/workbooks/baremetal.yaml#L169
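
For readers wondering what "$x number of retries w/ a configurable set of delays" could look like, a minimal Python sketch follows. It is only an illustration, not the contents of [1] or [2]. `action` is a hypothetical zero-argument callable that returns True on success; the default `delays` values are made up, and `sleep` is injectable so the function can be exercised without waiting.

```python
import time
from typing import Callable, Sequence


def retry_with_delays(
    action: Callable[[], bool],                 # hypothetical: True on success
    delays: Sequence[float] = (30, 60, 120),    # illustrative values, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `action` until it succeeds, pausing progressively between attempts.

    Makes one initial attempt plus one retry per entry in `delays`.
    Returns the number of attempts used, or raises RuntimeError if every
    attempt failed.
    """
    attempts = 0
    # The trailing None marks the final attempt, after which we do not sleep.
    for delay in [*delays, None]:
        attempts += 1
        if action():
            return attempts
        if delay is not None:
            sleep(delay)
    raise RuntimeError(f"action still failing after {attempts} attempts")
```

Returning the attempt count makes it possible to log how often a retry was actually needed, so a CI job that only passes on the third try can still be noticed.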
Re: [openstack-dev] [tripleo][ironic] introspection and CI
On 10/17/2016 11:10 PM, Wesley Hayutin wrote:
> Greetings,
>
> The RDO CI team is considering adding retries to our calls to
> introspection again [1].
> This is very handy for bare metal environments where retries may be
> needed due to random chaos in the environment itself.
>
> We're trying to balance two things here..
> 1. reduce the number of false negatives in CI
> 2. try not to overstep what CI should vs. what the product should do.
>
> We would like to hear your comments if you think this is acceptable for
> CI or if this may be overstepping.
>
> Thank you
>
> [1] http://paste.openstack.org/show/586035/

Hi!

I probably lack some context of what exactly problems you face. I don't have any disagreement with retrying it, just want to make sure we're not missing actual bugs.
Re: [openstack-dev] [tripleo][ironic] introspection and CI
On 17 October 2016 at 22:10, Wesley Hayutin wrote:
> Greetings,
>
> The RDO CI team is considering adding retries to our calls to
> introspection again [1].
> This is very handy for bare metal environments where retries may be needed
> due to random chaos in the environment itself.
>
> We're trying to balance two things here..
> 1. reduce the number of false negatives in CI
> 2. try not to overstep what CI should vs. what the product should do.
>
> We would like to hear your comments if you think this is acceptable for CI
> or if this may be overstepping.

I don't have a strong opinion about this, I'd be curious to hear what the Ironic folk think. However, if it is considered an okay idea, I wonder if this is something we should add as a feature in Ocata to the Introspection workflows in Mistral.

It would presumably be more efficient to try failing nodes twice rather than re-running the full process. We could then have various retry limits to make it configurable for users.

> Thank you
>
> [1] http://paste.openstack.org/show/586035/
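
To put a rough number on "more efficient to try failing nodes twice rather than re-running the full process", here is a small back-of-the-envelope sketch in Python. The node and failure counts are made up for illustration, and it assumes (as a simplification) that the same number of nodes would fail in each round under either strategy.

```python
from typing import Sequence


def node_runs_full_rerun(total_nodes: int, failures: Sequence[int]) -> int:
    """Node introspections performed if every round re-runs all nodes.

    `failures[i]` is the number of nodes that failed in round i; the list
    has one entry per round that was run.
    """
    return total_nodes * len(failures)


def node_runs_retry_failed(total_nodes: int, failures: Sequence[int]) -> int:
    """Node introspections performed if later rounds retry only failed nodes."""
    # Round 1 covers every node; each later round covers the previous failures.
    return total_nodes + sum(failures[:-1])


# Hypothetical example: 20 nodes, 2 fail in the first round, none in the second.
full = node_runs_full_rerun(20, [2, 0])
targeted = node_runs_retry_failed(20, [2, 0])
```

With these made-up numbers, re-running the full process costs 40 node introspections against 22 when only the two failed nodes are retried.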
[openstack-dev] [tripleo][ironic] introspection and CI
Greetings,

The RDO CI team is considering adding retries to our calls to introspection again [1].
This is very handy for bare metal environments where retries may be needed due to random chaos in the environment itself.

We're trying to balance two things here:
1. reduce the number of false negatives in CI
2. try not to overstep what CI should do vs. what the product should do.

We would like to hear your comments if you think this is acceptable for CI or if this may be overstepping.

Thank you

[1] http://paste.openstack.org/show/586035/