Re: [openstack-dev] [tripleo] becoming third party CI (was: enabling third party CI)
On Thu, 2016-03-10 at 23:24 +, Jeremy Stanley wrote:
> On 2016-03-10 16:09:44 -0500 (-0500), Dan Prince wrote:
> > This seems to be the week people want to pile it on TripleO. Talking
> > about upstream is great but I suppose I'd rather debate major changes
> > after we branch Mitaka. :/
> [...]
>
> I didn't mean to pile on TripleO, nor did I intend to imply this was
> something which should happen ASAP (or even necessarily at all), but
> I do want to better understand what actual benefit is currently
> derived from this implementation vs. a more typical third-party CI
> (which lots of projects are doing when they find their testing needs
> are not met by the constraints of our generic test infrastructure).
>
> > With regards to Jenkins restarts I think it is understood that our
> > job times are long. How often do you find infra needs to restart
> > Jenkins?
>
> We're restarting all 8 of our production Jenkins masters weekly at a
> minimum, but generally more often when things are busy (2-3 times a
> week). For many months we've been struggling with a thread leak for
> which their development team has not seen as a priority to even
> triage our bug report effectively. At this point I think we've
> mostly given up on expecting it to be solved by anything other than
> our upcoming migration off of Jenkins, but that's another topic
> altogether.
>
> > And regardless of that what if we just said we didn't mind the
> > destructiveness of losing a few jobs now and then (until our job
> > times are under the line... say 1.5 hours or so). To be clear I'd
> > be fine with infra pulling the rug on running jobs if this is the
> > root cause of the long running jobs in TripleO.
>
> For manual Jenkins restarts this is probably doable (if additional
> hassle), but I don't know whether that's something we can easily
> shoehorn into our orchestrated/automated restarts.
>
> > I think the "benefits are minimal" is a bit of an overstatement.
> > The initial vision for TripleO CI stands and I would still like to
> > see individual projects entertain the option to use us in their
> > gates.
> [...]
>
> This is what I'd like to delve deeper into. The current
> implementation isn't providing you with any mechanism to prevent
> changes which fail jobs running in the tripleo-test cloud from
> merging to your repos, is it? You're still having to manually
> inspect the job results posted by it? How is that particularly
> different from relying on third-party CI integration?

Perhaps we don't have a lot of differences today, but I don't think
that is where we want to be. Moving TripleO CI into 3rd party CI is
IMO strategically a bad move for the project, which aims to provide a
feedback loop for breakages into other upstream OpenStack projects. I
would argue that we are in a unique position to do that in TripleO...
and becoming 3rd party CI is a retreat from providing this feedback
loop, which can benefit other projects we rely on heavily (think Heat,
Mistral, Ironic, etc.). We want to gate our stuff. We need to gate our
own stuff.

That said, we've overstepped our resource boundaries. Our job runtimes
are way too long. We have several efforts in progress to help improve
that:

1) Caching. Derek's work on caching should significantly help us
improve our job wall times:
https://review.openstack.org/#/q/topic:mirror-server

2) Metrics tracking. I've posted a patch to help us better track
various wall times and image sizes in tripleo-ci:
https://review.openstack.org/#/c/291393/

3) The ability to test components of TripleO outside of baremetal
environments. Steve Hardy has been working on some approaches to
testing tripleo-heat-templates on normal OpenStack cloud instances.
Using this approach would allow us to test a significant portion of
our patches on groups of nodepool instances. We need to prototype this
a bit further, but I think it holds some promise for splitting up our
testing scenarios, etc.
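[Editor's note] The second item above, tracking per-phase wall times, can be sketched in a few lines. This is a hypothetical illustration only; the actual tripleo-ci patch linked above may structure its instrumentation entirely differently, and the phase names here are invented:

```python
import time
from contextlib import contextmanager

# Hypothetical sketch of wall-time tracking for CI job phases; phase
# names are examples, not real tripleo-ci steps.
metrics = {}

@contextmanager
def timed(phase):
    """Record the wall time of one named CI phase into `metrics`."""
    start = time.time()
    try:
        yield
    finally:
        metrics[phase] = time.time() - start

with timed("image-build"):
    time.sleep(0.01)  # stand-in for the actual image build step

with timed("overcloud-deploy"):
    time.sleep(0.01)  # stand-in for the actual deploy step

# Emitting the numbers into the job log makes regressions visible
# across runs without any extra infrastructure.
for phase, seconds in sorted(metrics.items()):
    print("%s: %.2fs" % (phase, seconds))
```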
So rather than ask why TripleO can't become 3rd party CI, I'd ask what
harm we are causing where we are at. I like where we are because the
management is well known to the team and other OpenStack projects. And
does working on the items above (speeding up our wall times, keeping
better metrics tracking, using more public cloud resources) help make
everyone happier?

Dan

> As for other projects making use of the same jobs, right now the
> only convenience I'm aware of is that they can add check-tripleo
> pipeline jobs in our Zuul layout file instead of having you add it
> to yours (which could itself reside in a Git repo under your
> control, giving you even more flexibility over those choices). In
> fact, with a third-party CI using its own separate Gerrit account,
> you would be able to leave clear -1/+1 votes on check results which
> is not possible with the present solution.
>
> So anyway, I'm not saying that I definitely believe the third-party
> CI route will be better for TripleO, but I'm
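[Editor's note] The -1/+1 voting Jeremy mentions is typically done over Gerrit's ssh interface with the `gerrit review` command. The sketch below only builds the command line; the host, port, and account name are assumptions for illustration, not the real TripleO CI configuration:

```python
# Sketch of how a third-party CI account could leave a Verified vote via
# Gerrit's ssh "gerrit review" command. Host, port, and account name are
# hypothetical; the command shape itself is standard Gerrit.
def build_review_cmd(change, patchset, verified, message,
                     host="review.openstack.org", port=29418,
                     account="tripleo-ci"):
    """Return the argv a CI reporter would execute to post a vote."""
    return [
        "ssh", "-p", str(port), "%s@%s" % (account, host),
        "gerrit", "review",
        "--verified", "%+d" % verified,
        "--message", "'%s'" % message,
        "%s,%s" % (change, patchset),
    ]

cmd = build_review_cmd(291393, 2, -1, "tripleo-ci: overcloud job failed")
print(" ".join(cmd))
# A real reporter would then run it, e.g. subprocess.check_call(cmd).
```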
Re: [openstack-dev] [tripleo] becoming third party CI
On Mon, Mar 21, 2016, at 05:33 AM, Derek Higgins wrote:
> Doing this in 3rd party ci I think simplifies things because we'll no
> longer need a public cloud and as a result the security measures
> required to avoid putting cloud credentials on the jenkins slaves
> won't be needed.

Just a word of warning: if you run arbitrary code posted to Gerrit and
post log results that are publicly available, then you probably want to
avoid putting any credentials or other secret data on the Jenkins
slaves, as that can be trivially exposed (assuming the jobs allow root
to do tripleo things like build dib images).

Clark

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
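[Editor's note] Clark's warning is worth making concrete: any job step that runs untrusted code can read the slave's environment (and, with root, any file on disk) and echo it into the public logs. A minimal sketch, with an invented variable name rather than any real TripleO setting:

```python
import os
import re

# Illustration of why secrets must not exist on slaves running arbitrary
# code: any job step can harvest them trivially. The variable name below
# is an example, not a real credential.
SECRET_PATTERN = re.compile(r"(PASSWORD|SECRET|TOKEN|KEY)", re.IGNORECASE)

def harvest_secrets(environ):
    """Return env vars an untrusted job step could trivially exfiltrate."""
    return {k: v for k, v in environ.items() if SECRET_PATTERN.search(k)}

os.environ["OS_PASSWORD"] = "example-not-a-real-credential"
leaked = harvest_secrets(os.environ)

# Anything printed here lands in the publicly served job log.
print(sorted(leaked))
```

The only robust defense is the one Clark implies: keep credentials off the slaves entirely.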
Re: [openstack-dev] [tripleo] becoming third party CI
On 03/21/2016 03:52 PM, Paul Belanger wrote:
> On Mon, Mar 21, 2016 at 10:57:53AM -0500, Ben Nemec wrote:
>> On 03/21/2016 07:33 AM, Derek Higgins wrote:
[...]
Re: [openstack-dev] [tripleo] becoming third party CI
On Mon, Mar 21, 2016 at 10:57:53AM -0500, Ben Nemec wrote:
> On 03/21/2016 07:33 AM, Derek Higgins wrote:
> > On 17 March 2016 at 16:59, Ben Nemec wrote:
[...]
Re: [openstack-dev] [tripleo] becoming third party CI
On 2016-03-21 10:57:53 -0500 (-0500), Ben Nemec wrote:
[...]
> So I'm not sure this is actually a step away from gating all the
> projects. In fact, since we can't vote today as part of the integrated
> gate, and I believe that would continue to be the case until we could
> run entirely in regular infra instead of as a separate thing, I feel
> like this is probably a requirement to be voting on other projects
> anytime in the near future.
[...]

One other point which was touched on in the thread about using Tempest
is that you're currently limited by job duration constraints in our
upstream CI. If you controlled the full CI stack you could in theory
support jobs which run as long as you need.
--
Jeremy Stanley
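[Editor's note] Jeremy's point is that the job timeout is just a number you control once you own the stack. A sketch of a wall-clock limit wrapper; the 1.5h figure echoes this thread, while the 4h self-hosted limit is purely illustrative:

```python
import subprocess

# Upstream infra enforces a fixed job limit; a self-hosted CI can pick
# its own. Both values here are illustrative, not real configuration.
UPSTREAM_LIMIT = int(1.5 * 3600)
SELF_HOSTED_LIMIT = 4 * 3600

def run_job(cmd, timeout):
    """Run one CI job command, killing it when the wall-clock limit hits."""
    try:
        subprocess.run(cmd, timeout=timeout, check=True)
        return "SUCCESS"
    except subprocess.TimeoutExpired:
        return "TIMED_OUT"
    except subprocess.CalledProcessError:
        return "FAILURE"

# A long TripleO deploy would pass under the self-hosted limit even when
# it would have been cut off at the upstream one.
print(run_job(["sleep", "0.1"], timeout=SELF_HOSTED_LIMIT))
```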
Re: [openstack-dev] [tripleo] becoming third party CI
On 03/21/2016 07:33 AM, Derek Higgins wrote:
> On 17 March 2016 at 16:59, Ben Nemec wrote:
>> On 03/10/2016 05:24 PM, Jeremy Stanley wrote:
[...]
Re: [openstack-dev] [tripleo] becoming third party CI
On 17 March 2016 at 16:59, Ben Nemec wrote:
> On 03/10/2016 05:24 PM, Jeremy Stanley wrote:
[...]
>
> FWIW, I think third-party CI probably makes sense for TripleO.
> Practically speaking we are third-party CI right now - we run our own
> independent hardware infrastructure, we aren't multi-region, and we
> can't leave a vote on changes. Since the first two aren't likely to
> change any time soon (although I believe it's still a long-term goal to
> get to a place where we can run in regular infra and just contribute our
> existing CI hardware to the general infra pool, but that's still a long
> way off), and moving to actual third-party CI would get us the ability
> to vote, I think it's worth pursuing.
> As an added bit of fun, we have a forced move of our CI hardware coming
> up in the relatively near future, and if we don't want to have multiple
> days (and possibly more, depending on how the move goes) of TripleO CI
> outage we're probably going to need to stand up a new environment in
> parallel anyway. If we're doing that it might make sense to try hooking
> it in through the third-party infra instead of the way we do it today.
> Hopefully that would allow us to work out the kinks before the old
> environment goes away.
>
> Anyway, I'm sure we'll need a bunch more discussion about this, but I
> wanted to chime in with my two cents.

We need to answer this question soon. I'm currently working on the CI
parts that we need in order to move to OVB[1] and was assuming we would
be maintaining the status quo. What we end up doing would look very
different if we move to 3rd party CI: in that case we can simply start
a vanilla centos instance and use it as an undercloud. It can then
create its own baremetal
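[Editor's note] The flow Derek sketches (a plain CentOS instance as the undercloud, which then drives its own fake-baremetal nodes in the OVB model) can be outlined as data. Everything below is hypothetical: the image, flavor, and node names are invented, and a real implementation would hand these specs to something like openstacksdk:

```python
# Hypothetical outline of a third-party OVB setup: one undercloud server
# spec plus N fake-baremetal node specs. All names are illustrative.
def undercloud_server_spec(cloud_network):
    """Spec for the vanilla CentOS instance acting as the undercloud."""
    return {
        "name": "tripleo-undercloud",
        "image": "centos-7-cloud",   # hypothetical image name
        "flavor": "m1.xlarge",       # hypothetical flavor
        "networks": [{"uuid": cloud_network}],
    }

def ovb_node_specs(count, provisioning_net):
    """Specs for the fake-baremetal VMs the undercloud will deploy to."""
    return [
        {"name": "baremetal-%d" % i,
         "networks": [{"uuid": provisioning_net}]}
        for i in range(count)
    ]

print(undercloud_server_spec("net-1234")["name"])
print([n["name"] for n in ovb_node_specs(3, "prov-5678")])
```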
Re: [openstack-dev] [tripleo] becoming third party CI
On Thu, Mar 17, 2016 at 11:59:22AM -0500, Ben Nemec wrote:
> On 03/10/2016 05:24 PM, Jeremy Stanley wrote:
[...]
> Anyway, I'm sure we'll need a bunch more discussion about this, but I
> wanted to chime in with my two cents.

Do you have any ETA on when your outage would be? Is it before or after
the summit in Austin?

Personally, I'm going to attend a few TripleO design sessions wherever
possible in Austin. It would be great to maybe have a fishbowl session
about it.

> -Ben
Re: [openstack-dev] [tripleo] becoming third party CI
On 03/17/2016 01:13 PM, Paul Belanger wrote: > On Thu, Mar 17, 2016 at 11:59:22AM -0500, Ben Nemec wrote: >> On 03/10/2016 05:24 PM, Jeremy Stanley wrote: >>> On 2016-03-10 16:09:44 -0500 (-0500), Dan Prince wrote: This seems to be the week people want to pile it on TripleO. Talking about upstream is great but I suppose I'd rather debate major changes after we branch Mitaka. :/ >>> [...] >>> >>> I didn't mean to pile on TripleO, nor did I intend to imply this was >>> something which should happen ASAP (or even necessarily at all), but >>> I do want to better understand what actual benefit is currently >>> derived from this implementation vs. a more typical third-party CI >>> (which lots of projects are doing when they find their testing needs >>> are not met by the constraints of our generic test infrastructure). >>> With regards to Jenkins restarts I think it is understood that our job times are long. How often do you find infra needs to restart Jenkins? >>> >>> We're restarting all 8 of our production Jenkins masters weekly at a >>> minimum, but generally more often when things are busy (2-3 times a >>> week). For many months we've been struggling with a thread leak for >>> which their development team has not seen as a priority to even >>> triage our bug report effectively. At this point I think we've >>> mostly given up on expecting it to be solved by anything other than >>> our upcoming migration off of Jenkins, but that's another topic >>> altogether. >>> And regardless of that what if we just said we didn't mind the destructiveness of losing a few jobs now and then (until our job times are under the line... say 1.5 hours or so). To be clear I'd be fine with infra pulling the rug on running jobs if this is the root cause of the long running jobs in TripleO. >>> >>> For manual Jenkins restarts this is probably doable (if additional >>> hassle), but I don't know whether that's something we can easily >>> shoehorn into our orchestrated/automated restarts. 
>>> I think the "benefits are minimal" is bit of an overstatement. The initial vision for TripleO CI stands and I would still like to see individual projects entertain the option to use us in their gates. >>> [...] >>> >>> This is what I'd like to delve deeper into. The current >>> implementation isn't providing you with any mechanism to prevent >>> changes which fail jobs running in the tripleo-test cloud from >>> merging to your repos, is it? You're still having to manually >>> inspect the job results posted by it? How is that particularly >>> different from relying on third-party CI integration? >>> >>> As for other projects making use of the same jobs, right now the >>> only convenience I'm aware of is that they can add check-tripleo >>> pipeline jobs in our Zuul layout file instead of having you add it >>> to yours (which could itself reside in a Git repo under your >>> control, giving you even more flexibility over those choices). In >>> fact, with a third-party CI using its own separate Gerrit account, >>> you would be able to leave clear -1/+1 votes on check results which >>> is not possible with the present solution. >>> >>> So anyway, I'm not saying that I definitely believe the third-party >>> CI route will be better for TripleO, but I'm not (yet) clear on what >>> tangible benefit you're receiving now that you lose by switching to >>> that model. >>> >> >> FWIW, I think third-party CI probably makes sense for TripleO. >> Practically speaking we are third-party CI right now - we run our own >> independent hardware infrastructure, we aren't multi-region, and we >> can't leave a vote on changes. 
Since the first two aren't likely to >> change any time soon (although I believe it's still a long-term goal to >> get to a place where we can run in regular infra and just contribute our >> existing CI hardware to the general infra pool, but that's still a long >> way off), and moving to actual third-party CI would get us the ability >> to vote, I think it's worth pursuing. >> >> As an added bit of fun, we have a forced move of our CI hardware coming >> up in the relatively near future, and if we don't want to have multiple >> days (and possibly more, depending on how the move goes) of TripleO CI >> outage we're probably going to need to stand up a new environment in >> parallel anyway. If we're doing that it might make sense to try hooking >> it in through the third-party infra instead of the way we do it today. >> Hopefully that would allow us to work out the kinks before the old >> environment goes away. >> >> Anyway, I'm sure we'll need a bunch more discussion about this, but I >> wanted to chime in with my two cents. >> > Do you have any ETA on when your outage would be? Is it before or after the > summit in Austin? > > Personally, I'm going to attend a few TripleO design sessions wherever > possible in Austin. It would be great to maybe have a fishbowl session about > it.
Re: [openstack-dev] [tripleo] becoming third party CI
On 03/10/2016 05:24 PM, Jeremy Stanley wrote: > On 2016-03-10 16:09:44 -0500 (-0500), Dan Prince wrote: >> This seems to be the week people want to pile it on TripleO. Talking >> about upstream is great but I suppose I'd rather debate major changes >> after we branch Mitaka. :/ > [...] > > I didn't mean to pile on TripleO, nor did I intend to imply this was > something which should happen ASAP (or even necessarily at all), but > I do want to better understand what actual benefit is currently > derived from this implementation vs. a more typical third-party CI > (which lots of projects are doing when they find their testing needs > are not met by the constraints of our generic test infrastructure). > >> With regards to Jenkins restarts I think it is understood that our job >> times are long. How often do you find infra needs to restart Jenkins? > > We're restarting all 8 of our production Jenkins masters weekly at a > minimum, but generally more often when things are busy (2-3 times a > week). For many months we've been struggling with a thread leak for > which their development team has not seen as a priority to even > triage our bug report effectively. At this point I think we've > mostly given up on expecting it to be solved by anything other than > our upcoming migration off of Jenkins, but that's another topic > altogether. > >> And regardless of that what if we just said we didn't mind the >> destructiveness of losing a few jobs now and then (until our job >> times are under the line... say 1.5 hours or so). To be clear I'd >> be fine with infra pulling the rug on running jobs if this is the >> root cause of the long running jobs in TripleO. > > For manual Jenkins restarts this is probably doable (if additional > hassle), but I don't know whether that's something we can easily > shoehorn into our orchestrated/automated restarts. > >> I think the "benefits are minimal" is bit of an overstatement. 
The >> initial vision for TripleO CI stands and I would still like to see >> individual projects entertain the option to use us in their gates. > [...] > > This is what I'd like to delve deeper into. The current > implementation isn't providing you with any mechanism to prevent > changes which fail jobs running in the tripleo-test cloud from > merging to your repos, is it? You're still having to manually > inspect the job results posted by it? How is that particularly > different from relying on third-party CI integration? > > As for other projects making use of the same jobs, right now the > only convenience I'm aware of is that they can add check-tripleo > pipeline jobs in our Zuul layout file instead of having you add it > to yours (which could itself reside in a Git repo under your > control, giving you even more flexibility over those choices). In > fact, with a third-party CI using its own separate Gerrit account, > you would be able to leave clear -1/+1 votes on check results which > is not possible with the present solution. > > So anyway, I'm not saying that I definitely believe the third-party > CI route will be better for TripleO, but I'm not (yet) clear on what > tangible benefit you're receiving now that you lose by switching to > that model. > FWIW, I think third-party CI probably makes sense for TripleO. Practically speaking we are third-party CI right now - we run our own independent hardware infrastructure, we aren't multi-region, and we can't leave a vote on changes. Since the first two aren't likely to change any time soon (although I believe it's still a long-term goal to get to a place where we can run in regular infra and just contribute our existing CI hardware to the general infra pool, but that's still a long way off), and moving to actual third-party CI would get us the ability to vote, I think it's worth pursuing. 
As an added bit of fun, we have a forced move of our CI hardware coming up in the relatively near future, and if we don't want to have multiple days (and possibly more, depending on how the move goes) of TripleO CI outage we're probably going to need to stand up a new environment in parallel anyway. If we're doing that it might make sense to try hooking it in through the third-party infra instead of the way we do it today. Hopefully that would allow us to work out the kinks before the old environment goes away. Anyway, I'm sure we'll need a bunch more discussion about this, but I wanted to chime in with my two cents. -Ben __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [tripleo] becoming third party CI (was: enabling third party CI)
On 2016-03-10 16:09:44 -0500 (-0500), Dan Prince wrote: > This seems to be the week people want to pile it on TripleO. Talking > about upstream is great but I suppose I'd rather debate major changes > after we branch Mitaka. :/ [...] I didn't mean to pile on TripleO, nor did I intend to imply this was something which should happen ASAP (or even necessarily at all), but I do want to better understand what actual benefit is currently derived from this implementation vs. a more typical third-party CI (which lots of projects are doing when they find their testing needs are not met by the constraints of our generic test infrastructure). > With regards to Jenkins restarts I think it is understood that our job > times are long. How often do you find infra needs to restart Jenkins? We're restarting all 8 of our production Jenkins masters weekly at a minimum, but generally more often when things are busy (2-3 times a week). For many months we've been struggling with a thread leak for which their development team has not seen as a priority to even triage our bug report effectively. At this point I think we've mostly given up on expecting it to be solved by anything other than our upcoming migration off of Jenkins, but that's another topic altogether. > And regardless of that what if we just said we didn't mind the > destructiveness of losing a few jobs now and then (until our job > times are under the line... say 1.5 hours or so). To be clear I'd > be fine with infra pulling the rug on running jobs if this is the > root cause of the long running jobs in TripleO. For manual Jenkins restarts this is probably doable (if additional hassle), but I don't know whether that's something we can easily shoehorn into our orchestrated/automated restarts. > I think the "benefits are minimal" is bit of an overstatement. The > initial vision for TripleO CI stands and I would still like to see > individual projects entertain the option to use us in their gates. [...] 
This is what I'd like to delve deeper into. The current implementation isn't providing you with any mechanism to prevent changes which fail jobs running in the tripleo-test cloud from merging to your repos, is it? You're still having to manually inspect the job results posted by it? How is that particularly different from relying on third-party CI integration? As for other projects making use of the same jobs, right now the only convenience I'm aware of is that they can add check-tripleo pipeline jobs in our Zuul layout file instead of having you add it to yours (which could itself reside in a Git repo under your control, giving you even more flexibility over those choices). In fact, with a third-party CI using its own separate Gerrit account, you would be able to leave clear -1/+1 votes on check results which is not possible with the present solution. So anyway, I'm not saying that I definitely believe the third-party CI route will be better for TripleO, but I'm not (yet) clear on what tangible benefit you're receiving now that you lose by switching to that model. -- Jeremy Stanley
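[Editor's note: to illustrate the voting mechanism described above, here is a minimal sketch of how a third-party CI system, using its own Gerrit service account, could build the review body for a -1/+1 Verified vote. The endpoint shape follows Gerrit's standard "set review" REST API; the change ID, job name, and message text are hypothetical.]

```python
import json

# A third-party CI account posts its vote to Gerrit's "set review" endpoint:
#   POST /a/changes/{change-id}/revisions/{revision-id}/review
# The JSON body carries a message plus label votes such as Verified -1/+1.
def build_review_payload(verified_vote, message):
    """Build the JSON body for a -1/+1 Verified vote on a check result."""
    if verified_vote not in (-1, 1):
        raise ValueError("check results are voted -1 (fail) or +1 (pass)")
    return {
        "message": message,
        "labels": {"Verified": verified_vote},
    }

# Hypothetical result from a TripleO overcloud job:
payload = build_review_payload(1, "tripleo-ci: overcloud job succeeded")
print(json.dumps(payload, sort_keys=True))
```

An actual deployment would POST this with the CI account's HTTP credentials; the point is simply that a separate account makes the -1/+1 visible on the change itself, which the current check-tripleo pipeline arrangement cannot do.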
Re: [openstack-dev] [tripleo] becoming third party CI (was: enabling third party CI)
On Thu, 2016-03-10 at 13:45 -0500, Paul Belanger wrote: > On Thu, Mar 10, 2016 at 05:32:15PM +, Jeremy Stanley wrote: > > > > On 2016-03-10 09:50:03 -0500 (-0500), Emilien Macchi wrote: > > [...] > > > > > > OpenStack Infra provides an easy way to plug CI systems and some > > > CIs (Nova, Neutron, Cinder, etc) already gate on some third party > > > systems. I was wondering if we would not be interested to > > > investigate this area and maybe ask to our third party drivers > > > contributors (Bigswitch, Nuage, Midonet, Cisco, Netapp, etc) to > > > run on their own hardware TripleO CI jobs running their specific > > > environment to enable the drivers. This CI would be plugged to > > > TripleO CI, and would provide awesome feedback. > > [...] > > > > It's also worth broadening the discussion to reassess whether the > > existing TripleO CI should itself follow our third-party > > integration > > model instead of the current implementation relying on our main > > community Zuul/Nodepool/Jenkins servers. When this was first > > implemented, there was a promise of adding more regions for > > robustness and of being able to use the surplus resources > > maintained > > in the TripleO CI clouds to augment our generic CI workload. It's > > been years now and these things have not really come to pass; if > > anything, that system and its operators are still struggling to > > keep > > a single region up and operational and providing enough resources > > to > > handle the current TripleO test load. 
> > > > The majority of unplanned whole-provider outages we've experienced > > in Nodepool have been from the TripleO cloud going completely > > offline (sometimes for a week or more straight), by far the > > longest-running jobs we have are running there (which substantially > > hampers our ability to do things like gracefully restart our > > Jenkins > > masters without aborting running jobs), and ultimately the benefits > > to TripleO for this model are very minimal anyway (different > > pipelines means the jobs aren't effectively even voting, much less > > gating). > > > > I'm not trying to slam the TripleO cloud operators, I think they're > > doing an amazing job given the limitations they're working under > > and > > much of their work has provided inspiration for our Infra-Cloud > > project too. They're helpful and responsive and a joy to > > collaborate > > with, but ultimately I think TripleO might actually realize more > > benefit from adding a Zuul/Nodepool/Jenkins of their own to this > > (we've massively streamlined the Puppet for maintaining these > > recently and have very thorough deployment and operational > > documentation) rather than dealing with the issues which arise from > > being half-integrated into one they don't control. > > > > I've been meaning to bring that up for discussion for a while, just > > keep forgetting, but this thread seems like a good segue into the > > topic. > I tend to agree here, I think a lot of great work has been done to > allow new 3rd > party CI system to come online. Especially considering we have the > puppet-openstackci[1] module. > > However, I would also like to see tripleO move more inline with our > existing CI > tooling, if possible. I know that wouldn't happen over night, but > would at > least give better insight into how the CI is working. TripleO uses more of OpenStack tooling than just about any project I know of. 
We do have some unique requirements related to the fact that our CI actually PXE boots instances in a cloud. Something like this: http://blog.nemebean.com/tags/quintupleo We have plans on the table to potentially split our Heat stack (or make it more configurable) such that we can test the configuration side on normal cloud instances. I'm all for the effort in this and it would get us closer to what I think you are talking about as "normal". Like it or not, our CI does catch things that nobody else is catching. Quirky deployment things happen, and until someone gets nested virt working (well) on commodity cloud servers I think we have a case for our own CI cloud. > > Now that we have the infracloud too, it might be worth talking about doing the > same thing with TripleO hardware, again if possible. There are likely > corner cases > where it wouldn't work, but it would be interesting to talk about it. The corner case would be whether it allows us to PXE boot an instance (thus allowing provisioning via Ironic, etc.). We could certainly entertain the option of creating our own OVB cloud and managing it alongside infracloud if that is what you are asking. I don't think it is the best fit for TripleO today given our unique requirements at the moment. Dan > > [1] https://github.com/openstack-infra/puppet-openstackci
Re: [openstack-dev] [tripleo] becoming third party CI (was: enabling third party CI)
This seems to be the week people want to pile it on TripleO. Talking about upstream is great but I suppose I'd rather debate major changes after we branch Mitaka. :/ Anyway, might as well get into it now. Replies inline. On Thu, 2016-03-10 at 17:32 +, Jeremy Stanley wrote: > On 2016-03-10 09:50:03 -0500 (-0500), Emilien Macchi wrote: > [...] > > > > OpenStack Infra provides an easy way to plug CI systems and some > > CIs (Nova, Neutron, Cinder, etc) already gate on some third party > > systems. I was wondering if we would not be interested to > > investigate this area and maybe ask to our third party drivers > > contributors (Bigswitch, Nuage, Midonet, Cisco, Netapp, etc) to > > run on their own hardware TripleO CI jobs running their specific > > environment to enable the drivers. This CI would be plugged to > > TripleO CI, and would provide awesome feedback. > [...] > > It's also worth broadening the discussion to reassess whether the > existing TripleO CI should itself follow our third-party integration > model instead of the current implementation relying on our main > community Zuul/Nodepool/Jenkins servers. When this was first > implemented, there was a promise of adding more regions for > robustness and of being able to use the surplus resources maintained > in the TripleO CI clouds to augment our generic CI workload. It's > been years now and these things have not really come to pass; if > anything, that system and its operators are still struggling to keep > a single region up and operational and providing enough resources to > handle the current TripleO test load. Yeah. We actually lost a region of hardware this last year too. I think there is a distinction between our cloud being up and trunk being broken. Now we've had some troubles with both over the last couple years, but in general I think our CI cloud (which provides instances) has been up 98%, maybe even 99%, of the time.
To be honest I've not been tracking our actual uptime for bragging rights but I think the actual cloud (which is connected to nodepool) has a good uptime. We have been dealing with a lot of trunk breakages however. This is something that occurs because we are not a gate... and it is related to the fact that we have limited resources, and a long job wall time. So taking a step away from the common infrastructure pipelines which do act as an upstream gate would likely only make this worse for us. To be fair the last outage you refer to occurred over the course of days because we made a config change only to discover the breakage days later (because nodepool caches the keystone endpoints). We are learning and we do timebox our systems administration a bit more than most pure administrators but I think the general uptime of our cloud has been good. > > The majority of unplanned whole-provider outages we've experienced > in Nodepool have been from the TripleO cloud going completely > offline (sometimes for a week or more straight), by far the > longest-running jobs we have are running there (which substantially > hampers our ability to do things like gracefully restart our Jenkins > masters without aborting running jobs), and ultimately the benefits > to TripleO for this model are very minimal anyway (different > pipelines means the jobs aren't effectively even voting, much less > gating). With regards to Jenkins restarts I think it is understood that our job times are long. How often do you find infra needs to restart Jenkins? And regardless of that what if we just said we didn't mind the destructiveness of losing a few jobs now and then (until our job times are under the line... say 1.5 hours or so). To be clear I'd be fine with infra pulling the rug on running jobs if this is the root cause of the long running jobs in TripleO. I think the "benefits are minimal" is a bit of an overstatement.
The initial vision for TripleO CI stands and I would still like to see individual projects entertain the option to use us in their gates. Perhaps the strongest community influences are within Heat, Ironic, and Puppet. The ability to manage the interaction with Heat, Ironic, and Puppet in the common infrastructure is a clear benefit and there are members of these communities that I think would agree. > > I'm not trying to slam the TripleO cloud operators, I think they're > doing an amazing job given the limitations they're working under and > much of their work has provided inspiration for our Infra-Cloud > project too. They're helpful and responsive and a joy to collaborate > with, but ultimately I think TripleO might actually realize more > benefit from adding a Zuul/Nodepool/Jenkins of their own to this > (we've massively streamlined the Puppet for maintaining these > recently and have very thorough deployment and operational > documentation) rather than dealing with the issues which arise from > being half-integrated into one they don't control. We've actually moved most of our daily management tasks for TripleO into
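[Editor's note: the restart tradeoff discussed above can be put in rough numbers. This is only a back-of-envelope sketch with assumed figures (restarts landing at uniformly random times, jobs running back-to-back, a ~4h current job length); it just shows why shorter wall times make losing jobs to a Jenkins restart tolerable.]

```python
HOURS_PER_WEEK = 7 * 24

def fraction_lost(job_hours, restarts_per_week):
    """Expected fraction of jobs aborted per week, if each restart kills
    any job whose run window contains the restart instant."""
    return min(1.0, restarts_per_week * job_hours / HOURS_PER_WEEK)

# Assumed ~4h TripleO jobs vs the ~1.5h target, at the 2-3 weekly
# restarts mentioned upthread (using 3):
print(round(fraction_lost(4.0, 3), 3))   # ~7% of jobs lost
print(round(fraction_lost(1.5, 3), 3))   # ~2.7% of jobs lost
```

Under those assumptions, getting job wall time under the 1.5-hour line cuts the expected loss by nearly two thirds, which is roughly the intuition behind "I'd be fine with infra pulling the rug on running jobs."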
Re: [openstack-dev] [tripleo] becoming third party CI (was: enabling third party CI)
On Thu, Mar 10, 2016 at 05:32:15PM +, Jeremy Stanley wrote: > On 2016-03-10 09:50:03 -0500 (-0500), Emilien Macchi wrote: > [...] > > OpenStack Infra provides an easy way to plug CI systems and some > > CIs (Nova, Neutron, Cinder, etc) already gate on some third party > > systems. I was wondering if we would not be interested to > > investigate this area and maybe ask to our third party drivers > > contributors (Bigswitch, Nuage, Midonet, Cisco, Netapp, etc) to > > run on their own hardware TripleO CI jobs running their specific > > environment to enable the drivers. This CI would be plugged to > > TripleO CI, and would provide awesome feedback. > [...] > > It's also worth broadening the discussion to reassess whether the > existing TripleO CI should itself follow our third-party integration > model instead of the current implementation relying on our main > community Zuul/Nodepool/Jenkins servers. When this was first > implemented, there was a promise of adding more regions for > robustness and of being able to use the surplus resources maintained > in the TripleO CI clouds to augment our generic CI workload. It's > been years now and these things have not really come to pass; if > anything, that system and its operators are still struggling to keep > a single region up and operational and providing enough resources to > handle the current TripleO test load. > > The majority of unplanned whole-provider outages we've experienced > in Nodepool have been from the TripleO cloud going completely > offline (sometimes for a week or more straight), by far the > longest-running jobs we have are running there (which substantially > hampers our ability to do things like gracefully restart our Jenkins > masters without aborting running jobs), and ultimately the benefits > to TripleO for this model are very minimal anyway (different > pipelines means the jobs aren't effectively even voting, much less > gating). 
> > I'm not trying to slam the TripleO cloud operators, I think they're > doing an amazing job given the limitations they're working under and > much of their work has provided inspiration for our Infra-Cloud > project too. They're helpful and responsive and a joy to collaborate > with, but ultimately I think TripleO might actually realize more > benefit from adding a Zuul/Nodepool/Jenkins of their own to this > (we've massively streamlined the Puppet for maintaining these > recently and have very thorough deployment and operational > documentation) rather than dealing with the issues which arise from > being half-integrated into one they don't control. > > I've been meaning to bring that up for discussion for a while, just > keep forgetting, but this thread seems like a good segue into the > topic. I tend to agree here; I think a lot of great work has been done to allow new 3rd party CI systems to come online, especially considering we have the puppet-openstackci[1] module. However, I would also like to see TripleO move more in line with our existing CI tooling, if possible. I know that wouldn't happen overnight, but it would at least give better insight into how the CI is working. Now that we have the infracloud too, it might be worth talking about doing the same thing with TripleO hardware, again if possible. There are likely corner cases where it wouldn't work, but it would be interesting to talk about it. [1] https://github.com/openstack-infra/puppet-openstackci
[openstack-dev] [tripleo] becoming third party CI (was: enabling third party CI)
On 2016-03-10 09:50:03 -0500 (-0500), Emilien Macchi wrote: [...] > OpenStack Infra provides an easy way to plug CI systems and some > CIs (Nova, Neutron, Cinder, etc) already gate on some third party > systems. I was wondering if we would not be interested to > investigate this area and maybe ask to our third party drivers > contributors (Bigswitch, Nuage, Midonet, Cisco, Netapp, etc) to > run on their own hardware TripleO CI jobs running their specific > environment to enable the drivers. This CI would be plugged to > TripleO CI, and would provide awesome feedback. [...] It's also worth broadening the discussion to reassess whether the existing TripleO CI should itself follow our third-party integration model instead of the current implementation relying on our main community Zuul/Nodepool/Jenkins servers. When this was first implemented, there was a promise of adding more regions for robustness and of being able to use the surplus resources maintained in the TripleO CI clouds to augment our generic CI workload. It's been years now and these things have not really come to pass; if anything, that system and its operators are still struggling to keep a single region up and operational and providing enough resources to handle the current TripleO test load. The majority of unplanned whole-provider outages we've experienced in Nodepool have been from the TripleO cloud going completely offline (sometimes for a week or more straight), by far the longest-running jobs we have are running there (which substantially hampers our ability to do things like gracefully restart our Jenkins masters without aborting running jobs), and ultimately the benefits to TripleO for this model are very minimal anyway (different pipelines means the jobs aren't effectively even voting, much less gating). 
I'm not trying to slam the TripleO cloud operators, I think they're doing an amazing job given the limitations they're working under and much of their work has provided inspiration for our Infra-Cloud project too. They're helpful and responsive and a joy to collaborate with, but ultimately I think TripleO might actually realize more benefit from adding a Zuul/Nodepool/Jenkins of their own to this (we've massively streamlined the Puppet for maintaining these recently and have very thorough deployment and operational documentation) rather than dealing with the issues which arise from being half-integrated into one they don't control. I've been meaning to bring that up for discussion for a while, just keep forgetting, but this thread seems like a good segue into the topic. -- Jeremy Stanley