+100 (apologies for top-post - laptop was stolen, limping along on Windows for a few days. Ew.)
> -----Original Message-----
> From: Robert Collins [mailto:[email protected]]
> Sent: Wednesday, January 08, 2014 11:52 PM
> To: James E. Blair
> Cc: <[email protected]>
> Subject: Re: [OpenStack-Infra] An idea to scale Zuul
>
> Looks good to me.
>
> On 9 January 2014 19:59, James E. Blair <[email protected]> wrote:
> > Hi,
> >
> > When Zuul gets very busy, it can end up launching hundreds of jobs
> > nearly simultaneously. Each of them has to perform several git fetch
> > operations to obtain the changes needed for testing. They fetch from
> > the git repos on the Zuul server because Zuul itself is creating those
> > commits by locally merging several changes together according to
> > what's in the queue.
> >
> > The acts of fetching and merging git patchsets (which is single
> > threaded) adds some load to the server, but in particular, serving
> > those git refs to 400 Jenkins nodes nearly simultaneously can also be
> > a bit of a burden. It was too much for our previous server; we've
> > moved Zuul to a faster server now, but it would be nice to have a more
> > scalable solution for the future.
> >
> > I'd like to move the Zuul git merging component into a separate
> > process that can be located on a separate host (or hosts) and scaled out.
> >
> > The current zuul-server would continue to manage the queue and launch
> > jobs, but as it processes the queue and decides which changes should
> > be composed and built into zuul git refs, it would package the info
> > about each ref and put it on the gearman queue as a work item. An
> > instance of the new component (zuul-merger) would fetch that job and
> > fetch the needed refs from Gerrit, and merge them. It would also
> > serve the resulting git repo in the same way that Zuul does now.
> >
> > Zuul would not have to wait for a response before continuing to
> > process the queue, and since it's not doing any actual work, will be
> > able to move through the queue _much_ faster than currently. Once
> > Zuul _does_ receive a completion response from a zuul-merger, it can
> > then launch the jobs for that change. It will pass the URL for that
> > particular zuul-merger (as ZUUL_URL) to the jobs so that they know
> > from which merger to fetch the zuul ref. We can also use the cancel
> > job functionality in gearman if Zuul decides to reorder the queue.
> >
> > We can scale out the mergers horizontally and they can operate in
> > parallel, which should also improve the responsiveness of overall
> > queue processing.
> >
> > The only downside I currently foresee is that if we scale out the
> > mergers too much, we will see a performance impact on gerrit;
> > therefore we should anticipate having a reasonably small number of
> > these (2-8, perhaps).
> >
> > Since this is already quite modular, I think the implementation should
> > be relatively simple.
> >
> > How does that sound?
> >
> > -Jim
> >
> > _______________________________________________
> > OpenStack-Infra mailing list
> > [email protected]
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
> >
>
> --
> Robert Collins <[email protected]>
> Distinguished Technologist
> HP Converged Cloud
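
For anyone skimming the thread, here is a rough sketch of the flow Jim describes: the scheduler queues merge work without waiting, mergers pick it up in parallel, and jobs launch only once a completion comes back carrying that merger's ZUUL_URL. This is an illustration only, not Zuul's code. It assumes Python's standard queue.Queue as a stand-in for the gearman queue and does no real git or Gerrit work; MergeRequest, MergeResult, merger_worker, run_scheduler, the "ref" key, the ref names and the merger URLs are all made-up names for the example.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class MergeRequest:
    """Hypothetical work item the scheduler queues (stand-in for a gearman job)."""
    ref: str        # name of the zuul ref to build (illustrative)
    changes: tuple  # change numbers to merge, in queue order


@dataclass(frozen=True)
class MergeResult:
    """Hypothetical completion response sent back by a merger."""
    ref: str
    zuul_url: str   # URL of the merger that built and serves the ref


def merger_worker(url, work, done):
    """Hypothetical zuul-merger loop: take a work item, 'merge', report back."""
    while True:
        req = work.get()
        if req is None:  # sentinel value: shut this worker down
            break
        # A real merger would fetch the changes from Gerrit and merge them here.
        done.put(MergeResult(ref=req.ref, zuul_url=url))


def run_scheduler(requests, merger_urls):
    """Queue every merge without blocking, then launch jobs as results arrive."""
    work, done = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=merger_worker, args=(url, work, done))
        for url in merger_urls
    ]
    for worker in workers:
        worker.start()

    # The scheduler moves through its queue without waiting on any merge.
    for req in requests:
        work.put(req)

    launched = {}
    for _ in requests:
        result = done.get()
        # Jobs learn which merger to fetch the ref from via ZUUL_URL.
        launched[result.ref] = {"ZUUL_URL": result.zuul_url, "ref": result.ref}

    # Stop the workers: one sentinel per worker thread.
    for _ in workers:
        work.put(None)
    for worker in workers:
        worker.join()
    return launched


if __name__ == "__main__":
    demo = [
        MergeRequest("refs/zuul/example-1", (101,)),
        MergeRequest("refs/zuul/example-2", (101, 102)),
    ]
    urls = ["git://merger1.example.org/p", "git://merger2.example.org/p"]
    for ref, params in sorted(run_scheduler(demo, urls).items()):
        print(ref, params["ZUUL_URL"])
```

Which merger ends up serving a given ref is not deterministic here, which is why each completion has to carry its own URL. The cancel-on-reorder path and the load the mergers put on Gerrit are left out of the sketch.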
