Hey,

Sounds good to me :-).
A more immediate way to reduce the load on zuul could be to run a duplicate zuul with only the check pipeline; that is, run one zuul per pipeline. In fact, we could essentially distribute independent pipelines (but I realise that part would require a bit of refactoring).
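To make the per-pipeline idea concrete, here is a rough sketch of splitting one layout into separate layouts, one per pipeline, so each could be loaded by its own zuul instance. It assumes a simplified layout shape with `pipelines` and `projects` sections where each project lists its jobs under the pipeline name; `split_layout` is a hypothetical helper, not part of Zuul, and it ignores the refactoring mentioned above.

```python
"""Hypothetical sketch: split one layout into per-pipeline layouts.

The dict shape loosely mirrors a layout with 'pipelines' and 'projects'
sections; the exact keys are assumptions, not Zuul's real schema.
"""


def split_layout(layout):
    """Return {pipeline_name: layout containing only that pipeline}."""
    per_pipeline = {}
    for pipeline in layout["pipelines"]:
        name = pipeline["name"]
        projects = []
        for project in layout["projects"]:
            # Keep only this pipeline's job list for each project that uses it.
            if name in project:
                projects.append({"name": project["name"], name: project[name]})
        per_pipeline[name] = {"pipelines": [pipeline], "projects": projects}
    return per_pipeline


layout = {
    "pipelines": [{"name": "check"}, {"name": "gate"}],
    "projects": [
        {"name": "openstack/nova", "check": ["pep8", "py27"], "gate": ["pep8", "py27"]},
        {"name": "openstack-infra/zuul", "check": ["pep8"]},
    ],
}

parts = split_layout(layout)
for name, sub in parts.items():
    print(name, [p["name"] for p in sub["projects"]])
# check ['openstack/nova', 'openstack-infra/zuul']
# gate ['openstack/nova']
```

Each resulting layout only sees its own pipeline's events and jobs, which is what lets the instances run independently; anything shared across pipelines would still need the refactoring noted above.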
Cheers,
Josh
Rackspace Australia

On 1/9/14 2:59 PM, James E. Blair wrote:
Hi,

When Zuul gets very busy, it can end up launching hundreds of jobs nearly simultaneously. Each of them has to perform several git fetch operations to obtain the changes needed for testing. They fetch from the git repos on the Zuul server because Zuul itself creates those commits by locally merging several changes together according to what's in the queue. Fetching and merging git patchsets (which is single-threaded) adds some load to the server, but serving those git refs to 400 Jenkins nodes nearly simultaneously is a particular burden. It was too much for our previous server; we've moved Zuul to a faster server now, but it would be nice to have a more scalable solution for the future.

I'd like to move the Zuul git merging component into a separate process that can be located on a separate host (or hosts) and scaled out. The current zuul-server would continue to manage the queue and launch jobs, but as it processes the queue and decides which changes should be composed and built into zuul git refs, it would package the info about each ref and put it on the gearman queue as a work item. An instance of the new component (zuul-merger) would pick up that work item, fetch the needed refs from Gerrit, and merge them. It would also serve the resulting git repo in the same way that Zuul does now.

Zuul would not have to wait for a response before continuing to process the queue, and since it would no longer be doing the merge work itself, it will be able to move through the queue _much_ faster than it does currently. Once Zuul _does_ receive a completion response from a zuul-merger, it can then launch the jobs for that change. It will pass the URL for that particular zuul-merger (as ZUUL_URL) to the jobs so that they know which merger to fetch the zuul ref from. We can also use the cancel job functionality in gearman if Zuul decides to reorder the queue.
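The proposed flow (enqueue a merge work item, keep processing, launch jobs with ZUUL_URL on completion, cancel on reorder) can be sketched roughly as below. This is a single-process illustration only: a stdlib `queue.Queue` stands in for the gearman queue, and `MergeRequest`, `MergeResult`, `Merger`, `Scheduler` and `run_merger` are hypothetical names, not Zuul or gearman APIs. The merger URL is a placeholder.

```python
"""Hypothetical sketch of the zuul-server / zuul-merger split.

queue.Queue stands in for the gearman work queue; none of these names
are real Zuul or gearman APIs.
"""
import queue
from dataclasses import dataclass


@dataclass
class MergeRequest:
    ref_id: str       # hypothetical id for the zuul ref to build
    project: str
    changes: list     # Gerrit refs to merge, in queue order
    cancelled: bool = False


@dataclass
class MergeResult:
    ref_id: str
    zuul_url: str     # URL of the merger that now serves the ref


class Merger:
    """Stand-in for one zuul-merger host."""

    def __init__(self, url):
        self.url = url
        self.refs = {}  # ref_id -> merged changes this merger serves

    def handle(self, req):
        # A real merger would fetch req.changes from Gerrit and merge them
        # into a local repo; here we only record what would be served.
        self.refs[req.ref_id] = list(req.changes)
        return MergeResult(req.ref_id, self.url)


class Scheduler:
    """Stand-in for zuul-server: enqueues work and never blocks on merges."""

    def __init__(self, work_queue):
        self.work_queue = work_queue
        self.pending = {}
        self.launched = []  # (ref_id, ZUUL_URL) pairs handed to jobs

    def submit(self, req):
        self.pending[req.ref_id] = req
        self.work_queue.put(req)  # returns immediately

    def cancel(self, ref_id):
        # Queue was reordered: mark the work item so no merger acts on it
        # (loosely analogous to cancelling a gearman job).
        self.pending.pop(ref_id).cancelled = True

    def on_complete(self, result):
        # Ignore completions for work that is no longer wanted.
        if result.ref_id in self.pending:
            del self.pending[result.ref_id]
            self.launched.append((result.ref_id, result.zuul_url))


def run_merger(merger, work_queue, scheduler):
    # Drain the queue, skipping items cancelled before they were picked up.
    while not work_queue.empty():
        req = work_queue.get()
        if not req.cancelled:
            scheduler.on_complete(merger.handle(req))


work = queue.Queue()
sched = Scheduler(work)
merger = Merger("http://merger01.example.org/p")
sched.submit(MergeRequest("Z1", "openstack/nova", ["refs/changes/01/1/1"]))
sched.submit(MergeRequest("Z2", "openstack/nova",
                          ["refs/changes/01/1/1", "refs/changes/02/2/1"]))
sched.cancel("Z2")
run_merger(merger, work, sched)
print(sched.launched)  # [('Z1', 'http://merger01.example.org/p')]
```

The point of the split shows up in `submit`: the scheduler only enqueues and moves on, and jobs are launched later from `on_complete` with the URL of whichever merger did the work.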
We can scale out the mergers horizontally and they can operate in parallel, which should also improve the responsiveness of overall queue processing. The only downside I currently foresee is that if we scale out the mergers too much, we will see a performance impact on Gerrit; therefore we should anticipate having a reasonably small number of these (2-8, perhaps).

Since this is already quite modular, I think the implementation should be relatively simple. How does that sound?

-Jim

_______________________________________________
OpenStack-Infra mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
