[users@bb.net] More anecdotes from the multi-master trenches.

Neil Gilmore Thu, 01 Dec 2016 08:52:30 -0800

Hi everyone,

More anecdotes. Scheduler related.

As you may recall, I attempted to fix our problem with collapsing behavior by appending the master name to each scheduler (except the force schedulers). The idea was to force the schedulers to be master specific so that the builder and scheduler would be on the same master, allowing the default (mostly, we changed the behavior to not regard revision as significant) collapsing behavior to work. This appears to have at least mostly worked.
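For anyone following along, the renaming amounts to something like the sketch below. It is only an illustration: the helper name, the "force-" prefix used to spot force schedulers, and the scheduler and master names are all made up for the example and are not taken from our actual master.cfg.

```python
# Hypothetical helper; every name here is illustrative only.
FORCE_PREFIX = "force-"  # assumption: how force schedulers are told apart


def master_specific(scheduler_name, master_name):
    """Suffix scheduler_name with master_name, leaving force schedulers alone."""
    if scheduler_name.startswith(FORCE_PREFIX):
        return scheduler_name
    return "{}-{}".format(scheduler_name, master_name)


names = ["nightly", "on-commit", "force-nightly"]
print([master_specific(n, "master1") for n in names])
```

With names built this way, a scheduler defined on one master never shares a name with the equivalent scheduler on another master.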

But there's always a snag, isn't there? A few days ago, we switched the branch that most of our work is on. The way we have our master.cfg set up, this is a one-line change. But it changes nearly everything. It changes builder names, scheduler names, etc.

Now I'm seeing some odd anomalies. Such as builds being scheduled by schedulers that no longer exist on any master, and are not in our master.cfg, but are still in the database.

I am also seeing builders in current schedulers that never seem to get builds in their queues. We have to force them to see anything happen.


And builders with builds in their queues that never seem to start.

Could this be part of the result of schedulers not being particularly reconfigurable?

And on that note, there seem to be 3 schemes in 0.9.x for checkConfig/reconfigService.

Number 1 is how the schedulers do it. Which is that they don't, but have largish __init__() functions.

Number 2 is how the workers do it. checkConfig looks a lot like __init__ might, and reconfigService looks a lot like checkConfig, except that it doesn't except.

Number 3 is how things like reporters do it. checkConfig only does checks (and the occasional null-ish initialization), and reconfigService copies its arguments into itself.

Which is the proper way, since I'm likely to have a go at updating the schedulers? Number 1 is right out. Number 2 is pretty easy, mostly moving the __init__ to checkConfig, and mostly copying to reconfigService, and making sure to call the base classes' methods properly.
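To make number 3 concrete, here is a rough sketch of the shape I have in mind. It is a plain-Python stand-in, not Buildbot code: the class name, the exception type and the two arguments are assumptions for illustration, and the real reconfigService may well be asynchronous, which this ignores.

```python
class ConfigError(Exception):
    """Stand-in for whatever the real service raises on a bad config."""


class SketchScheduler:
    """Plain-Python stand-in, NOT a Buildbot class."""

    def __init__(self, *args, **kwargs):
        # Null-ish initialization, then validate, then copy.
        self.name = None
        self.builderNames = []
        self.checkConfig(*args, **kwargs)
        self.reconfigService(*args, **kwargs)

    def checkConfig(self, name, builderNames):
        # Checks only; nothing is stored on self here.
        if not name:
            raise ConfigError("scheduler needs a name")
        if not builderNames:
            raise ConfigError("scheduler %r needs at least one builder" % (name,))

    def reconfigService(self, name, builderNames):
        # Copies its arguments into the instance, so it can be called
        # again later with new values without rebuilding the object.
        self.name = name
        self.builderNames = list(builderNames)


s = SketchScheduler("nightly-master1", ["build-trunk"])
# A later reconfig: validate the new arguments first, then apply them.
s.checkConfig("nightly-master1", ["build-branch"])
s.reconfigService("nightly-master1", ["build-branch"])
```

The appeal of this split is that a bad config is rejected before anything on the live object is touched.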


One slightly happier anecdote...

We ended up with a situation where there were 2 builders for a particular worker. Both had current builds marked as acquiring locks (remember that we use locks to keep it to one build per worker, except for a special builder that should always run, even if there's another build running. That's why we don't restrict builds at the worker level).

I did manage to go in through the manhole and release the lock from whoever was holding it. By the time I got far enough to do that, I wasn't interested in figuring out which build was actually holding on to it.

The first builder's build completed, and the second builder picked up after that.


Yay.

As always, thanks for your assistance.

Neil Gilmore
grammatech.com

