Hi everyone,

More anecdotes. Scheduler related.

As you may recall, I attempted to fix our problem with collapsing behavior by appending the master name to each scheduler (except the force schedulers). The idea was to make the schedulers master-specific so that the builder and scheduler would be on the same master, allowing the (mostly) default collapsing behavior to work; our only change to it was to stop treating revision as significant. This appears to have at least mostly worked.
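For anyone who wants to picture the naming scheme, here is a minimal sketch of the idea. It is plain Python rather than our actual master.cfg, and the function names, the "-" separator, and the (name, is_force) pairs are all made up for illustration; none of it is Buildbot API.

```python
# Hypothetical sketch: names and data shapes are illustrative only,
# not taken from Buildbot or from our real master.cfg.

def master_specific_name(base_name, master_name, is_force=False):
    """Return a scheduler name that is unique per master.

    Force schedulers are left alone and keep their original name.
    """
    if is_force:
        return base_name
    return "{}-{}".format(base_name, master_name)


def rename_schedulers(scheduler_specs, master_name):
    """Apply the per-master suffix to a list of (name, is_force) pairs."""
    return [
        master_specific_name(name, master_name, is_force)
        for name, is_force in scheduler_specs
    ]


if __name__ == "__main__":
    specs = [("commit", False), ("nightly", False), ("force", True)]
    print(rename_schedulers(specs, "master1"))
```

The point is only that every non-force scheduler name becomes a function of the master name, so any other input that feeds the name produces a whole new set of scheduler names too.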

But there's always a snag, isn't there? A few days ago, we switched the branch that most of our work is on. The way we have our master.cfg set up, this is a one-line change. But it changes nearly everything. It changes builder names, scheduler names, etc.

Now I'm seeing some anomalies, such as builds being scheduled by schedulers that no longer exist on any master and are not in our master.cfg, but are still in the database.

I am also seeing builders in current schedulers that never seem to get builds in their queues. We have to force them to see anything happen.

And builders with builds in their queues that never seem to start.

Could this be partly the result of schedulers not being particularly reconfigurable?

And on that note, there seem to be three schemes in 0.9.x for checkConfig/reconfigService.

Number 1 is how the schedulers do it, which is that they don't; they have largish __init__() methods instead.

Number 2 is how the workers do it. checkConfig looks a lot like __init__ might, and reconfigService looks a lot like checkConfig, except that it doesn't raise exceptions.

Number 3 is how things like reporters do it. checkConfig only does checks (and the occasional null-ish initialization), and reconfigService copies its arguments into itself.

Which is the proper way, since I'm likely to have a go at updating the schedulers? Number 1 is right out. Number 2 is pretty easy: mostly moving the __init__ to checkConfig, mostly copying that to reconfigService, and making sure to call the base classes' methods properly.
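To make the question concrete, here is roughly what I understand scheme 3 to look like, as a toy stand-in with no Buildbot imports. The class names, the ConfigError exception, and the choice of builderNames and treeStableTimer as arguments are assumptions for illustration; the real services are driven by the framework rather than calling these methods from __init__ as I do here.

```python
# Toy stand-in for scheme 3. Nothing here is Buildbot's own code;
# ConfigError and ReconfigurableSketch are hypothetical names.

class ConfigError(Exception):
    """Stand-in for a configuration error."""


class ReconfigurableSketch:
    """Validate in checkConfig, store state in reconfigService."""

    def __init__(self, name, **kwargs):
        self.name = name
        # Simplification: the real framework calls these at the right
        # times; here they are called directly to keep the sketch short.
        self.checkConfig(**kwargs)
        self.reconfigService(**kwargs)

    def checkConfig(self, builderNames=None, treeStableTimer=None):
        # Checks only: raise on bad input and store nothing.
        if not builderNames:
            raise ConfigError("builderNames must be a non-empty list")
        if treeStableTimer is not None and treeStableTimer < 0:
            raise ConfigError("treeStableTimer must not be negative")

    def reconfigService(self, builderNames=None, treeStableTimer=None):
        # Arguments are assumed to be validated already; just copy them
        # onto the instance so a reconfig updates the object in place.
        self.builderNames = list(builderNames)
        self.treeStableTimer = treeStableTimer


if __name__ == "__main__":
    sched = ReconfigurableSketch("commit", builderNames=["a"], treeStableTimer=5)
    # A later reconfig: check the new arguments, then apply them.
    sched.checkConfig(builderNames=["a", "b"])
    sched.reconfigService(builderNames=["a", "b"])
    print(sched.builderNames, sched.treeStableTimer)
```

The appeal for schedulers would be that a changed configuration updates the existing object instead of requiring a new one, which is exactly what scheme 1 cannot do.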

One slightly happier anecdote...

We ended up with a situation where there were 2 builders for a particular worker. Both had current builds marked as acquiring locks (remember that we use locks to keep it to one build per worker, except for a special builder that should always run, even if there's another build running. That's why we don't restrict builds at the worker level).

I did manage to go in through the manhole and release the lock from whoever was holding it. By the time I got far enough to do that, I wasn't interested in figuring out which build was actually holding onto it.

The first builder's build completed, and the second builder picked up after that.


As always, thanks for your assistance.

Neil Gilmore

users mailing list
