Hi everyone,

The last time we were here, my boss had just added something to keep the old builds in the database down to 100K. That's worked pretty well for our UI master, in that I haven't had to restart it since. Naturally, one of our other masters is acting up.

This master is the one that probably does the most work and runs most of the builds. In the past, it's been prone to losing its WAMP connection. When this happens, we get nothing in the logs. The symptom is that none of the workers for this master appear in the UI, and its builders don't appear on the builders page. So our users conclude that the master isn't running (which isn't true). You can still see the builds running on the front page, and get to a builder's page from there, but if you need to force a build on a builder that isn't currently building, you're out of luck. Unless, of course, you go through the REST API and find the builder's number.
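For anyone else stuck with that workaround, here is a minimal sketch of looking up a builder's number through the REST API. It assumes the standard /api/v2/builders endpoint, which returns a "builders" list whose entries carry "name" and "builderid". The base URL and builder name are placeholders, the function names are made up for illustration, and authentication is not handled.

```python
import json
import urllib.request


def fetch_builders(base_url):
    """GET <base_url>/api/v2/builders and return the decoded JSON payload."""
    url = base_url.rstrip("/") + "/api/v2/builders"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def find_builder_id(payload, builder_name):
    """Return the builderid for builder_name, or None if it is not listed."""
    for builder in payload.get("builders", []):
        if builder.get("name") == builder_name:
            return builder.get("builderid")
    return None


# Hypothetical usage (URL and builder name are placeholders):
#   payload = fetch_builders("http://buildbot.example.com:8010")
#   print(find_builder_id(payload, "my-builder"))
```

With the number in hand, the builder's page should be reachable at #/builders/<number> in the UI, even when the builder is missing from the builders list.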

Because of the previous problem, I've had a session running top for a couple of weeks, and I may have more data. Every so often, that master shows 100% (or more) CPU usage for some minutes (or maybe an hour or more). My theory is that the WAMP connection isn't serviced during that time, and disconnects. When things settle down, the connection is already gone, and isn't reconnected. That's my current situation. The CPU was spiked when I looked, the log wasn't getting new messages, and the UI wasn't showing the workers. When the CPU came back down, the log resumed, but the master's workers still aren't in the UI.

As one might expect, builds do not proceed well when the master is spiking the CPU.

During one of these times when the CPU is spiked, the log only gets new messages every several minutes (instead of a pretty close to continuous flow). I attached gdb to the master, figuring I couldn't make things worse. Unfortunately, I don't think that system is quite set up for debugging. For example, py-bt, while it did run, gave back nothing useful. A straight bt gave the usual string of Python frame evaluations, etc. I didn't have time to go further then. But I did note that the backtrace showed something similar to what we saw when processing millions of builds. That is, every time I broke in to see where I was, I was down in some regex stuff.
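One thing that might get a Python-level stack without gdb's Python support is the standard library's faulthandler module. This is only a sketch of the idea, not something tried on this master. The helper name is made up, it only works on Unix-like systems, and SIGUSR2 is chosen on the assumption that the master doesn't already use that signal, which would need checking. It would have to run inside the master process, for example from master.cfg.

```python
import faulthandler
import signal
import sys


def install_stack_dump(signum=signal.SIGUSR2, stream=sys.stderr):
    """Dump the Python traceback of every thread to stream on signum.

    The dump is written at the C level when the signal arrives, so it
    should still work while the interpreter is busy in a long-running
    call. The stream must be a real file with a file descriptor.
    """
    faulthandler.register(signum, file=stream, all_threads=True)
    return signum
```

After that, sending the signal to the process (kill -USR2 <pid>) during a CPU spike should write the current Python stacks to the chosen stream, which would show what is calling into the regex code.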

Neil Gilmore
grammatech.com