Hello,

We've reached a point where we have too many build slaves and too many users to get good performance and latency out of our current buildbot waterfall and console view pages.
We've been tweaking the page a lot lately to make it load faster, but the number of new slaves and new users every day makes it slow down faster than we can address the problem. This morning buildbot was completely unresponsive because it could not keep up with demand. To bring it back to life, I disabled two features that account for about 50% of all traffic: the buildbot Chrome extension, and the three overview bars at the top of the waterfall. That should keep it online for a while, but it's not ideal. It's time to think about a better solution.

The underlying problem with buildbot is its database format, which is just hundreds of thousands of files on the hard drive with no "seek" capability, plus the fact that the web server itself is single-threaded. We currently have 63 slaves on our main waterfall. I think this is more than buildbot can really support, so we would ideally need to split it.

Q1: What kind of split would you prefer? mac/linux/windows, or chromium/webkit/modules, or full/windows/mac/linux/memory, etc.? The main buildbot page would most likely become a set of iframes displaying all the slaves at the same time. The console view integration might be a little less nice. If anyone with web development experience wants to help, we could modify the current waterfall to fetch only JSON data from each buildbot and merge it client-side to get a combined view.

Q2: How many changes do we need to display on the console view? We currently display the last 50 changes, which usually covers about half a day. If people don't mind, we could scale back to 30, which would make the page a little faster to load.

Q3: What auto-refresh interval do we need? We were at 60 seconds for a long time, and I changed it to 90 seconds a couple of weeks ago. No one complained, so I guess that's fine. Should we go even higher?

Q4: How much build history do we need?
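On the caching problem mentioned further down (build.chromium.org is a proxied cache of the real master, and the cache does not work well), a minimal mod_proxy + disk cache setup might look like the fragment below. The backend port, cache path, and expiry are all assumptions, not the real configuration:

```apache
# Sketch only: localhost:8010 and the cache path are guesses.
ProxyPass        / http://localhost:8010/
ProxyPassReverse / http://localhost:8010/

CacheEnable disk /
CacheRoot /var/cache/apache2/buildbot
# Serve cached copies for up to 60 seconds; the waterfall changes
# constantly, so longer expiries would just show stale results.
CacheDefaultExpire 60
# Cache pages even when the backend sends no Last-Modified header.
CacheIgnoreNoLastMod On
```

Getting the expiry headers right for the handful of heavy pages (waterfall, console) is probably where most of the win is.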
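The client-side merge floated in Q1 could look roughly like the sketch below. The data shape here ({master name: {builder name: status}}) is an assumption, not the actual buildbot JSON format; the idea is just that each split-out master serves its own status blob and the combined page stitches them together under prefixed names:

```python
# Sketch only: the real buildbot JSON endpoints and field names are not
# pinned down here, so this uses a stand-in shape of
# {master_name: {builder_name: status}}.

def merge_waterfalls(masters):
    """Merge per-master builder/status dicts into one combined view,
    prefixing each builder with its master name to avoid collisions."""
    combined = {}
    for master, builders in masters.items():
        for builder, status in builders.items():
            combined["%s/%s" % (master, builder)] = status
    return combined

if __name__ == "__main__":
    view = merge_waterfalls({
        "linux": {"Linux Builder": "green"},
        "mac": {"Mac Builder": "red"},
    })
    for name, status in sorted(view.items()):
        print(name, status)
```

In a real combined page this would run in the browser against each master's JSON endpoint, but the merge step itself is the same either way.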
Right now stdio logs are kept for 3 weeks and build results (green, red) are kept for 1 month. Older build results are archived but can't be accessed directly through buildbot.

If you have any other suggestions, please let me know!

Some things we can't do:
- Get a better machine. It's already running on a dedicated dual quad-core Nehalem server with 24 GB of RAM and 15k rpm drives.
- Change buildbot to use a non-single-threaded web server. That would be far too involved.

*WHAT I NEED YOUR HELP WITH:*
1. No more scraping of the waterfall, please! If you need to crawl the logs, let me know and I can run your script on the database directly.
2. If you know about apache mod_cache / mod_proxy and want to help, please let me know. build.chromium.org is a proxied cache of the real buildbot server, and the cache does not work well. This contributes another 25-30% of the overall load on the buildbot.

Thanks!
Nicolas

--~--~---------~--~----~------------~-------~--~----~
Chromium Developers mailing list: chromium-dev@googlegroups.com
View archives, change email options, or unsubscribe:
http://groups.google.com/group/chromium-dev
-~----------~----~----~----~------~----~------~--~---