Digging this up again. Havent moved to a physical server yet. I will first be splitting the bbot server and postgres on different hosts.
What I've noticed the CPU spiking is always (and I rea) due to queries like: SELECT logchunks.first_line, logchunks.last_line, logchunks.content, logchunks.compressed FROM logchunks WHERE logchunks.logid = 56343 ORDER BY logchunks.first_line If I am correct that call can only come from CompressLog <https://github.com/buildbot/buildbot/blob/cfe60e31206748bcf01232eb13c0d6177cd04d5d/master/buildbot/db/logs.py#L252> . They do take a very long time (even 1 minute). While running all the system is stuck. I wonder if I should use some custom postgres config, I am simply running the official postgres 9.5 container. Thanks, Francesco On Mon, Dec 19, 2016 at 4:06 PM, Francesco Di Mizio < [email protected]> wrote: > Pierre, > > I will have IT move the server to a real physical box. Will be using a > speedy SSD so if that's really the problem it should be evident soon. > Will of course update here with my findings. > > Thanks! > > On Fri, Dec 16, 2016 at 12:59 PM, Pierre Tardy <[email protected]> wrote: > >> Hi Francesco, >> I am continuing to do my scalability benchmarking with 0.9.2, and one >> issue I am finding is that buildbot consumes lots of memory in order to >> compress the logs. >> It actually loads the whole log in memory in order to compress it at once >> This simplifies the algorithm, but can be quite memory hungry. On my >> current test, I am doing 300 builds in parallel with 10k lines of log, and >> it eats 4GB+ of memory. >> This explains why Greg had issue with 32bit python, and could explain >> your performance issue, if your master starts swapping while doing the >> compressions. >> >> This is probably something that I want to eventually fix. >> Pierre >> >> >> Le mer. 14 déc. 2016 à 19:12, Greg MacDonald <[email protected]> >> a écrit : >> >>> Might not be related but I had some issues with logs and UI lag when I >>> was using 32 bit python for the server. Went away when I upgraded to 64 >>> bit. - Greg >>> >>> >>> >>> *From:* users [mailto:[email protected]] *On Behalf Of *Pierre >>> Tardy >>> *Sent:* Wednesday, December 14, 2016 10:48 AM >>> *To:* Francesco Di Mizio <[email protected]> >>> *Cc:* Buildbot <[email protected]> >>> *Subject:* Re: [[email protected]] buildbot CPU usage >>> >>> >>> >>> Hi Francesco, >>> >>> >>> >>> I have tried to reproduce your issue by doing some stress tests on the >>> buildbot master. >>> >>> I have been able to run thousands of builds throwing 30k lines of logs >>> each without significant lag in the UI. >>> >>> it takes ~2min to compress the whole log with sqlite, but I can always >>> get smooth access to the UI. >>> >>> >>> >>> I also tried with pg, and it was much faster. >>> >>> >>> >>> I am setting up a cleaner test environment, and I will be publishing >>> some benchmarks. >>> >>> >>> >>> This week we got this bug about performance issue: >>> http://trac.buildbot.net/ticket/3654#comment:4 >>> >>> This eventually resolved to be a hardware problem on the machine. Could >>> this be something like this? >>> >>> >>> >>> Pierre >>> >>> >>> >>> Le mar. 13 déc. 2016 à 01:26, Francesco Di Mizio < >>> [email protected]> a écrit : >>> >>> Pierre i am not using SQLite but pg hosted in an other container on the >>> same box. I'd say the issue isn't which dB engine i am running as it also >>> happens with MySQL. >>> >>> >>> >>> Thanks, >>> >>> Francesco >>> >>> On Dec 13, 2016 12:40 AM, "Pierre Tardy" <[email protected]> wrote: >>> >>> Hi Francesco, >>> >>> I would say it is normal that the CPU goes to 100% while the log are >>> being compressed. As soon as anything else is not blocked, for me this is >>> fine. >>> >>> I can see you are using sqlite. The thing is with sqlite the number of >>> parallel access to the db is restricted to one by default in order to >>> protect against weird parallel behaviour we got with sqlite. >>> >>> I think this explains your behaviour. >>> >>> I would recommend as an easy fix to switch to pg or mysql. I'll try to >>> think of some workarounds like compressing the log only when there is a low >>> load average. This is already something that we though about with my >>> teammates. Log compression is important, but not urgent, so this is fine to >>> delay until there is not so much to do. >>> >>> >>> >>> Pierre >>> >>> >>> >>> >>> >>> >>> >>> Le lun. 12 déc. 2016 à 12:40, Francesco Di Mizio < >>> [email protected]> a écrit : >>> >>> Hey Pierre, >>> >>> >>> >>> first of all thanks for taking a look. I never ever have more than 9-10 >>> builds going on at the same time. Like I said many times the system can >>> happily be sitting on a nice 10-15% CPU usage when even 8 builds are >>> running, but the CPU spikes can happen at any time with any number of >>> builds going on. Things seem to go worse when the system runs for a few >>> days with no reboot. Just now I had 7 builds running and the cpu keeps >>> oscillating between 10% and 120%, now it's contantly over 100%. I wonder >>> if we can find a way for you to access the system so you can take a look >>> yourself (of course if you are fine with it)? >>> >>> >>> >>> I will def be looking at multimaster in the future, just have no time >>> atm. >>> >>> >>> >>> Thanks again, >>> >>> Francesco >>> >>> On Mon, Dec 12, 2016 at 11:18 AM, Pierre Tardy <[email protected]> wrote: >>> >>> Hi Francesco, >>> >>> I finally got the time to look at your profile. >>> >>> >>> >>> I dont see anything other than normal load. Buildbot is hardworking at >>> compressing the log in the db, managing the build and steps, etc. >>> >>> How many build in parallel do you have for your master? Maybe its time >>> to start thinking about multimaster. >>> >>> >>> >>> Regards, >>> >>> Pierre >>> >>> Le lun. 28 nov. 2016 à 11:30, Francesco Di Mizio < >>> [email protected]> a écrit : >>> >>> Reviving this old thread. According to things that were said on this >>> thread, I reduced as much as I could all the logs produced by all steps. >>> Bigger has now 300 lines. >>> >>> Despite that I am still seeing cpu spiked. I managed to grab a hopefully >>> useful json from the profiler which you'll find attached. Toards the end >>> you can clearly see the load goes through the roof. >>> >>> >>> >>> If anybody could take a look I'd really appreciate! >>> >>> >>> >>> Thanks, >>> >>> Francesco >>> >>> On Fri, Aug 26, 2016 at 6:06 PM, Neil Gilmore <[email protected]> >>> wrote: >>> >>> Hi Pierre, >>> >>> Well, my previous message was really saying that I think buildbot can >>> handle logs a lot bigger then 12K lines. >>> >>> It seems unlikely that it's stuck on log compression. The step says it's >>> been running for 9.5 hours, and previous runs of the entire build have >>> durations in the 1-3 hour range. The particular step is a ShellCommand that >>> actually does the build. The place where the log stops is during an svn up >>> operation. >>> >>> I don't find any exceptions in the log for this, though that's not >>> definitive (current number of log files is 122, and sometimes finding >>> things is tricky). I do see the build cancellation in the log, and it also >>> shows up in the UI. >>> >>> I'm betting on a lost deferred. So far, I've never seen an exception >>> relating to that. >>> >>> On the bright side, this seems to be the only worker whose builders are >>> stuck (some of the rest are waiting for locks, which this build has). >>> Usually, there's more. >>> >>> Neil Gilmore >>> grammatech.com >>> >>> >>> >>> On 8/26/2016 10:35 AM, Pierre Tardy wrote: >>> >>> Hum, >>> >>> It might not be stuck, actually, but just spending very long time to >>> compress the log. >>> >>> >>> >>> In theory, the log compression is not waited for, though. >>> >>> >>> >>> Note that if you stop a build that is waiting for a lost deferred, this >>> will have no effect as you describe. >>> >>> >>> >>> You got absolutly no exception in twisted.log? >>> >>> >>> >>> Pierre >>> >>> >>> >>> >>> >>> Le ven. 26 août 2016 à 17:06, Neil Gilmore <[email protected]> a >>> écrit : >>> >>> I'm currently looking at a step with 3 logs: >>> 81531 lines >>> 489285 lines >>> 489311 lines >>> >>> An earlier successful run would have that first log at 244080 lines. >>> >>> This particular build is stuck, though. :( (which is why I'm looking at >>> it.) (A bit off-topic, but I tried stopping it. The last step is marked >>> cancelled, but that's the only effect.) >>> >>> >>> >>> Neil Gilmore >>> grammatech.com >>> >>> >>> >>> On 8/26/2016 6:25 AM, Pierre Tardy wrote: >>> >>> >>> >>> The ram looks like sufficient, it might be a good test to try and >>> increase the number of cpu for that VM. >>> >>> In your trace, I can see the use of up to 7 threads at the same time, so >>> you might gain by going 8 CPUs >>> >>> >>> >>> Also, make sure that your VM host is not overbooked. In my experience of >>> using VMware VMs provided by IT, overbooking has been a source of >>> inexplicable performance issues. >>> >>> >>> >>> 12k lines is a lot, but buildbot shall support this kind of load without >>> issue. >>> >>> >>> >>> Le ven. 26 août 2016 à 12:05, Francesco Di Mizio < >>> [email protected]> a écrit : >>> >>> It's a vmware virtual machine with 4 GIGs RAM and 4 CPUs@3Ghz. It runs, >>> among other marginal things, 2 docker containers - one for the buildbot and >>> one for the postgres db. >>> >>> >>> >>> The most beefy logs have around 12K lines. Is it too much? >>> >>> Also some other logs are read from the worker's filesystem and added as >>> additional logs. >>> >>> >>> >>> On Fri, Aug 26, 2016 at 11:51 AM, Pierre Tardy <[email protected]> wrote: >>> >>> Cool! >>> >>> I can indeed see 3 spikes. >>> >>> >>> >>> Looks related to logs and logs compression. >>> >>> >>> >>> What is the HW spec of your master machine? >>> >>> How much log does your build generate? >>> >>> >>> >>> Pierre >>> >>> >>> >>> Le ven. 26 août 2016 à 11:42, Francesco Di Mizio < >>> [email protected]> a écrit : >>> >>> Pierre, >>> >>> >>> >>> I enabled it, waited 1 min and saw the spike, then stopped after a few >>> secs. Attached the json. >>> >>> Awesome tool btw, work wonders! >>> >>> >>> >>> On Thu, Aug 25, 2016 at 1:14 PM, Pierre Tardy <[email protected]> wrote: >>> >>> You can try to hit the button in the morning with a two hours gather >>> period, and hope that you see the spike during that period.. >>> >>> >>> >>> >>> >>> Le jeu. 25 août 2016 à 12:17, Francesco Di Mizio < >>> [email protected]> a écrit : >>> >>> Thanks a lot! Pierre I will def will give it a shot. I am not sure I'll >>> be able to smash that 'start recording' button as the UI isusuallystuck >>> when the CPU spikes. Updates to come! >>> >>> >>> >>> On Thu, Aug 25, 2016 at 10:45 AM, Pierre Tardy <[email protected]> wrote: >>> >>> Hi Francesco, >>> >>> >>> >>> I spent some time in order to implement a profiler plugin for buildbot >>> >>> >>> >>> You can give it a look, and send your profile.json file if you need more >>> analysis from me. >>> >>> https://github.com/tardyp/buildbot_profiler >>> >>> >>> >>> Regards, >>> >>> Pierre >>> >>> >>> >>> >>> >>> Le mer. 24 août 2016 à 22:43, Francesco Di Mizio < >>> [email protected]> a écrit : >>> >>> I've tried and it's not an easy task because of my Win into Vagrant into >>> Docker setup. >>> >>> I'll try again soon when I get a Linux box! >>> >>> >>> >>> On Fri, Aug 19, 2016 at 5:54 PM, Vasily <[email protected]> wrote: >>> >>> Umm, no. VTune has Python support starting 2017 Beta, and, well, it was >>> my team (at Intel) work actually :-) >>> >>> P.S. I'm from Intel, too. >>> >>> Thanks, >>> Vasily >>> >>> 19 авг. 2016 г. 18:17 пользователь "Francesco Di Mizio" < >>> [email protected]> написал: >>> >>> >>> >>> I had thought you were making fun of Intel somehow ;) >>> >>> >>> >>> On Aug 19, 2016 5:07 PM, "Pierre Tardy" <[email protected]> wrote: >>> >>> ahah >>> >>> >>> >>> I though this was a taunt on me being employed by Intel. >>> >>> I actually had mitigated experience with vtune few years ago, and didn't >>> know they had python support until then. >>> >>> Being an opensource guy, I usually neglegate to look at proprietary >>> stuff. >>> >>> >>> >>> Pierre >>> >>> >>> >>> Le ven. 19 août 2016 à 12:18, Vasily <[email protected]> a écrit : >>> >>> I'm again suggesting to look into Python profiling capabilities of >>> Intel® VTune™ Amplifier. It could run statistical profiling for a long time >>> and display CPU usage over time, so the developer can look at specific time >>> range where CPU usage was too high and see which functions were executed. >>> >>> Thanks, >>> Vasily >>> >>> 19 авг. 2016 г. 11:57 пользователь "Pierre Tardy" <[email protected]> >>> написал: >>> >>> >>> >>> Hi Francesco, >>> >>> >>> >>> Your described setup looks sane to me. >>> >>> >>> >>> The problems we are trying to catch are cpu spikes, as far as I >>> understand, which does not happen for very long, but are very annoying for >>> users, as it is blocking the reactor. >>> >>> >>> >>> This problem is not easy to see in the profile you sent, as this profile >>> is over long time, so we see the average of each method during the day and >>> not the spikes. >>> >>> >>> >>> What would really be needed is a on-demand profiler which would detect >>> cpu spikes and only log the stack traces during those times. >>> >>> >>> >>> Here is a nice blog pst explaining why statistic profiling is cool and >>> easy to implement in python. >>> >>> https://nylas.com/blog/performance >>> >>> >>> >>> For 0.9.1 I want to concentrate on scalability, and write a debugging ui >>> plugin based on those ideas (and probably code) >>> >>> >>> >>> That would be great if your team can help on that matter. >>> >>> >>> >>> Regards, >>> >>> Pierre >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> >>> users mailing list >>> >>> [email protected] >>> >>> https://lists.buildbot.net/mailman/listinfo/users >>> >>> _______________________________________________ >>> users mailing list >>> [email protected] >>> https://lists.buildbot.net/mailman/listinfo/users >>> >>> >>> _______________________________________________ >>> users mailing list >>> [email protected] >>> https://lists.buildbot.net/mailman/listinfo/users >>> >>> >
_______________________________________________ users mailing list [email protected] https://lists.buildbot.net/mailman/listinfo/users
