Pierre, I will have IT move the server to a real physical box. Will be using a speedy SSD so if that's really the problem it should be evident soon. Will of course update here with my findings.
Thanks! On Fri, Dec 16, 2016 at 12:59 PM, Pierre Tardy <[email protected]> wrote: > Hi Francesco, > I am continuing to do my scalability benchmarking with 0.9.2, and one > issue I am finding is that buildbot consumes lots of memory in order to > compress the logs. > It actually loads the whole log in memory in order to compress it at once > This simplifies the algorithm, but can be quite memory hungry. On my > current test, I am doing 300 builds in parallel with 10k lines of log, and > it eats 4GB+ of memory. > This explains why Greg had issue with 32bit python, and could explain your > performance issue, if your master starts swapping while doing the > compressions. > > This is probably something that I want to eventually fix. > Pierre > > > Le mer. 14 déc. 2016 à 19:12, Greg MacDonald <[email protected]> > a écrit : > >> Might not be related but I had some issues with logs and UI lag when I >> was using 32 bit python for the server. Went away when I upgraded to 64 >> bit. - Greg >> >> >> >> *From:* users [mailto:[email protected]] *On Behalf Of *Pierre >> Tardy >> *Sent:* Wednesday, December 14, 2016 10:48 AM >> *To:* Francesco Di Mizio <[email protected]> >> *Cc:* Buildbot <[email protected]> >> *Subject:* Re: [[email protected]] buildbot CPU usage >> >> >> >> Hi Francesco, >> >> >> >> I have tried to reproduce your issue by doing some stress tests on the >> buildbot master. >> >> I have been able to run thousands of builds throwing 30k lines of logs >> each without significant lag in the UI. >> >> it takes ~2min to compress the whole log with sqlite, but I can always >> get smooth access to the UI. >> >> >> >> I also tried with pg, and it was much faster. >> >> >> >> I am setting up a cleaner test environment, and I will be publishing some >> benchmarks. >> >> >> >> This week we got this bug about performance issue: http://trac.buildbot. >> net/ticket/3654#comment:4 >> >> This eventually resolved to be a hardware problem on the machine. Could >> this be something like this? >> >> >> >> Pierre >> >> >> >> Le mar. 13 déc. 2016 à 01:26, Francesco Di Mizio < >> [email protected]> a écrit : >> >> Pierre i am not using SQLite but pg hosted in an other container on the >> same box. I'd say the issue isn't which dB engine i am running as it also >> happens with MySQL. >> >> >> >> Thanks, >> >> Francesco >> >> On Dec 13, 2016 12:40 AM, "Pierre Tardy" <[email protected]> wrote: >> >> Hi Francesco, >> >> I would say it is normal that the CPU goes to 100% while the log are >> being compressed. As soon as anything else is not blocked, for me this is >> fine. >> >> I can see you are using sqlite. The thing is with sqlite the number of >> parallel access to the db is restricted to one by default in order to >> protect against weird parallel behaviour we got with sqlite. >> >> I think this explains your behaviour. >> >> I would recommend as an easy fix to switch to pg or mysql. I'll try to >> think of some workarounds like compressing the log only when there is a low >> load average. This is already something that we though about with my >> teammates. Log compression is important, but not urgent, so this is fine to >> delay until there is not so much to do. >> >> >> >> Pierre >> >> >> >> >> >> >> >> Le lun. 12 déc. 2016 à 12:40, Francesco Di Mizio < >> [email protected]> a écrit : >> >> Hey Pierre, >> >> >> >> first of all thanks for taking a look. I never ever have more than 9-10 >> builds going on at the same time. Like I said many times the system can >> happily be sitting on a nice 10-15% CPU usage when even 8 builds are >> running, but the CPU spikes can happen at any time with any number of >> builds going on. Things seem to go worse when the system runs for a few >> days with no reboot. Just now I had 7 builds running and the cpu keeps >> oscillating between 10% and 120%, now it's contantly over 100%. I wonder >> if we can find a way for you to access the system so you can take a look >> yourself (of course if you are fine with it)? >> >> >> >> I will def be looking at multimaster in the future, just have no time atm. >> >> >> >> Thanks again, >> >> Francesco >> >> On Mon, Dec 12, 2016 at 11:18 AM, Pierre Tardy <[email protected]> wrote: >> >> Hi Francesco, >> >> I finally got the time to look at your profile. >> >> >> >> I dont see anything other than normal load. Buildbot is hardworking at >> compressing the log in the db, managing the build and steps, etc. >> >> How many build in parallel do you have for your master? Maybe its time to >> start thinking about multimaster. >> >> >> >> Regards, >> >> Pierre >> >> Le lun. 28 nov. 2016 à 11:30, Francesco Di Mizio < >> [email protected]> a écrit : >> >> Reviving this old thread. According to things that were said on this >> thread, I reduced as much as I could all the logs produced by all steps. >> Bigger has now 300 lines. >> >> Despite that I am still seeing cpu spiked. I managed to grab a hopefully >> useful json from the profiler which you'll find attached. Toards the end >> you can clearly see the load goes through the roof. >> >> >> >> If anybody could take a look I'd really appreciate! >> >> >> >> Thanks, >> >> Francesco >> >> On Fri, Aug 26, 2016 at 6:06 PM, Neil Gilmore <[email protected]> >> wrote: >> >> Hi Pierre, >> >> Well, my previous message was really saying that I think buildbot can >> handle logs a lot bigger then 12K lines. >> >> It seems unlikely that it's stuck on log compression. The step says it's >> been running for 9.5 hours, and previous runs of the entire build have >> durations in the 1-3 hour range. The particular step is a ShellCommand that >> actually does the build. The place where the log stops is during an svn up >> operation. >> >> I don't find any exceptions in the log for this, though that's not >> definitive (current number of log files is 122, and sometimes finding >> things is tricky). I do see the build cancellation in the log, and it also >> shows up in the UI. >> >> I'm betting on a lost deferred. So far, I've never seen an exception >> relating to that. >> >> On the bright side, this seems to be the only worker whose builders are >> stuck (some of the rest are waiting for locks, which this build has). >> Usually, there's more. >> >> Neil Gilmore >> grammatech.com >> >> >> >> On 8/26/2016 10:35 AM, Pierre Tardy wrote: >> >> Hum, >> >> It might not be stuck, actually, but just spending very long time to >> compress the log. >> >> >> >> In theory, the log compression is not waited for, though. >> >> >> >> Note that if you stop a build that is waiting for a lost deferred, this >> will have no effect as you describe. >> >> >> >> You got absolutly no exception in twisted.log? >> >> >> >> Pierre >> >> >> >> >> >> Le ven. 26 août 2016 à 17:06, Neil Gilmore <[email protected]> a >> écrit : >> >> I'm currently looking at a step with 3 logs: >> 81531 lines >> 489285 lines >> 489311 lines >> >> An earlier successful run would have that first log at 244080 lines. >> >> This particular build is stuck, though. :( (which is why I'm looking at >> it.) (A bit off-topic, but I tried stopping it. The last step is marked >> cancelled, but that's the only effect.) >> >> >> >> Neil Gilmore >> grammatech.com >> >> >> >> On 8/26/2016 6:25 AM, Pierre Tardy wrote: >> >> >> >> The ram looks like sufficient, it might be a good test to try and >> increase the number of cpu for that VM. >> >> In your trace, I can see the use of up to 7 threads at the same time, so >> you might gain by going 8 CPUs >> >> >> >> Also, make sure that your VM host is not overbooked. In my experience of >> using VMware VMs provided by IT, overbooking has been a source of >> inexplicable performance issues. >> >> >> >> 12k lines is a lot, but buildbot shall support this kind of load without >> issue. >> >> >> >> Le ven. 26 août 2016 à 12:05, Francesco Di Mizio < >> [email protected]> a écrit : >> >> It's a vmware virtual machine with 4 GIGs RAM and 4 CPUs@3Ghz. It runs, >> among other marginal things, 2 docker containers - one for the buildbot and >> one for the postgres db. >> >> >> >> The most beefy logs have around 12K lines. Is it too much? >> >> Also some other logs are read from the worker's filesystem and added as >> additional logs. >> >> >> >> On Fri, Aug 26, 2016 at 11:51 AM, Pierre Tardy <[email protected]> wrote: >> >> Cool! >> >> I can indeed see 3 spikes. >> >> >> >> Looks related to logs and logs compression. >> >> >> >> What is the HW spec of your master machine? >> >> How much log does your build generate? >> >> >> >> Pierre >> >> >> >> Le ven. 26 août 2016 à 11:42, Francesco Di Mizio < >> [email protected]> a écrit : >> >> Pierre, >> >> >> >> I enabled it, waited 1 min and saw the spike, then stopped after a few >> secs. Attached the json. >> >> Awesome tool btw, work wonders! >> >> >> >> On Thu, Aug 25, 2016 at 1:14 PM, Pierre Tardy <[email protected]> wrote: >> >> You can try to hit the button in the morning with a two hours gather >> period, and hope that you see the spike during that period.. >> >> >> >> >> >> Le jeu. 25 août 2016 à 12:17, Francesco Di Mizio < >> [email protected]> a écrit : >> >> Thanks a lot! Pierre I will def will give it a shot. I am not sure I'll >> be able to smash that 'start recording' button as the UI isusuallystuck >> when the CPU spikes. Updates to come! >> >> >> >> On Thu, Aug 25, 2016 at 10:45 AM, Pierre Tardy <[email protected]> wrote: >> >> Hi Francesco, >> >> >> >> I spent some time in order to implement a profiler plugin for buildbot >> >> >> >> You can give it a look, and send your profile.json file if you need more >> analysis from me. >> >> https://github.com/tardyp/buildbot_profiler >> >> >> >> Regards, >> >> Pierre >> >> >> >> >> >> Le mer. 24 août 2016 à 22:43, Francesco Di Mizio < >> [email protected]> a écrit : >> >> I've tried and it's not an easy task because of my Win into Vagrant into >> Docker setup. >> >> I'll try again soon when I get a Linux box! >> >> >> >> On Fri, Aug 19, 2016 at 5:54 PM, Vasily <[email protected]> wrote: >> >> Umm, no. VTune has Python support starting 2017 Beta, and, well, it was >> my team (at Intel) work actually :-) >> >> P.S. I'm from Intel, too. >> >> Thanks, >> Vasily >> >> 19 авг. 2016 г. 18:17 пользователь "Francesco Di Mizio" < >> [email protected]> написал: >> >> >> >> I had thought you were making fun of Intel somehow ;) >> >> >> >> On Aug 19, 2016 5:07 PM, "Pierre Tardy" <[email protected]> wrote: >> >> ahah >> >> >> >> I though this was a taunt on me being employed by Intel. >> >> I actually had mitigated experience with vtune few years ago, and didn't >> know they had python support until then. >> >> Being an opensource guy, I usually neglegate to look at proprietary stuff. >> >> >> >> Pierre >> >> >> >> Le ven. 19 août 2016 à 12:18, Vasily <[email protected]> a écrit : >> >> I'm again suggesting to look into Python profiling capabilities of Intel® >> VTune™ Amplifier. It could run statistical profiling for a long time and >> display CPU usage over time, so the developer can look at specific time >> range where CPU usage was too high and see which functions were executed. >> >> Thanks, >> Vasily >> >> 19 авг. 2016 г. 11:57 пользователь "Pierre Tardy" <[email protected]> >> написал: >> >> >> >> Hi Francesco, >> >> >> >> Your described setup looks sane to me. >> >> >> >> The problems we are trying to catch are cpu spikes, as far as I >> understand, which does not happen for very long, but are very annoying for >> users, as it is blocking the reactor. >> >> >> >> This problem is not easy to see in the profile you sent, as this profile >> is over long time, so we see the average of each method during the day and >> not the spikes. >> >> >> >> What would really be needed is a on-demand profiler which would detect >> cpu spikes and only log the stack traces during those times. >> >> >> >> Here is a nice blog pst explaining why statistic profiling is cool and >> easy to implement in python. >> >> https://nylas.com/blog/performance >> >> >> >> For 0.9.1 I want to concentrate on scalability, and write a debugging ui >> plugin based on those ideas (and probably code) >> >> >> >> That would be great if your team can help on that matter. >> >> >> >> Regards, >> >> Pierre >> >> >> >> >> >> >> >> >> >> >> >> >> >> _______________________________________________ >> >> users mailing list >> >> [email protected] >> >> https://lists.buildbot.net/mailman/listinfo/users >> >> _______________________________________________ >> users mailing list >> [email protected] >> https://lists.buildbot.net/mailman/listinfo/users >> >> >> _______________________________________________ >> users mailing list >> [email protected] >> https://lists.buildbot.net/mailman/listinfo/users >> >>
_______________________________________________ users mailing list [email protected] https://lists.buildbot.net/mailman/listinfo/users
