Gave the profiler option that comes with twistd a try:

    twistd --nodaemon --profile=statsObj --profiler=profile -y ./buildbot.tac

It does not seem to work, though. Not sure whether it's buildbot or twisted itself.
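(If the profiler run does produce a stats file - for instance with twistd's --savestats option added so the dump is written in pstats format rather than as plain text - a short snippet along these lines could be used to inspect it. The "statsObj" filename comes from the command above; the sort order and count are just examples.)

    import pstats

    # Assumes the master was started with something like:
    #   twistd --nodaemon --profile=statsObj --profiler=profile --savestats -y ./buildbot.tac
    # so that "statsObj" is a pstats-compatible dump.
    stats = pstats.Stats("statsObj")
    stats.strip_dirs().sort_stats("cumulative").print_stats(30)  # top 30 entries by cumulative time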
On Tue, Aug 9, 2016 at 10:16 AM, Francesco Di Mizio <[email protected]> wrote:

> 2016-08-08 23:55:01+0000 [-] P4 poll failed on atx-p4-buildproxy.rsi.global:1666, //starcitizen/
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/buildbot/changes/base.py", line 65, in doPoll
>     d = defer.maybeDeferred(self.poll)
>   File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 150, in maybeDeferred
>     result = f(*args, **kw)
>   File "/usr/local/lib/python2.7/dist-packages/buildbot/changes/p4poller.py", line 162, in poll
>     d = self._poll()
>   File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1274, in unwindGenerator
>     return _inlineCallbacks(None, gen, Deferred())
> --- <exception caught here> ---
>   File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
>     result = g.send(result)
>   File "/usr/local/lib/python2.7/dist-packages/buildbot/changes/p4poller.py", line 232, in _poll
>     result = yield self._get_process_output(args)
>   File "/usr/local/lib/python2.7/dist-packages/buildbot/changes/p4poller.py", line 170, in _get_process_output
>     d = utils.getProcessOutput(self.p4bin, args, env)
>   File "/usr/local/lib/python2.7/dist-packages/twisted/internet/utils.py", line 128, in getProcessOutput
>     reactor)
>   File "/usr/local/lib/python2.7/dist-packages/twisted/internet/utils.py", line 28, in _callProtocolWithDeferred
>     reactor.spawnProcess(p, executable, (executable,)+tuple(args), env, path)
>   File "/usr/local/lib/python2.7/dist-packages/twisted/internet/posixbase.py", line 340, in spawnProcess
>     processProtocol, uid, gid, childFDs)
>   File "/usr/local/lib/python2.7/dist-packages/twisted/internet/process.py", line 731, in __init__
>     self._fork(path, uid, gid, executable, args, environment, fdmap=fdmap)
>   File "/usr/local/lib/python2.7/dist-packages/twisted/internet/process.py", line 405, in _fork
>     self.pid = os.fork()
> exceptions.OSError: [Errno 12] Cannot allocate memory
>
> I'll run one day with the p4 poller disabled and see how it goes.
>
> On Tue, Aug 2, 2016 at 7:28 PM, Francesco Di Mizio <[email protected]> wrote:
>
>> Just one. Here is what the poller looks like:
>>
>>     s = changes.P4Source(
>>         p4port=config.p4_server,
>>         p4user=config.p4_user,
>>         p4passwd=config.p4_password,
>>         p4base='//XXXXXXX/',
>>         pollInterval=10,
>>         pollAtLaunch=False,
>>         split_file=lambda branchfile: branchfile.split('/', 1),
>>         encoding='cp437'
>>     )
>>
>> On Tue, Aug 2, 2016 at 7:24 PM, Pierre Tardy <[email protected]> wrote:
>>
>>> How many projects are you polling? I'll see if I can make a PoC of a builder which runs statprof.
>>>
>>> On Tue, Aug 2, 2016 at 18:53, Francesco Di Mizio <[email protected]> wrote:
>>>
>>>> Thanks for the kind replies, both of you.
>>>>
>>>> @Pierre:
>>>> Not sure I get what you mean. Given the context, for a step to be CPU-demanding it would have to be a master-side step, right? I happen to not have any.
>>>> What would you be profiling with statprof?
>>>> I'd really appreciate it if you could elaborate on your idea.
>>>>
>>>> Really, all I can think of is the poller. I'll keep looking into it.
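(As an aside, a rough, untested sketch of what Pierre's statprof idea could look like on the master - sample the main thread for a few minutes, then dump a report. It assumes the statprof package from PyPI is installed; the duration, file path and helper name are just placeholders.)

    import sys
    import statprof
    from twisted.internet import reactor

    def profile_master(duration=300, outfile='/tmp/statprof-report.txt'):
        """Sample the master's main thread for `duration` seconds, then save a report."""
        statprof.start()                       # begin statistical sampling

        def finish():
            statprof.stop()
            old_stdout = sys.stdout
            with open(outfile, 'w') as f:      # statprof.display() prints to stdout,
                sys.stdout = f                 # so redirect it to a file temporarily
                try:
                    statprof.display()
                finally:
                    sys.stdout = old_stdout

        reactor.callLater(duration, finish)    # stop and write the report later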
>>>> On Tue, Aug 2, 2016 at 6:36 PM, Dan Kegel <[email protected]> wrote:
>>>>
>>>>> With gitpoller, it was easy to see; whenever the number of git sessions from the poller went over 0 or so, web GUI performance was poor. And if it went over 10, well, you could kiss the GUI goodbye for several minutes.
>>>>>
>>>>> One countermeasure was to randomize the polling intervals, a la
>>>>>
>>>>>     interval = 6  # minutes
>>>>>     self['change_source'].append(
>>>>>         # Fuzz the interval to avoid slamming the git server and hitting the MaxStartups or MaxSessions limits.
>>>>>         # If you hit them, twistd.log will have lots of "ssh_exchange_identification: Connection closed by remote host" errors.
>>>>>         # See http://trac.buildbot.net/ticket/2480
>>>>>         changes.GitPoller(repourl, branches=branchnames,
>>>>>                           workdir='gitpoller-workdir-' + name,
>>>>>                           pollinterval=interval*60 + random.uniform(-10, 10)))
>>>>>
>>>>> That made life just barely bearable, at least while the number of projects polled stayed under 50 or so. What really helped was not using pollers anymore and switching to gitlab's webhooks. We're at 190 now, of which 57 are still using gitpoller, and it's almost ok. (I really have to move the last 57 onto gitlab. Or, well, since they're not critical, increase the polling interval...)
>>>>>
>>>>> On Tue, Aug 2, 2016 at 9:13 AM, Pierre Tardy <[email protected]> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Pollers indeed usually don't scale, as they, hmm, poll.
>>>>>> What you are describing hints that the twisted reactor thread is always busy, which should not happen if you only start 10 builds.
>>>>>> You might have some custom steps which are doing something heavily CPU-bound in the main thread.
>>>>>> What I usually do is use statprof (https://pypi.python.org/pypi/statprof/) in order to know what the CPU is doing.
>>>>>> You could create a builder which you can trigger whenever you need, and which would start the profiling, wait a few minutes, and then save the profiling data to a file.
>>>>>>
>>>>>> On Tue, Aug 2, 2016 at 17:53, Francesco Di Mizio <[email protected]> wrote:
>>>>>>>
>>>>>>> Hey Dan,
>>>>>>>
>>>>>>> I am using a p4 poller. Maybe it's suffering from the same problems?
>>>>>>>
>>>>>>> On Tue, Aug 2, 2016 at 5:45 PM, Francesco Di Mizio <[email protected]> wrote:
>>>>>>>>
>>>>>>>> I'd like to provide a bit more context. Right after restarting the master and kicking off 10 builds, CPU was at 110-120%. This made the UI unusable and basically all the services were stuck, including the REST API.
>>>>>>>> After 3-4 minutes like this, and WITH all 10 builds still running, the CPU usage went down to 5%, stayed there for 5 minutes, and all was smooth and quick again. From then on it kept oscillating; I've seen spikes of 240% :(
>>>>>>>>
>>>>>>>> On Tue, Aug 2, 2016 at 4:12 PM, Francesco Di Mizio <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Sometimes it goes up to 140%. I was not able to relate this to a particular build condition - it seems like it can happen at any time and is not related to how many builds are going on.
>>>>>>>>>
>>>>>>>>> I usually realize the server got into this state because the web UI gets stuck. As soon as the CPU% goes back to normal values (2-3% most times) the web UI finishes loading just instantly.
>>>>>>>>>
>>>>>>>>> Any pointers as to what might be causing this?
>>>>>>>>> Only reason I can think of is too many people trying to access the web UI simultaneously - might I be right?
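(Also as an aside, a rough sketch of the webhook approach Dan mentions above, assuming a Buildbot nine master; the port is a placeholder and whether this fits your setup is an open question. Instead of one GitPoller per repository, GitLab pushes changes to the master's change hook:)

    # master.cfg fragment: enable the GitLab change hook so webhooks can replace pollers.
    c['www'] = dict(
        port=8010,
        change_hook_dialects={'gitlab': True},   # exposes /change_hook/gitlab
    )
    # Each GitLab project then gets a webhook pointing at
    #   http://<master-host>:8010/change_hook/gitlab
    # and the corresponding GitPoller entry in c['change_source'] can be dropped.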
_______________________________________________
users mailing list
[email protected]
https://lists.buildbot.net/mailman/listinfo/users
