Hi Pierre,
We've had notify_on_missing configured for some time -- certainly it was
present when I restarted the masters on Tues. I also restarted the
master that has those workers. Shouldn't all the workers in master.cfg
have attached then?
This could cause us some trouble. Here's how we start workers:
We have a builder consisting of ShellCommands that does an ssh login
into the worker machine, sees if the worker is running, and runs it if
it is not. This allows us to make certain that all the workers that are
in master.cfg also have matching worker processes on the worker
machines. This builder runs every hour. It is entirely possible that a
worker machine without a worker process could sit for more than an hour
before having its worker process started.
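To make that concrete, here is a rough sketch of the kind of command each ShellCommand step could run. It is not our actual builder. The helper name, the host names, the base directory, and the use of twistd.pid with `kill -0` to detect a running worker are all assumptions for illustration; only the "ssh in, check, start if missing" flow comes from the description above.

```python
import shlex


def ensure_worker_argv(host, basedir):
    """Return an ssh argv that starts the worker on `host` only if it is not running.

    Hypothetical helper: assumes the worker writes its PID to
    <basedir>/twistd.pid and that `buildbot-worker` is on the remote PATH.
    """
    quoted = shlex.quote(basedir)
    # kill -0 sends no signal; it only tests whether the PID is alive.
    check = 'kill -0 "$(cat {0}/twistd.pid 2>/dev/null)" 2>/dev/null'.format(quoted)
    start = "buildbot-worker start {0}".format(quoted)
    # Run the start command only when the liveness check fails.
    remote = "{0} || {1}".format(check, start)
    return ["ssh", host, remote]


if __name__ == "__main__":
    # Hypothetical host names and base directory.
    for host in ["worker-a", "worker-b"]:
        print(ensure_worker_argv(host, "/home/buildbot/worker"))
```

Each returned list could be passed as the command of one ShellCommand step, with the builder triggered once an hour.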
What you guys seem to be telling me is that if I were to stop a worker
process and let more than an hour go by, that worker would never, ever
have its builds run. Even though the worker is attached, and the builds
are queued.
That sounds pretty bad to me. Am I understanding correctly? Or can I
wait until a worker process is running, then reconfigure without that
worker in master.cfg (and get unknown-worker errors, I suppose), then
reconfigure with the worker back in master.cfg, so that it attempts to
attach before the timeout?
Let me emphasize that the thing that brought this to my attention was
adding a new worker and its builders to master.cfg. The worker process
would have been started sometime after that by the builder that starts
worker processes.
Neil Gilmore
grammatech.com
On 2/3/2017 3:17 PM, Pierre Tardy wrote:
Hi Neil,
The timer starts when the worker is first configured,
but only if notify_on_missing is configured.
That may be why you do not see the bug with older workers.
Pierre
On Fri, Feb 3, 2017 at 21:59, Neil Gilmore <[email protected]> wrote:
Hi Andrej,
Thanks for the reply.
I don't see missing_timeout in our master.cfg anywhere. But I do
see this:
    c['workers'] = [Worker(host, '<password>',
                           notify_on_missing=bots_email[host])
                    for host in bots_list]
Let's see if I understood you. The default missing_timeout is 60
minutes. If I start the master and wait 60 minutes, then start the
worker, the worker won't attach?
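A toy model of that timing, as I understand it from this thread, might look like the following. This is not Buildbot or Twisted code: the class, the functions, and the idea that the attach path cancels the "missing" timer are assumptions made to illustrate the AlreadyCalled failure in the log. The guarded variant shows one conventional way to avoid cancelling a timer that has already fired; it is not necessarily what the PR does.

```python
class AlreadyCalled(Exception):
    """Stand-in for twisted.internet.error.AlreadyCalled."""


class OneShotTimer:
    """Toy one-shot delayed call that refuses to be cancelled after firing."""

    def __init__(self, fire_at):
        self.fire_at = fire_at
        self.called = False
        self.cancelled = False

    def advance_to(self, now):
        # Fire the timer once the clock reaches its deadline.
        if not self.cancelled and now >= self.fire_at:
            self.called = True

    def active(self):
        return not (self.called or self.cancelled)

    def cancel(self):
        if self.called:
            raise AlreadyCalled("Tried to cancel an already-called event.")
        self.cancelled = True


MISSING_TIMEOUT = 60 * 60  # seconds; the 60-minute default mentioned in this thread


def try_attach(seconds_after_configure, guarded=False):
    """Model a worker attaching some time after it was configured."""
    timer = OneShotTimer(fire_at=MISSING_TIMEOUT)  # assumed to start at configure time
    timer.advance_to(seconds_after_configure)
    try:
        # Assumption: attaching cancels the pending "missing" timer.
        if not guarded or timer.active():
            timer.cancel()
        return "attached"
    except AlreadyCalled:
        return "cannot attach"


if __name__ == "__main__":
    print(try_attach(30 * 60))                     # worker starts within the hour
    print(try_attach(2 * 60 * 60))                 # worker starts after the timeout
    print(try_attach(2 * 60 * 60, guarded=True))   # same, with the guard in place
```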
In our case, we're not even adding the worker to master.cfg until well
after that 60 minutes (a couple days after). We're adding new workers.
Do you figure this could be the same problem?
What happens with a default notify_on_missing? I figure I can try the
patch in your PR when we restart the masters.
Neil Gilmore
[email protected]
On 2/3/2017 2:42 PM, Andrej Rode wrote:
> Hi Neil,
>
>> 2017-02-03T12:39:09-0500 [Broker,28906,10.233.216.43] worker '<name>' attaching from IPv4Address(TCP, '<ip>', 35642)
>> 2017-02-03T12:39:09-0500 [Broker,28906,10.233.216.43] Got workerinfo from '<name>'
>> 2017-02-03T12:39:09-0500 [-] bot attached
>> 2017-02-03T12:39:09-0500 [-] worker <name> cannot attach
>> Traceback (most recent call last):
>> Failure: twisted.internet.error.AlreadyCalled: Tried to cancel an already-called event.
> I had the same problem, but with a single-master setup. By any chance,
> are you using a non-default `missing_timeout` and/or `notify_on_missing`
> on your workers?
>
> For my issue I have a PR up [0], and now I can detach and attach workers
> as I like. But it is still not clear why we even run into problems here.
>
> I figured out that attaching a worker more than `missing_timeout`
> after the master starts triggers this problem on my setup.
> (The default `missing_timeout` is 60 minutes.)
>
> Cheers,
> Andrej
>
> [0] https://github.com/buildbot/buildbot/pull/2708
> _______________________________________________
> users mailing list
> [email protected] <mailto:[email protected]>
> https://lists.buildbot.net/mailman/listinfo/users