Petr Štetiar wrote on 19.3.2021 at 8.39:
Hannu Nyman <[email protected]> [2021-03-18 19:52:18]:

Petr Štetiar wrote on 18.3.2021 at 12.12:
I'm still not that happy with the round-robin scheduler[1], but it's
better than the previous state, so I'm going to deploy it soon to all
masters.

...

1. https://github.com/buildbot/buildbot/issues/4592#issuecomment-801163587
I noticed that the master packages buildbot just started a new
mips64_octeonplus build, but only removed one of the pending build requests
(9 hours old) from the queue. The newer buildrequest 1344, which is 41 minutes
old, is still in the queue.

https://buildbot.openwrt.org/master/packages/#/builders/4
https://buildbot.openwrt.org/master/packages/#/pendingbuildrequests

The same seems to have happened to i386_pentium-mmx (while I was writing this).

So a started build for a target does not always clear the build request
queue as it is intended to.
It looks like the issue with the scheduler/database update that I've referenced
and reported:

  2021-03-18 17:32:12+0000 [-] prioritizeBuilders:    mips64_octeonplus complete_at: 2021-03-16 12:50:30+00:00
  2021-03-18 17:32:13+0000 [-] starting build <Build mips64_octeonplus number:None results:success> using worker <WorkerForBuilder builder='mips64_octeonplus' worker='fsf-dock-22' state=AVAILABLE>
  2021-03-18 17:32:18+0000 [-] starting build <Build mips64_octeonplus number:7 results:success>.. pinging the worker <WorkerForBuilder builder='mips64_octeonplus' worker='fsf-dock-22' state=BUILDING>
  2021-03-18 17:56:30+0000 [-] prioritizeBuilders:    mips64_octeonplus complete_at: 2021-03-16 12:50:30+00:00
  2021-03-19 00:23:21+0000 [-]  <Build mips64_octeonplus number:7 results:success>: build finished

Here the previous build finishes, so the next complete_at should return the
time 00:23:21, but it actually still returns the old timestamp:

  2021-03-19 00:23:22+0000 [-] prioritizeBuilders:    mips64_octeonplus complete_at: 2021-03-16 12:50:30+00:00

so the build is still considered the oldest and is scheduled again:

  2021-03-19 00:23:24+0000 [-] starting build <Build mips64_octeonplus number:None results:success> using worker <WorkerForBuilder builder='mips64_octeonplus' worker='fsf-dock-22' state=AVAILABLE>
  2021-03-19 00:23:31+0000 [-] starting build <Build mips64_octeonplus number:8 results:success>.. pinging the worker <WorkerForBuilder builder='mips64_octeonplus' worker='fsf-dock-22' state=BUILDING>
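
To illustrate why a stale complete_at leads to the same builder being picked again, here is a minimal sketch in plain Python. It is not the actual scheduler code from the master configuration; the function name pick_next_builder, the second builder "other_builder" and its timestamp are made up for the example. Only the two mips64_octeonplus timestamps come from the log above.

```python
from datetime import datetime, timezone


def pick_next_builder(last_complete_at):
    """Return the builder whose last build finished longest ago.

    `last_complete_at` maps a builder name to the datetime of its last
    finished build, or None if it has never finished one.
    """
    # Builders without any finished build sort before everything else.
    never = datetime.min.replace(tzinfo=timezone.utc)
    return min(last_complete_at, key=lambda name: last_complete_at[name] or never)


# What the scheduler saw at 00:23:22: the old timestamp is still returned.
stale = {
    "mips64_octeonplus": datetime(2021, 3, 16, 12, 50, 30, tzinfo=timezone.utc),
    "other_builder": datetime(2021, 3, 18, 9, 0, 0, tzinfo=timezone.utc),  # hypothetical
}

# What it should have seen once the finished build had been recorded.
fresh = dict(
    stale,
    mips64_octeonplus=datetime(2021, 3, 19, 0, 23, 21, tzinfo=timezone.utc),
)

print(pick_next_builder(stale))  # mips64_octeonplus is selected again
print(pick_next_builder(fresh))  # the other builder gets its turn
```

With the stale value the just-finished builder still looks like the oldest one, so it wins the round-robin selection a second time.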

Cheers,

Petr


I think that this might be the problem that rjarry tried to overcome with "cooldown_seconds", defined and set to 4 seconds, in the discussion in the upstream buildbot issue you referenced. I think that he made the queue evaluation wait for 4 seconds before actually starting, so that all asynchronous updates would have been written first. (I didn't look into the logic too deeply, but that was my impression at first glance.)

We might try something similar, perhaps with a somewhat longer waiting time.
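
As a rough sketch of that idea (my own illustration, not rjarry's code and not Buildbot API): wait for a cooldown period before reading the state that the queue evaluation depends on. The names evaluate_after_cooldown, read_state and choose are hypothetical, and the 4 second default is only the value mentioned in the upstream discussion.

```python
import time

COOLDOWN_SECONDS = 4  # value mentioned upstream; a longer wait might be needed


def evaluate_after_cooldown(read_state, choose,
                            cooldown=COOLDOWN_SECONDS, sleep=time.sleep):
    """Wait `cooldown` seconds, then read the state and make the decision.

    read_state: callable returning the current scheduling state
    choose:     callable that picks what to build from that state
    sleep:      injectable so the wait can be replaced in tests
    """
    # Give pending asynchronous updates time to be written first.
    sleep(cooldown)
    # Only now look at the state, so the decision uses the updated values.
    return choose(read_state())


# Toy demonstration: the "database" write lands while we are waiting.
state = {"complete_at": "2021-03-16 12:50:30"}


def fake_sleep(seconds):
    # Stand-in for the real wait; simulates the delayed write arriving.
    state["complete_at"] = "2021-03-19 00:23:21"


seen = evaluate_after_cooldown(
    read_state=lambda: dict(state),
    choose=lambda s: s["complete_at"],
    sleep=fake_sleep,
)
print(seen)
```

Since Buildbot runs on Twisted, a real implementation would have to use a non-blocking delay rather than time.sleep; the blocking call here only keeps the sketch short.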




_______________________________________________
openwrt-devel mailing list
[email protected]
https://lists.openwrt.org/mailman/listinfo/openwrt-devel
