Re: [RFC] Refactoring OpenWrt's build infra

2022-10-16 Thread Baptiste Jonglez
On 16-10-22, Christian Marangi wrote:
> On Sun, Oct 16, 2022 at 02:07:05PM +0200, Baptiste Jonglez wrote:
> > - either buildbot can run latent workers with a different Docker image
> >   depending on the build
> 
> IMHO, this would be the safest and best solution to the problem. But
> this means that we would have to support two things instead of having
> one centralized container.

I'm not even sure Buildbot is able to do that :)

But if it is, and the only change between worker images is the version of
the base image (e.g. Debian), then that sounds manageable.
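
For the record, a rough master.cfg sketch of per-branch worker pools might
look like the following (worker names, image tags and image_build_factory
are made up, and the usual c = BuildmasterConfig dict is assumed):

    # Sketch: one pool of latent workers per base image
    from buildbot.plugins import worker, util

    c['workers'] = [
        # newer base image for master and recent releases
        worker.DockerLatentWorker(
            'docker-bullseye-01', 'worker-password',
            docker_host='unix://var/run/docker.sock',
            image='openwrt/buildbot-worker:bullseye'),
        # older base image kept around for old release branches
        worker.DockerLatentWorker(
            'docker-buster-01', 'worker-password',
            docker_host='unix://var/run/docker.sock',
            image='openwrt/buildbot-worker:buster'),
    ]

    # each builder only uses the pool that matches its branch
    c['builders'] = [
        util.BuilderConfig(name='images-master',
                           workernames=['docker-bullseye-01'],
                           factory=image_build_factory),
        util.BuilderConfig(name='images-19.07',
                           workernames=['docker-buster-01'],
                           factory=image_build_factory),
    ]

Strictly speaking this picks the image per builder (and thus per branch)
rather than per build, but that may be close enough.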

> It would be ideal to have one centralized dl/ dir from which each runner
> can fetch files. We already support that in openwrt (using a different
> dl dir), and there isn't any problem with having different release
> tarballs of the same package.

I tried sharing dl/ across several worker containers on the same physical
machine, but race conditions make it harder than it looks.  I fixed one
issue [1], but there was another that I couldn't track down.

We could use object storage, for instance from DigitalOcean [2].  It would
allow all workers to read/write to the same shared storage for download
files, and get (hopefully) good download performance.  With that in place,
we could also prune dl/ very aggressively to save disk space.
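
For illustration only, a small helper along these lines (untested; bucket
name, endpoint and environment variables are placeholders, and boto3 against
an S3-compatible endpoint is assumed) could be called around the normal
download step:

    # share_dl.py -- sketch: reuse source archives from a shared
    # S3-compatible bucket (e.g. DigitalOcean Spaces)
    import os
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client(
        's3',
        endpoint_url='https://ams3.digitaloceanspaces.com',  # placeholder
        aws_access_key_id=os.environ['SPACES_KEY'],
        aws_secret_access_key=os.environ['SPACES_SECRET'],
    )
    BUCKET = 'openwrt-dl-cache'  # placeholder bucket name

    def fetch(dl_dir, filename):
        """Try the shared bucket before falling back to wget/git."""
        path = os.path.join(dl_dir, filename)
        if os.path.exists(path):
            return True
        try:
            s3.download_file(BUCKET, filename, path)
            return True
        except ClientError:
            return False  # not cached yet, do the normal download

    def publish(dl_dir, filename):
        """Upload a freshly fetched archive so other workers can reuse it."""
        s3.upload_file(os.path.join(dl_dir, filename), BUCKET, filename)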

Baptiste

[1] https://git.openwrt.org/d4c957f24b2f76986378c6d9
[2] https://www.digitalocean.com/products/spaces




Re: [RFC] Refactoring OpenWrt's build infra

2022-10-16 Thread Christian Marangi
On Sun, Oct 16, 2022 at 02:07:05PM +0200, Baptiste Jonglez wrote:
> Hi,
> 
> On 05-10-22, Thibaut wrote:
> > Hi,
> > 
> > Following an earlier conversation on IRC with Petr, I’m willing to work on 
> > refactoring our buildbot setup as follows:
> > 
> > - single master for each stage (images and packages)
> > - latent workers attached to either master, thus able to build 
> > opportunistically from either master or release branches as needed / as 
> > work becomes available
> 
> This is a good idea, but I see one main downside: we would probably have
> to use the same buildbot worker image for all releases.
> 
> From what I remember, when the worker image was updated from Debian 9 to
> Debian 10, this seriously broke 19.07 builds.  Maybe Petr or Jow will
> remember the details better.
> 
> I see two ways to address this:
> 
> - either buildbot can run latent workers with a different Docker image
>   depending on the build
>

IMHO, this would be the safest and best solution to the problem. But
this means that we would have to support two things instead of having
one centralized container.

> - otherwise, we have to think early about the update strategy.  Maybe use
>   the shared buildbot instance for master branch + most recent release
>   only, and move older releases back to a dedicated buildbot instance?
> 
> > The main upside is that all buildslaves could be pooled, improving overall 
> > throughput and reducing wasted « idle time », thus lowering build times and 
> > operating costs.
> > 
> > Petr also suggested that extra release workers could be spawned at will 
> > (through e.g. cloud VMs) when a new release is to be tagged; tagged release 
> > could be scheduled only to release workers: this would still work within 
> > this « single master » build scheme.
> > 
> > NB: I’m aware of the potential performance penalty of having buildslaves 
> > randomly switching between branches, so I would try to come up with a 
> > reasonably smart solution to this issue if it doesn’t conflict with the 
> > main goals.
> 
> One thing to watch is disk space usage.  A full disk is a common cause
> of build failures.  If a single worker cycles through builds for
> different branches, I would expect disk usage to be higher (e.g. more
> versions of the same software in dl/).
> 

It would be ideal to have one centralized dl/ dir from which each runner
can fetch files. We already support that in openwrt (using a different
dl dir), and there isn't any problem with having different release
tarballs of the same package.
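
For docker-based workers, one simple way to wire that up would be to
bind-mount a shared host directory into every worker container and point
the download folder at it. A sketch, assuming buildbot's DockerLatentWorker
volumes= parameter, with placeholder paths and names:

    # Sketch: every latent worker mounts the same host dl/ directory
    from buildbot.plugins import worker

    SHARED_DL = '/srv/buildbot/shared-dl'  # placeholder host path

    w = worker.DockerLatentWorker(
        'docker-worker-01', 'worker-password',
        docker_host='unix://var/run/docker.sock',
        image='openwrt/buildbot-worker:bullseye',
        # bind-mount the shared download directory into the container
        volumes=[SHARED_DL + ':/builder/shared-dl'],
    )

with the builds then configured to use /builder/shared-dl as their download
folder. The concurrent-download races would of course still need solving.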

-- 
Ansuel



Re: [RFC] Refactoring OpenWrt's build infra

2022-10-16 Thread Baptiste Jonglez
Hi,

On 05-10-22, Thibaut wrote:
> Hi,
> 
> Following an earlier conversation on IRC with Petr, I’m willing to work on 
> refactoring our buildbot setup as follows:
> 
> - single master for each stage (images and packages)
> - latent workers attached to either master, thus able to build 
> opportunistically from either master or release branches as needed / as work 
> becomes available

This is a good idea, but I see one main downside: we would probably have
to use the same buildbot worker image for all releases.

From what I remember, when the worker image was updated from Debian 9 to
Debian 10, this seriously broke 19.07 builds.  Maybe Petr or Jow will
remember the details better.

I see two ways to address this:

- either buildbot can run latent workers with a different Docker image
  depending on the build

- otherwise, we have to think early about the update strategy.  Maybe use
  the shared buildbot instance for master branch + most recent release
  only, and move older releases back to a dedicated buildbot instance?

> The main upside is that all buildslaves could be pooled, improving overall 
> throughput and reducing wasted « idle time », thus lowering build times and 
> operating costs.
> 
> Petr also suggested that extra release workers could be spawned at will 
> (through e.g. cloud VMs) when a new release is to be tagged; tagged release 
> could be scheduled only to release workers: this would still work within this 
> « single master » build scheme.
> 
> NB: I’m aware of the potential performance penalty of having buildslaves 
> randomly switching between branches, so I would try to come up with a 
> reasonably smart solution to this issue if it doesn’t conflict with the main 
> goals.

One thing to watch is disk space usage.  A full disk is a common cause
of build failures.  If a single worker cycles through builds for
different branches, I would expect disk usage to be higher (e.g. more
versions of the same software in dl/).

Thanks,
Baptiste




Re: [RFC] Refactoring OpenWrt's build infra

2022-10-10 Thread Petr Štetiar
Thibaut  [2022-10-05 17:56:17]:

[adding Jo and Paul to Cc: loop]

Hi,

> Before I set on to revamp the system accordingly I want to ask if this
> proposal seems like a Good Idea™ :)

the above-mentioned topics have been on my TODO list for a long time
already, so any help is more than appreciated, thanks!

Since we're currently using the buildbot repository as our main source for
the production containers, I would suggest using its issue tracker [1] to
track future plans and ongoing work transparently, for obvious reasons.
Another option would be to mirror that GitLab buildbot repo to GitHub and
use the issues there instead, if that's preferred.

More food for thought:

 * We should replace the end-of-life hardware currently serving
   buildbot.openwrt.org

   - we're currently blocked on this by the still-pending OpenWrt.org
     account at Hetzner
   - this refactoring might be a good opportunity to tackle it

 * Filter out GitPoller build events originating from noop sources like the
   CI tooling [2] (a rough sketch of such a filter follows the links below)

   - IIRC those build events get propagated down to the 2nd-stage/package
     builds as well

 * Rate/resource limit handling during scripts/feeds invocations

   - git.openwrt.org might be overloaded during certain time periods,
     leading to wasted build resources and false-positive build failures

 * The python3/host build/install race condition with
   uboot/scripts/dtc/pylibfdt [3] is another example of such wasted resources

 * Use HSM-backed storage for release/package signing keys

 * IIRC Paul (and probably more folks) find our buildbot-based system arcane
   and would like to try something more recent, such as GitHub Actions

   - perhaps we should try to align with those ideas and consider factoring
     out the build steps into something more self-contained, build-layer
     agnostic and thus reusable?

   - it just seems to me that we're reinventing the wheel [4]

 * We should consider making our buildbot infra completely open, so anyone can
   reuse it and/or make it better

1. https://gitlab.com/openwrt/buildbot/-/issues/new
2. https://github.com/openwrt/openwrt/pull/10094#issuecomment-1170760326
3. https://github.com/openwrt/openwrt/pull/10407
4. https://github.com/openwrt/packages/issues/19241
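
Regarding the GitPoller noise, a change filter on the schedulers would
probably be enough; a rough sketch (file patterns, scheduler and builder
names are illustrative):

    # Sketch: drop changes that only touch CI tooling
    from buildbot.plugins import schedulers, util

    CI_ONLY_PREFIXES = ('.github/', '.gitlab-ci.yml')

    def touches_real_code(change):
        """Keep a change only if it modifies something outside the CI tooling."""
        return any(not f.startswith(CI_ONLY_PREFIXES) for f in change.files)

    master_images = schedulers.SingleBranchScheduler(
        name='master-images',
        change_filter=util.ChangeFilter(branch='master',
                                        filter_fn=touches_real_code),
        builderNames=['images-master'],
    )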

Cheers,

Petr



Re: [RFC] Refactoring OpenWrt's build infra

2022-10-05 Thread Hauke Mehrtens

On 10/5/22 17:56, Thibaut wrote:

Hi,

Following an earlier conversation on IRC with Petr, I’m willing to work on 
refactoring our buildbot setup as follows:

- single master for each stage (images and packages)
- latent workers attached to either master, thus able to build 
opportunistically from either master or release branches as needed / as work 
becomes available

The main upside is that all buildslaves could be pooled, improving overall 
throughput and reducing wasted « idle time », thus lowering build times and 
operating costs.

Petr also suggested that extra release workers could be spawned at will 
(through e.g. cloud VMs) when a new release is to be tagged; tagged release 
could be scheduled only to release workers: this would still work within this « 
single master » build scheme.

NB: I’m aware of the potential performance penalty of having buildslaves 
randomly switching between branches, so I would try to come up with a 
reasonably smart solution to this issue if it doesn’t conflict with the main 
goals.

Before I set out to revamp the system accordingly, I want to ask whether this
proposal seems like a Good Idea™ :)

Comments welcome,
T.


Hi,

This sounds like a good idea, but I am not an expert on this topic.

I would approve such a change, but others know much better how our
infrastructure works.


I do not know whether we need a special container for each release branch;
I think we build on an old Debian release so that the image builder
binaries can also be used on older systems.


Hauke



[RFC] Refactoring OpenWrt's build infra

2022-10-05 Thread Thibaut
Hi,

Following an earlier conversation on IRC with Petr, I’m willing to work on 
refactoring our buildbot setup as follows:

- single master for each stage (images and packages)
- latent workers attached to either master, thus able to build 
opportunistically from either master or release branches as needed / as work 
becomes available

The main upside is that all buildslaves could be pooled, improving overall 
throughput and reducing wasted « idle time », thus lowering build times and 
operating costs.
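
To make the pooling concrete, in buildbot terms this would roughly mean
every builder listing the same pool of workers, e.g. (names are
illustrative, and c / image_factory are assumed to be defined elsewhere in
master.cfg):

    # Sketch: one master, builders for every branch, all sharing one pool
    from buildbot.plugins import util

    pool = ['latent-%02d' % i for i in range(1, 9)]  # e.g. 8 pooled workers

    c['builders'] = [
        util.BuilderConfig(name='images-' + branch,
                           workernames=pool,  # any free worker takes the job
                           factory=image_factory(branch))
        for branch in ('master', 'openwrt-22.03', 'openwrt-21.02')
    ]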

Petr also suggested that extra release workers could be spawned at will 
(through e.g. cloud VMs) when a new release is to be tagged; tagged release 
could be scheduled only to release workers: this would still work within this « 
single master » build scheme.

NB: I’m aware of the potential performance penalty of having buildslaves 
randomly switching between branches, so I would try to come up with a 
reasonably smart solution to this issue if it doesn’t conflict with the main 
goals.

Before I set out to revamp the system accordingly, I want to ask whether this
proposal seems like a Good Idea™ :)

Comments welcome,
T.