Hi Stefano,

On Tue, 19 Oct 2021 at 09:39, Stefano Babic <sba...@denx.de> wrote:
>
> Hi Simon,
>
> On 07.10.21 15:43, Simon Glass wrote:
> > Hi Stefano,
> >
> > On Thu, 7 Oct 2021 at 04:37, Stefano Babic <sba...@denx.de> wrote:
> >>
> >> Hi all,
> >>
> >> CI stops while building aarch64, without any notice. For reference:
> >>
> >> https://source.denx.de/u-boot/custodians/u-boot-imx/-/jobs/332319
> >>
> >> There is no error; the process is just killed. It looks like it stops
> >> at xilinx_zynqmp_virt:
> >>
> >> ./tools/buildman/buildman -o /tmp -P -E -W aarch64
> >>
> >> but that board can be built without issues.
> >>
> >> If I build on my host (not in Docker, though), it generally builds
> >> fine, but it sometimes crashes too. On the gitlab instance, it
> >> crashes. The issue does not seem to depend on the merged patches, and
> >> the newly introduced boards were already built successfully. Any hint?
> >> I also have no idea where I should look, as all I see is just:
> >>
> >> "usr/bin/bash: line 104: 24 Killed
> >> ./tools/buildman/buildman -o /tmp -P -E -W aarch64"
> >
> > I cannot see that link. I am not sure what is going on. Does it say
> > what signal killed it?
>
> Pipelines on our server were not public; I have now enabled them for
> u-boot-imx.
>
> >
> > Does it sit there for an hour and time out? If so, then I did see that
> > myself once recently, when the Kconfig needed stdin, but I could not
> > quite tie it down. I think buildman would provide it, but sometimes
> > not, apparently. So it can happen when there is an existing build
> > there and your new one adds Kconfig options that don't have
> > defaults, or something like that?
> >
>
> I have investigated further, and I can reproduce it on my host outside
> the gitlab server. buildman causes an OOM, but I cannot find the cause.
>
> Strangely enough, this happens with the "aarch64" target, and I cannot
> reproduce it with Tom's master. So it seems that -master is OK, and
> something on u-boot-imx triggers the OOM.
>
> However...
>
> The OOM always happens when -2 (two boards remaining) appears. With
> htop I can see buildman allocating memory until it is exhausted (64 GB
> RAM + 8 GB swap). Then the kernel decides that it is enough and kills
> buildman; this is what I see on CI.
>
> You can see the pipelines now:
>
> https://source.denx.de/u-boot/custodians/u-boot-imx/-/pipelines/9520
>
> I then split aarch64 and built imx8 separately: same result. The
> pipeline stops at a xilinx board, but the xilinx boards have nothing to
> do with it. In fact, I can build all xilinx boards separately. If I run
> buildman -W aarch64 -x xilinx, the OOM shows up on another board.
>
> Strangely enough, I can build every single board with buildman without
> issues: neither errors nor warnings. Only when buildman builds them all
> together (aarch64, 308 boards) is the OOM triggered.
>
> Bisecting does not help: I started a bisect, and at the end it pointed
> to this commit:
>
> commit 53a24dee86fb72ae41e7579607bafe13442616f2
> Author: Fabio Estevam <feste...@denx.de>
> Date:   Mon Aug 23 21:11:09 2021 -0300
>
>     imx8mm-cl-iot-gate: Split the defconfigs
>
> But it is a false positive: if I revert it, I still get the issue. And
> the patch has nothing to do with it.
>
> It looks to me like something in binman, maybe triggered by some
> changes in the tree, but all boards can be built separately without
> issues. I expected to find the cause in the code from the applied
> patches, but since each board builds fine and bisect does not help, I
> am quite puzzled. I am holding off on sending a PR to Tom, since
> otherwise I guess the problem would go into -master, but I do not know
> how to proceed, and I have a lot of patches to apply.
>
> What can be done?
>
> > If that is it, you can repeat it by clearing out your .bm-work
>
> On gitlab, the build starts from scratch.

Can you check that there is definitely nothing around from the previous
build?

>
> > directory then building just that board for one commit, then the next
> > (with the Kconfig change).
>
> I have run buildman for every single board; all of them were
> successful. With aarch64, I get the OOM from buildman.
>
> >
> > Buildman is supposed to handle this, of course. I'm not sure what has
> > changed.
> >
>

I still believe this is due to the reason I said, but I'm happy to be
proved wrong.

Regards,
Simon