On 20.10.21 05:42, Simon Glass wrote:
Hi,

On Tue, 19 Oct 2021 at 17:01, Tom Rini <tr...@konsulko.com> wrote:

On Tue, Oct 19, 2021 at 04:59:15PM -0600, Simon Glass wrote:
Hi Tom,

On Tue, 19 Oct 2021 at 16:53, Tom Rini <tr...@konsulko.com> wrote:

On Tue, Oct 19, 2021 at 05:39:12PM +0200, Stefano Babic wrote:
Hi Simon,

On 07.10.21 15:43, Simon Glass wrote:
Hi Stefano,

On Thu, 7 Oct 2021 at 04:37, Stefano Babic <sba...@denx.de> wrote:

Hi all,

CI stops while building aarch64 without notice, for reference:

https://source.denx.de/u-boot/custodians/u-boot-imx/-/jobs/332319

There is no error, the process is just killed. It looks like it stops at
xilinx_zynqmp_virt,

./tools/buildman/buildman -o /tmp -P -E -W aarch64

but the board can be built without issues.

If I build on my host (not in docker, anyway), it generally builds fine
- but it crashes sometimes, too. On the gitlab instance, it crashes.
The issue does not seem to depend on the merged patches, and the introduced
boards were already built successfully. Any hint? I also have no idea
what I should look at, as all I see is just

"usr/bin/bash: line 104:    24 Killed
./tools/buildman/buildman -o /tmp -P -E -W aarch64"

I cannot see that link. I am not sure what is going on. Does it say
what signal killed it?
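
For reference, a minimal Python sketch of how that question could be
answered when the command is launched from a script. The helper name
describe_exit is made up for illustration and is not part of buildman; it
relies only on the documented behaviour that subprocess reports a child
terminated by signal N as return code -N on POSIX.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode < 0:
        # On POSIX, a negative return code -N means "terminated by signal N".
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


if __name__ == "__main__":
    # Stand-in for the real build command: a child that kills itself.
    child = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    result = subprocess.run(child)
    print(describe_exit(result.returncode))
```

A shell reports the same event as exit status 128 + N, so 137 for SIGKILL,
which is the signal the kernel OOM killer sends.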

Pipelines on our server were not public - I have now enabled them for u-boot-imx.


Does it sit there for an hour and time out? If so, then I did see that
myself once recently, when the Kconfig needed stdin, but I could not
quite tie it down. I think buildman would provide it, but sometimes
not, apparently. So it can happen when there is an existing build
there and your new one which adds Kconfig options that don't have
defaults, or something like that?


I have investigated further, and I can reproduce it on my host outside the
gitlab server. buildman causes an OOM, but I cannot find the cause.

Strangely enough, this happens with the "aarch64" target, and I cannot
reproduce it with Tom's master. So it seems that -master is ok, and something
on u-boot-imx generates the OOM.

However....

The OOM always happens when -2 (two boards remain) appears. I can see with
htop that buildman keeps allocating memory until it is exhausted (64GB RAM
+ 8 GB swap). Then the kernel decides that it is enough and kills buildman -
this is what I see on CI.

You can see now the pipelines:

https://source.denx.de/u-boot/custodians/u-boot-imx/-/pipelines/9520

I have then split aarch64 and built imx8 separately - same result. The
pipeline stops at a xilinx board, but those boards have nothing to do with
it. In fact, I can build all xilinx boards separately. If I run
buildman -W aarch64 -x xilinx, the OOM shows up with another board.

Strangely enough, I can build each single board with buildman without issues,
with neither errors nor warnings. Only when buildman builds them all together
(aarch64, 308 boards) is the OOM generated.

Bisect does not help: I started bisect, and at the end this commit was
presented:

commit 53a24dee86fb72ae41e7579607bafe13442616f2
Author: Fabio Estevam <feste...@denx.de>
Date:   Mon Aug 23 21:11:09 2021 -0300

     imx8mm-cl-iot-gate: Split the defconfigs

I strongly suspect what's going on here is that these new defconfigs are
out of sync with changes now in Kconfig.  The build itself will just sit
there, waiting for the "oldconfig" prompt to be answered.

I want to say the problem here is that stdin is open, rather than
pointing to something closed, which would lead to the build failing
immediately rather than only once a timeout is hit, or OOM kicks in due
to kconfig chewing up all the memory.
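
To illustrate the difference between an open and a closed stdin, here is a
minimal Python sketch. It is not buildman's code; the function name
run_without_stdin and the stand-in child process are assumptions made for
illustration. With stdin attached to /dev/null, a child that tries to read
an answer gets end-of-file at once instead of blocking.

```python
import subprocess
import sys


def run_without_stdin(cmd, timeout=None):
    """Run cmd with stdin attached to /dev/null so prompts cannot block."""
    return subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,   # reads return EOF immediately
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=timeout,            # safety net; raises TimeoutExpired
    )


if __name__ == "__main__":
    # Stand-in for a tool that prompts: it reads one line from stdin
    # and fails if no answer is available.
    prompt = ("import sys\n"
              "answer = sys.stdin.readline()\n"
              "sys.exit(0 if answer else 1)\n")
    result = run_without_stdin([sys.executable, "-c", prompt], timeout=30)
    print("return code:", result.returncode)
```

How the real kconfig tools react to end-of-file on stdin is not shown here;
the sketch only demonstrates that the read no longer waits for input.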

Yes that's exactly what I saw...

In fact, see this commit:

e62a24ce27a buildman: Avoid hanging when the config changes

But that was 3 years ago.

Looks like something else needs to be changed then, I've bisected down
similar failures here before very recently.

I dug into this a bit and I think buildman can detect this situation.
I'll send a little series.


The patches definitely help ;-)

It breaks the build (on CI when build-tools runs), but I get much more detail when I build single boards locally. For kontron-sl-mx8mm I can find several errors due to:

- CONFIG_SYS_LOAD_ADDR not defined in configs, but in header
- CONFIG_SYS_EXTRA_OPTIONS instead of CONFIG_IMX_CONFIG
- CONFIG_SYS_MALLOC_LEN not defined in config, but in header
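
A rough Python sketch of how mismatches like the first and third item could
be spotted by hand: it lists CONFIG_ symbols that a board header defines but
the defconfig does not set. The function name, the regular expressions and
the sample text are illustrative assumptions with made-up values; this is
not how buildman or Kconfig perform the check.

```python
import re

# "#define CONFIG_FOO ..." in a C header
DEFINE_RE = re.compile(r"^\s*#\s*define\s+(CONFIG_\w+)", re.MULTILINE)
# "CONFIG_FOO=..." in a defconfig
DEFCONFIG_RE = re.compile(r"^(CONFIG_\w+)=", re.MULTILINE)


def header_only_symbols(header_text, defconfig_text):
    """Return CONFIG_ symbols defined in the header but absent from the defconfig."""
    in_header = set(DEFINE_RE.findall(header_text))
    in_defconfig = set(DEFCONFIG_RE.findall(defconfig_text))
    return sorted(in_header - in_defconfig)


if __name__ == "__main__":
    # Made-up sample contents, not taken from any real board.
    header = """
#define CONFIG_SYS_LOAD_ADDR 0x40000000
#define CONFIG_SYS_MALLOC_LEN 0x1000000
"""
    defconfig = """
CONFIG_ARM=y
CONFIG_SYS_EXTRA_OPTIONS="EXAMPLE"
"""
    for symbol in header_only_symbols(header, defconfig):
        print(symbol)
```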

Your patches are a valuable tool (CI drove me crazy), I can now follow what happens. I will send a patch for kontron and go on with the rest (I guess kontron is not the only board causing this deadlock). Many thanks!

Tom, I am applying Simon's patches to my tree, I cannot work without them...

Regards,
Stefano

--
=====================================================================
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: +49-8142-66989-53 Fax: +49-8142-66989-80 Email: sba...@denx.de
=====================================================================
