On 4/12/21 8:25 AM, Tom Rini wrote:
On Fri, Apr 09, 2021 at 05:29:36PM -0700, Tim Harvey wrote:
On Fri, Apr 9, 2021 at 1:53 PM Tom Rini <tr...@konsulko.com> wrote:

On Fri, Apr 09, 2021 at 03:24:41PM -0500, Adam Ford wrote:
On Fri, Apr 9, 2021 at 2:20 PM Alex G. <mr.nuke...@gmail.com> wrote:

Hi Simon

On 4/8/21 6:55 PM, Simon Glass wrote:
Hi Alexandru,

On Fri, 9 Apr 2021 at 04:56, Alexandru Gagniuc <mr.nuke...@gmail.com> wrote:

This reverts commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75.

struct global_data contains a pointer to the bd_info structure. This
pointer was populated spl_set_bd() to a pre-allocated bd_info in the
".data" section. The referenced commit replaced this mechanism to one
that uses malloc(). That new mechanism is only used if SPL_ALLOC_BD=y.
which very few boards do.

The result is that (struct global_data)->bd is NULL in SPL on most
platforms. This breaks falcon mode, since arch_fixup_fdt() tries to
access (struct global_data)->bd and set the "/memory" node in the
devicetree. The result is that the "/memory" node contains garbage
values, causing linux to panic() as it sets up the page table.

Instead of trying to fix the mess, potentially causing other issues,
revert to the code that worked, while this change is reworked.

The goal here is to drop a feature that few boards use and reduce
memory usage in SPL. It has been in place for two releases now.

If Falcon mode needs it, perhaps you could add an 'imply' in the
Kconfig for that feature? Is there one? Or perhaps
CONFIG_ARCH_FIXUP_FDT_MEMORY ?

One option would be to return an error in arch_fixup_fdt(). In
general, fixups make things tricky because there is no way to
determine when they are used but at least this one has a CONFIG.


The argument that this has been in place for two releases is incorrect.
Commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75 was only introduced with
the v2021.04 release. It definitely was not in 2021.01. It's only in the
last release, which is four days old t the time of this writing.

Yes.  But another way of saying that is that we're 4 days in to the
merge window.  That's a bit early to say we must revert the change.  If
this was just before the release, yes, revert would be the right answer.
It's also the case the original commit fixes some cases while also
saving size, if I read it right.  So a strict revert isn't right, we'd
need to also probably also default y SPL_ALLOC_BD in some cases.

Although I was able to find one example, the reality is that we don't
know the full extent of the breakage. The prudent thing at this point is
to revert.

The knowledge of how to init the platform is in the devicetree and code.
Why should kconfig also be involved in storing this knowledge? By this
model, as the number of boards increases without bounds, every "if"
predicate tends to be Kconfig driven. That is not maintainable, and why
I think the original change --and the proposed fixes-- are broken by design.

Furthermore, I'm happy to discuss what to do about Falcon mode, and if
we should kill it entirely (I have an alternative proposal).  But we
shouldn't have that discussion in a manner holding my platform hostage.
To be fair to you, I don't think reverting a 64-byte savings in .data
will hold your platform hostage either.

That original patch broke a bunch of OMAP boards, but enabling
SPL_ALLOC_BD was the solution to fix the issue.
Can you try enabling SPL_ALLOC_BD and see if that fixes it?  Maybe we
can make falcon mode imply it.

It would be "select" since it needs it rather than imply.


I just ran into this as well finding that commit 38d6b7ebdaee ("spl:
Drop bd_info in the data section") breaks Gateworks Ventana and
defining SPL_ALLOC_BD does resolve it. In this case, Falcon mode is
not being used and the breakage is because arch/arm/mach-imx/spl.c
dram_init_banksize() accesses gd->bd which is NULL.

So I would guess that this probably broke a whole lot of IMX based
boards that use SPL.

I don't quite know what the best solution is... we now have a v2021.04
that is broken for several or many boards and I dont' know if its
clear what cases break.

Looking forward, I think we need to rework this.  Simon, I gather you
have some platforms where we need to set gd->bd to something that we
malloc() ?  So perhaps spl_set_bd() should have been __weak so that
architectures / platforms could override as needed, but also move it
just past mem_malloc_init().

Let's try and avoid making new weak functions. Why not introduce a spl_platform_alloc_bd(), that each plat- *must* implement? No diversion to Kconfig and no __weak__ required.

Alex

Reply via email to