On Tue, Apr 13, 2021 at 05:49:19AM +1200, Simon Glass wrote:
> Hi Tom, Alex,
> 
> On Tue, 13 Apr 2021 at 04:26, Tom Rini <tr...@konsulko.com> wrote:
> >
> > On Mon, Apr 12, 2021 at 10:21:18AM -0500, Alex G. wrote:
> > >
> > >
> > > On 4/12/21 9:40 AM, Tom Rini wrote:
> > > > On Mon, Apr 12, 2021 at 08:51:11AM -0500, Alex G. wrote:
> > > > >
> > > > >
> > > > > On 4/12/21 8:25 AM, Tom Rini wrote:
> > > > > > On Fri, Apr 09, 2021 at 05:29:36PM -0700, Tim Harvey wrote:
> > > > > > > On Fri, Apr 9, 2021 at 1:53 PM Tom Rini <tr...@konsulko.com> 
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Fri, Apr 09, 2021 at 03:24:41PM -0500, Adam Ford wrote:
> > > > > > > > > On Fri, Apr 9, 2021 at 2:20 PM Alex G. <mr.nuke...@gmail.com> 
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Simon
> > > > > > > > > >
> > > > > > > > > > On 4/8/21 6:55 PM, Simon Glass wrote:
> > > > > > > > > > > Hi Alexandru,
> > > > > > > > > > >
> > > > > > > > > > > On Fri, 9 Apr 2021 at 04:56, Alexandru Gagniuc 
> > > > > > > > > > > <mr.nuke...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > This reverts commit 
> > > > > > > > > > > > 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75.
> > > > > > > > > > > >
> > > > > > > > > > > > struct global_data contains a pointer to the bd_info 
> > > > > > > > > > > > structure. This
> > > > > > > > > > > > pointer was populated spl_set_bd() to a pre-allocated 
> > > > > > > > > > > > bd_info in the
> > > > > > > > > > > > ".data" section. The referenced commit replaced this 
> > > > > > > > > > > > mechanism to one
> > > > > > > > > > > > that uses malloc(). That new mechanism is only used if 
> > > > > > > > > > > > SPL_ALLOC_BD=y.
> > > > > > > > > > > > which very few boards do.
> > > > > > > > > > > >
> > > > > > > > > > > > The result is that (struct global_data)->bd is NULL in 
> > > > > > > > > > > > SPL on most
> > > > > > > > > > > > platforms. This breaks falcon mode, since 
> > > > > > > > > > > > arch_fixup_fdt() tries to
> > > > > > > > > > > > access (struct global_data)->bd and set the "/memory" 
> > > > > > > > > > > > node in the
> > > > > > > > > > > > devicetree. The result is that the "/memory" node 
> > > > > > > > > > > > contains garbage
> > > > > > > > > > > > values, causing linux to panic() as it sets up the page 
> > > > > > > > > > > > table.
> > > > > > > > > > > >
> > > > > > > > > > > > Instead of trying to fix the mess, potentially causing 
> > > > > > > > > > > > other issues,
> > > > > > > > > > > > revert to the code that worked, while this change is 
> > > > > > > > > > > > reworked.
> > > > > > > > > > >
> > > > > > > > > > > The goal here is to drop a feature that few boards use 
> > > > > > > > > > > and reduce
> > > > > > > > > > > memory usage in SPL. It has been in place for two 
> > > > > > > > > > > releases now.
> > > > > > > > > > >
> > > > > > > > > > > If Falcon mode needs it, perhaps you could add an 'imply' 
> > > > > > > > > > > in the
> > > > > > > > > > > Kconfig for that feature? Is there one? Or perhaps
> > > > > > > > > > > CONFIG_ARCH_FIXUP_FDT_MEMORY ?
> > > > > > > > > > >
> > > > > > > > > > > One option would be to return an error in 
> > > > > > > > > > > arch_fixup_fdt(). In
> > > > > > > > > > > general, fixups make things tricky because there is no 
> > > > > > > > > > > way to
> > > > > > > > > > > determine when they are used but at least this one has a 
> > > > > > > > > > > CONFIG.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > The argument that this has been in place for two releases 
> > > > > > > > > > is incorrect.
> > > > > > > > > > Commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75 was only 
> > > > > > > > > > introduced with
> > > > > > > > > > the v2021.04 release. It definitely was not in 2021.01. 
> > > > > > > > > > It's only in the
> > > > > > > > > > last release, which is four days old t the time of this 
> > > > > > > > > > writing.
> > > > > > > >
> > > > > > > > Yes.  But another way of saying that is that we're 4 days in to 
> > > > > > > > the
> > > > > > > > merge window.  That's a bit early to say we must revert the 
> > > > > > > > change.  If
> > > > > > > > this was just before the release, yes, revert would be the 
> > > > > > > > right answer.
> > > > > > > > It's also the case the original commit fixes some cases while 
> > > > > > > > also
> > > > > > > > saving size, if I read it right.  So a strict revert isn't 
> > > > > > > > right, we'd
> > > > > > > > need to also probably also default y SPL_ALLOC_BD in some cases.
> > > > > > > >
> > > > > > > > > > Although I was able to find one example, the reality is 
> > > > > > > > > > that we don't
> > > > > > > > > > know the full extent of the breakage. The prudent thing at 
> > > > > > > > > > this point is
> > > > > > > > > > to revert.
> > > > > > > > > >
> > > > > > > > > > The knowledge of how to init the platform is in the 
> > > > > > > > > > devicetree and code.
> > > > > > > > > > Why should kconfig also be involved in storing this 
> > > > > > > > > > knowledge? By this
> > > > > > > > > > model, as the number of boards increases without bounds, 
> > > > > > > > > > every "if"
> > > > > > > > > > predicate tends to be Kconfig driven. That is not 
> > > > > > > > > > maintainable, and why
> > > > > > > > > > I think the original change --and the proposed fixes-- are 
> > > > > > > > > > broken by design.
> > > > > > > > > >
> > > > > > > > > > Furthermore, I'm happy to discuss what to do about Falcon 
> > > > > > > > > > mode, and if
> > > > > > > > > > we should kill it entirely (I have an alternative 
> > > > > > > > > > proposal).  But we
> > > > > > > > > > shouldn't have that discussion in a manner holding my 
> > > > > > > > > > platform hostage.
> > > > > > > > > > To be fair to you, I don't think reverting a 64-byte 
> > > > > > > > > > savings in .data
> > > > > > > > > > will hold your platform hostage either.
> > > > > > > > >
> > > > > > > > > That original patch broke a bunch of OMAP boards, but enabling
> > > > > > > > > SPL_ALLOC_BD was the solution to fix the issue.
> > > > > > > > > Can you try enabling SPL_ALLOC_BD and see if that fixes it?  
> > > > > > > > > Maybe we
> > > > > > > > > can make falcon mode imply it.
> > > > > > > >
> > > > > > > > It would be "select" since it needs it rather than imply.
> > > > > > > >
> > > > > > >
> > > > > > > I just ran into this as well finding that commit 38d6b7ebdaee 
> > > > > > > ("spl:
> > > > > > > Drop bd_info in the data section") breaks Gateworks Ventana and
> > > > > > > defining SPL_ALLOC_BD does resolve it. In this case, Falcon mode 
> > > > > > > is
> > > > > > > not being used and the breakage is because arch/arm/mach-imx/spl.c
> > > > > > > dram_init_banksize() accesses gd->bd which is NULL.
> > > > > > >
> > > > > > > So I would guess that this probably broke a whole lot of IMX based
> > > > > > > boards that use SPL.
> > > > > > >
> > > > > > > I don't quite know what the best solution is... we now have a 
> > > > > > > v2021.04
> > > > > > > that is broken for several or many boards and I dont' know if its
> > > > > > > clear what cases break.
> > > > > >
> > > > > > Looking forward, I think we need to rework this.  Simon, I gather 
> > > > > > you
> > > > > > have some platforms where we need to set gd->bd to something that we
> > > > > > malloc() ?  So perhaps spl_set_bd() should have been __weak so that
> > > > > > architectures / platforms could override as needed, but also move it
> > > > > > just past mem_malloc_init().
> > > > >
> > > > > Let's try and avoid making new weak functions. Why not introduce a
> > > > > spl_platform_alloc_bd(), that each plat- *must* implement? No 
> > > > > diversion to
> > > > > Kconfig and no __weak__ required.
> > > >
> > > > Well, who needs something different than what we had before exactly?
> > >
> > > I'm not sure. From reading the commit message of the broken change, it 
> > > seems
> > > the main driver was to save 64 bytes on coral. I'm assuming coral is the
> > > codename for a platform.
> >
> > Yes, coral is an x86 chromebook.  It also says:
> >     This uses up space in the SPL binary but it always starts as zero.
> >     Also some boards cannot support data in TPL (e.g. Intel Apollo Lake).
> >
> > and defaults the new feature to enabled on all x86.  And there's
> > a comment about x86 in the function (that with the changes made isn't so
> > helpful/correct?) now too.  So yes, I want to see what Simon
> >
> > > > This shouldn't be kicked up to "every SoC must define...", but maybe
> > > > "every architecture must define..." is OK.  Or maybe it's really just "a
> > > > few unusual platforms need" and so a weak function makes sense.
> > >
> > > We can can have arch_spl_alloc_bd() instead of plat_(). I don't believe in
> > > weak functins, as I see them as the problems that "made sense at the 
> > > time".
> >
> > I see weak functions as a good solution when the problem is something
> > where the majority of the (cross-platform) cases require the same
> > solution, but there's one or two oddities out there.  But regardless, we
> > need to have the problem better defined before we can propose a
> > solution.
> 
> So how about enabling the Kconfig for iMX? I am not sure how many
> platforms do actually need this feature?

That was an OK answer for the first group we hit this on.

> Another option might be to (just as a test) take the 'bd' out of
> global_data when the Kconfig is not defined, to find out which
> platforms use it?

And that might help us figure out the second grouping of platforms, yes.

> As to a weak function, what would the default behaviour be? If we can
> define that, then it would be better IMO.
> 
> When we try to refactor things, weak functions and undefined (or
> arch-specific behaviour) can really make things tough.

Well, what was the problem you were trying to solve here?  I assumed
there was a case where things actively broke, with the previous design,
and it's not just the 64byte savings in some cases.  But is it?

-- 
Tom

Attachment: signature.asc
Description: PGP signature

Reply via email to