Hi Tom,

On Tue, 13 Apr 2021 at 06:18, Tom Rini <tr...@konsulko.com> wrote:
>
> On Tue, Apr 13, 2021 at 05:49:19AM +1200, Simon Glass wrote:
> > Hi Tom, Alex,
> >
> > On Tue, 13 Apr 2021 at 04:26, Tom Rini <tr...@konsulko.com> wrote:
> > >
> > > On Mon, Apr 12, 2021 at 10:21:18AM -0500, Alex G. wrote:
> > > >
> > > >
> > > > On 4/12/21 9:40 AM, Tom Rini wrote:
> > > > > On Mon, Apr 12, 2021 at 08:51:11AM -0500, Alex G. wrote:
> > > > > >
> > > > > >
> > > > > > On 4/12/21 8:25 AM, Tom Rini wrote:
> > > > > > > On Fri, Apr 09, 2021 at 05:29:36PM -0700, Tim Harvey wrote:
> > > > > > > > On Fri, Apr 9, 2021 at 1:53 PM Tom Rini <tr...@konsulko.com> 
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Apr 09, 2021 at 03:24:41PM -0500, Adam Ford wrote:
> > > > > > > > > > On Fri, Apr 9, 2021 at 2:20 PM Alex G. 
> > > > > > > > > > <mr.nuke...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi Simon
> > > > > > > > > > >
> > > > > > > > > > > On 4/8/21 6:55 PM, Simon Glass wrote:
> > > > > > > > > > > > Hi Alexandru,
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, 9 Apr 2021 at 04:56, Alexandru Gagniuc 
> > > > > > > > > > > > <mr.nuke...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > This reverts commit 
> > > > > > > > > > > > > 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75.
> > > > > > > > > > > > >
> > > > > > > > > > > > > struct global_data contains a pointer to the bd_info 
> > > > > > > > > > > > > structure. This
> > > > > > > > > > > > > pointer was populated spl_set_bd() to a pre-allocated 
> > > > > > > > > > > > > bd_info in the
> > > > > > > > > > > > > ".data" section. The referenced commit replaced this 
> > > > > > > > > > > > > mechanism to one
> > > > > > > > > > > > > that uses malloc(). That new mechanism is only used 
> > > > > > > > > > > > > if SPL_ALLOC_BD=y.
> > > > > > > > > > > > > which very few boards do.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The result is that (struct global_data)->bd is NULL 
> > > > > > > > > > > > > in SPL on most
> > > > > > > > > > > > > platforms. This breaks falcon mode, since 
> > > > > > > > > > > > > arch_fixup_fdt() tries to
> > > > > > > > > > > > > access (struct global_data)->bd and set the "/memory" 
> > > > > > > > > > > > > node in the
> > > > > > > > > > > > > devicetree. The result is that the "/memory" node 
> > > > > > > > > > > > > contains garbage
> > > > > > > > > > > > > values, causing linux to panic() as it sets up the 
> > > > > > > > > > > > > page table.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Instead of trying to fix the mess, potentially 
> > > > > > > > > > > > > causing other issues,
> > > > > > > > > > > > > revert to the code that worked, while this change is 
> > > > > > > > > > > > > reworked.
> > > > > > > > > > > >
> > > > > > > > > > > > The goal here is to drop a feature that few boards use 
> > > > > > > > > > > > and reduce
> > > > > > > > > > > > memory usage in SPL. It has been in place for two 
> > > > > > > > > > > > releases now.
> > > > > > > > > > > >
> > > > > > > > > > > > If Falcon mode needs it, perhaps you could add an 
> > > > > > > > > > > > 'imply' in the
> > > > > > > > > > > > Kconfig for that feature? Is there one? Or perhaps
> > > > > > > > > > > > CONFIG_ARCH_FIXUP_FDT_MEMORY ?
> > > > > > > > > > > >
> > > > > > > > > > > > One option would be to return an error in 
> > > > > > > > > > > > arch_fixup_fdt(). In
> > > > > > > > > > > > general, fixups make things tricky because there is no 
> > > > > > > > > > > > way to
> > > > > > > > > > > > determine when they are used but at least this one has 
> > > > > > > > > > > > a CONFIG.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > The argument that this has been in place for two releases 
> > > > > > > > > > > is incorrect.
> > > > > > > > > > > Commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75 was only 
> > > > > > > > > > > introduced with
> > > > > > > > > > > the v2021.04 release. It definitely was not in 2021.01. 
> > > > > > > > > > > It's only in the
> > > > > > > > > > > last release, which is four days old t the time of this 
> > > > > > > > > > > writing.
> > > > > > > > >
> > > > > > > > > Yes.  But another way of saying that is that we're 4 days in 
> > > > > > > > > to the
> > > > > > > > > merge window.  That's a bit early to say we must revert the 
> > > > > > > > > change.  If
> > > > > > > > > this was just before the release, yes, revert would be the 
> > > > > > > > > right answer.
> > > > > > > > > It's also the case the original commit fixes some cases while 
> > > > > > > > > also
> > > > > > > > > saving size, if I read it right.  So a strict revert isn't 
> > > > > > > > > right, we'd
> > > > > > > > > need to also probably also default y SPL_ALLOC_BD in some 
> > > > > > > > > cases.
> > > > > > > > >
> > > > > > > > > > > Although I was able to find one example, the reality is 
> > > > > > > > > > > that we don't
> > > > > > > > > > > know the full extent of the breakage. The prudent thing 
> > > > > > > > > > > at this point is
> > > > > > > > > > > to revert.
> > > > > > > > > > >
> > > > > > > > > > > The knowledge of how to init the platform is in the 
> > > > > > > > > > > devicetree and code.
> > > > > > > > > > > Why should kconfig also be involved in storing this 
> > > > > > > > > > > knowledge? By this
> > > > > > > > > > > model, as the number of boards increases without bounds, 
> > > > > > > > > > > every "if"
> > > > > > > > > > > predicate tends to be Kconfig driven. That is not 
> > > > > > > > > > > maintainable, and why
> > > > > > > > > > > I think the original change --and the proposed fixes-- 
> > > > > > > > > > > are broken by design.
> > > > > > > > > > >
> > > > > > > > > > > Furthermore, I'm happy to discuss what to do about Falcon 
> > > > > > > > > > > mode, and if
> > > > > > > > > > > we should kill it entirely (I have an alternative 
> > > > > > > > > > > proposal).  But we
> > > > > > > > > > > shouldn't have that discussion in a manner holding my 
> > > > > > > > > > > platform hostage.
> > > > > > > > > > > To be fair to you, I don't think reverting a 64-byte 
> > > > > > > > > > > savings in .data
> > > > > > > > > > > will hold your platform hostage either.
> > > > > > > > > >
> > > > > > > > > > That original patch broke a bunch of OMAP boards, but 
> > > > > > > > > > enabling
> > > > > > > > > > SPL_ALLOC_BD was the solution to fix the issue.
> > > > > > > > > > Can you try enabling SPL_ALLOC_BD and see if that fixes it? 
> > > > > > > > > >  Maybe we
> > > > > > > > > > can make falcon mode imply it.
> > > > > > > > >
> > > > > > > > > It would be "select" since it needs it rather than imply.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I just ran into this as well finding that commit 38d6b7ebdaee 
> > > > > > > > ("spl:
> > > > > > > > Drop bd_info in the data section") breaks Gateworks Ventana and
> > > > > > > > defining SPL_ALLOC_BD does resolve it. In this case, Falcon 
> > > > > > > > mode is
> > > > > > > > not being used and the breakage is because 
> > > > > > > > arch/arm/mach-imx/spl.c
> > > > > > > > dram_init_banksize() accesses gd->bd which is NULL.
> > > > > > > >
> > > > > > > > So I would guess that this probably broke a whole lot of IMX 
> > > > > > > > based
> > > > > > > > boards that use SPL.
> > > > > > > >
> > > > > > > > I don't quite know what the best solution is... we now have a 
> > > > > > > > v2021.04
> > > > > > > > that is broken for several or many boards and I dont' know if 
> > > > > > > > its
> > > > > > > > clear what cases break.
> > > > > > >
> > > > > > > Looking forward, I think we need to rework this.  Simon, I gather 
> > > > > > > you
> > > > > > > have some platforms where we need to set gd->bd to something that 
> > > > > > > we
> > > > > > > malloc() ?  So perhaps spl_set_bd() should have been __weak so 
> > > > > > > that
> > > > > > > architectures / platforms could override as needed, but also move 
> > > > > > > it
> > > > > > > just past mem_malloc_init().
> > > > > >
> > > > > > Let's try and avoid making new weak functions. Why not introduce a
> > > > > > spl_platform_alloc_bd(), that each plat- *must* implement? No 
> > > > > > diversion to
> > > > > > Kconfig and no __weak__ required.
> > > > >
> > > > > Well, who needs something different than what we had before exactly?
> > > >
> > > > I'm not sure. From reading the commit message of the broken change, it 
> > > > seems
> > > > the main driver was to save 64 bytes on coral. I'm assuming coral is the
> > > > codename for a platform.
> > >
> > > Yes, coral is an x86 chromebook.  It also says:
> > >     This uses up space in the SPL binary but it always starts as zero.
> > >     Also some boards cannot support data in TPL (e.g. Intel Apollo Lake).
> > >
> > > and defaults the new feature to enabled on all x86.  And there's
> > > a comment about x86 in the function (that with the changes made isn't so
> > > helpful/correct?) now too.  So yes, I want to see what Simon
> > >
> > > > > This shouldn't be kicked up to "every SoC must define...", but maybe
> > > > > "every architecture must define..." is OK.  Or maybe it's really just 
> > > > > "a
> > > > > few unusual platforms need" and so a weak function makes sense.
> > > >
> > > > We can can have arch_spl_alloc_bd() instead of plat_(). I don't believe 
> > > > in
> > > > weak functins, as I see them as the problems that "made sense at the 
> > > > time".
> > >
> > > I see weak functions as a good solution when the problem is something
> > > where the majority of the (cross-platform) cases require the same
> > > solution, but there's one or two oddities out there.  But regardless, we
> > > need to have the problem better defined before we can propose a
> > > solution.
> >
> > So how about enabling the Kconfig for iMX? I am not sure how many
> > platforms do actually need this feature?
>
> That was an OK answer for the first group we hit this on.
>
> > Another option might be to (just as a test) take the 'bd' out of
> > global_data when the Kconfig is not defined, to find out which
> > platforms use it?
>
> And that might help us figure out the second grouping of platforms, yes.

OK I will take a look at that.

>
> > As to a weak function, what would the default behaviour be? If we can
> > define that, then it would be better IMO.
> >
> > When we try to refactor things, weak functions and undefined (or
> > arch-specific behaviour) can really make things tough.
>
> Well, what was the problem you were trying to solve here?  I assumed
> there was a case where things actively broke, with the previous design,
> and it's not just the 64byte savings in some cases.  But is it?

Yes the region of memory is not writable, so must be allocated at runtime.

Regards,
SImon

Reply via email to