Hi Tom, On Tue, 13 Apr 2021 at 06:18, Tom Rini <tr...@konsulko.com> wrote: > > On Tue, Apr 13, 2021 at 05:49:19AM +1200, Simon Glass wrote: > > Hi Tom, Alex, > > > > On Tue, 13 Apr 2021 at 04:26, Tom Rini <tr...@konsulko.com> wrote: > > > > > > On Mon, Apr 12, 2021 at 10:21:18AM -0500, Alex G. wrote: > > > > > > > > > > > > On 4/12/21 9:40 AM, Tom Rini wrote: > > > > > On Mon, Apr 12, 2021 at 08:51:11AM -0500, Alex G. wrote: > > > > > > > > > > > > > > > > > > On 4/12/21 8:25 AM, Tom Rini wrote: > > > > > > > On Fri, Apr 09, 2021 at 05:29:36PM -0700, Tim Harvey wrote: > > > > > > > > On Fri, Apr 9, 2021 at 1:53 PM Tom Rini <tr...@konsulko.com> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > On Fri, Apr 09, 2021 at 03:24:41PM -0500, Adam Ford wrote: > > > > > > > > > > On Fri, Apr 9, 2021 at 2:20 PM Alex G. > > > > > > > > > > <mr.nuke...@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > > > Hi Simon > > > > > > > > > > > > > > > > > > > > > > On 4/8/21 6:55 PM, Simon Glass wrote: > > > > > > > > > > > > Hi Alexandru, > > > > > > > > > > > > > > > > > > > > > > > > On Fri, 9 Apr 2021 at 04:56, Alexandru Gagniuc > > > > > > > > > > > > <mr.nuke...@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > This reverts commit > > > > > > > > > > > > > 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75. > > > > > > > > > > > > > > > > > > > > > > > > > > struct global_data contains a pointer to the bd_info > > > > > > > > > > > > > structure. This > > > > > > > > > > > > > pointer was populated spl_set_bd() to a pre-allocated > > > > > > > > > > > > > bd_info in the > > > > > > > > > > > > > ".data" section. The referenced commit replaced this > > > > > > > > > > > > > mechanism to one > > > > > > > > > > > > > that uses malloc(). That new mechanism is only used > > > > > > > > > > > > > if SPL_ALLOC_BD=y. > > > > > > > > > > > > > which very few boards do. > > > > > > > > > > > > > > > > > > > > > > > > > > The result is that (struct global_data)->bd is NULL > > > > > > > > > > > > > in SPL on most > > > > > > > > > > > > > platforms. This breaks falcon mode, since > > > > > > > > > > > > > arch_fixup_fdt() tries to > > > > > > > > > > > > > access (struct global_data)->bd and set the "/memory" > > > > > > > > > > > > > node in the > > > > > > > > > > > > > devicetree. The result is that the "/memory" node > > > > > > > > > > > > > contains garbage > > > > > > > > > > > > > values, causing linux to panic() as it sets up the > > > > > > > > > > > > > page table. > > > > > > > > > > > > > > > > > > > > > > > > > > Instead of trying to fix the mess, potentially > > > > > > > > > > > > > causing other issues, > > > > > > > > > > > > > revert to the code that worked, while this change is > > > > > > > > > > > > > reworked. > > > > > > > > > > > > > > > > > > > > > > > > The goal here is to drop a feature that few boards use > > > > > > > > > > > > and reduce > > > > > > > > > > > > memory usage in SPL. It has been in place for two > > > > > > > > > > > > releases now. > > > > > > > > > > > > > > > > > > > > > > > > If Falcon mode needs it, perhaps you could add an > > > > > > > > > > > > 'imply' in the > > > > > > > > > > > > Kconfig for that feature? Is there one? Or perhaps > > > > > > > > > > > > CONFIG_ARCH_FIXUP_FDT_MEMORY ? > > > > > > > > > > > > > > > > > > > > > > > > One option would be to return an error in > > > > > > > > > > > > arch_fixup_fdt(). In > > > > > > > > > > > > general, fixups make things tricky because there is no > > > > > > > > > > > > way to > > > > > > > > > > > > determine when they are used but at least this one has > > > > > > > > > > > > a CONFIG. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The argument that this has been in place for two releases > > > > > > > > > > > is incorrect. > > > > > > > > > > > Commit 38d6b7ebdaee3e0e8426ef1b9df88bdce8ae2e75 was only > > > > > > > > > > > introduced with > > > > > > > > > > > the v2021.04 release. It definitely was not in 2021.01. > > > > > > > > > > > It's only in the > > > > > > > > > > > last release, which is four days old t the time of this > > > > > > > > > > > writing. > > > > > > > > > > > > > > > > > > Yes. But another way of saying that is that we're 4 days in > > > > > > > > > to the > > > > > > > > > merge window. That's a bit early to say we must revert the > > > > > > > > > change. If > > > > > > > > > this was just before the release, yes, revert would be the > > > > > > > > > right answer. > > > > > > > > > It's also the case the original commit fixes some cases while > > > > > > > > > also > > > > > > > > > saving size, if I read it right. So a strict revert isn't > > > > > > > > > right, we'd > > > > > > > > > need to also probably also default y SPL_ALLOC_BD in some > > > > > > > > > cases. > > > > > > > > > > > > > > > > > > > > Although I was able to find one example, the reality is > > > > > > > > > > > that we don't > > > > > > > > > > > know the full extent of the breakage. The prudent thing > > > > > > > > > > > at this point is > > > > > > > > > > > to revert. > > > > > > > > > > > > > > > > > > > > > > The knowledge of how to init the platform is in the > > > > > > > > > > > devicetree and code. > > > > > > > > > > > Why should kconfig also be involved in storing this > > > > > > > > > > > knowledge? By this > > > > > > > > > > > model, as the number of boards increases without bounds, > > > > > > > > > > > every "if" > > > > > > > > > > > predicate tends to be Kconfig driven. That is not > > > > > > > > > > > maintainable, and why > > > > > > > > > > > I think the original change --and the proposed fixes-- > > > > > > > > > > > are broken by design. > > > > > > > > > > > > > > > > > > > > > > Furthermore, I'm happy to discuss what to do about Falcon > > > > > > > > > > > mode, and if > > > > > > > > > > > we should kill it entirely (I have an alternative > > > > > > > > > > > proposal). But we > > > > > > > > > > > shouldn't have that discussion in a manner holding my > > > > > > > > > > > platform hostage. > > > > > > > > > > > To be fair to you, I don't think reverting a 64-byte > > > > > > > > > > > savings in .data > > > > > > > > > > > will hold your platform hostage either. > > > > > > > > > > > > > > > > > > > > That original patch broke a bunch of OMAP boards, but > > > > > > > > > > enabling > > > > > > > > > > SPL_ALLOC_BD was the solution to fix the issue. > > > > > > > > > > Can you try enabling SPL_ALLOC_BD and see if that fixes it? > > > > > > > > > > Maybe we > > > > > > > > > > can make falcon mode imply it. > > > > > > > > > > > > > > > > > > It would be "select" since it needs it rather than imply. > > > > > > > > > > > > > > > > > > > > > > > > > I just ran into this as well finding that commit 38d6b7ebdaee > > > > > > > > ("spl: > > > > > > > > Drop bd_info in the data section") breaks Gateworks Ventana and > > > > > > > > defining SPL_ALLOC_BD does resolve it. In this case, Falcon > > > > > > > > mode is > > > > > > > > not being used and the breakage is because > > > > > > > > arch/arm/mach-imx/spl.c > > > > > > > > dram_init_banksize() accesses gd->bd which is NULL. > > > > > > > > > > > > > > > > So I would guess that this probably broke a whole lot of IMX > > > > > > > > based > > > > > > > > boards that use SPL. > > > > > > > > > > > > > > > > I don't quite know what the best solution is... we now have a > > > > > > > > v2021.04 > > > > > > > > that is broken for several or many boards and I dont' know if > > > > > > > > its > > > > > > > > clear what cases break. > > > > > > > > > > > > > > Looking forward, I think we need to rework this. Simon, I gather > > > > > > > you > > > > > > > have some platforms where we need to set gd->bd to something that > > > > > > > we > > > > > > > malloc() ? So perhaps spl_set_bd() should have been __weak so > > > > > > > that > > > > > > > architectures / platforms could override as needed, but also move > > > > > > > it > > > > > > > just past mem_malloc_init(). > > > > > > > > > > > > Let's try and avoid making new weak functions. Why not introduce a > > > > > > spl_platform_alloc_bd(), that each plat- *must* implement? No > > > > > > diversion to > > > > > > Kconfig and no __weak__ required. > > > > > > > > > > Well, who needs something different than what we had before exactly? > > > > > > > > I'm not sure. From reading the commit message of the broken change, it > > > > seems > > > > the main driver was to save 64 bytes on coral. I'm assuming coral is the > > > > codename for a platform. > > > > > > Yes, coral is an x86 chromebook. It also says: > > > This uses up space in the SPL binary but it always starts as zero. > > > Also some boards cannot support data in TPL (e.g. Intel Apollo Lake). > > > > > > and defaults the new feature to enabled on all x86. And there's > > > a comment about x86 in the function (that with the changes made isn't so > > > helpful/correct?) now too. So yes, I want to see what Simon > > > > > > > > This shouldn't be kicked up to "every SoC must define...", but maybe > > > > > "every architecture must define..." is OK. Or maybe it's really just > > > > > "a > > > > > few unusual platforms need" and so a weak function makes sense. > > > > > > > > We can can have arch_spl_alloc_bd() instead of plat_(). I don't believe > > > > in > > > > weak functins, as I see them as the problems that "made sense at the > > > > time". > > > > > > I see weak functions as a good solution when the problem is something > > > where the majority of the (cross-platform) cases require the same > > > solution, but there's one or two oddities out there. But regardless, we > > > need to have the problem better defined before we can propose a > > > solution. > > > > So how about enabling the Kconfig for iMX? I am not sure how many > > platforms do actually need this feature? > > That was an OK answer for the first group we hit this on. > > > Another option might be to (just as a test) take the 'bd' out of > > global_data when the Kconfig is not defined, to find out which > > platforms use it? > > And that might help us figure out the second grouping of platforms, yes.
OK I will take a look at that. > > > As to a weak function, what would the default behaviour be? If we can > > define that, then it would be better IMO. > > > > When we try to refactor things, weak functions and undefined (or > > arch-specific behaviour) can really make things tough. > > Well, what was the problem you were trying to solve here? I assumed > there was a case where things actively broke, with the previous design, > and it's not just the 64byte savings in some cases. But is it? Yes the region of memory is not writable, so must be allocated at runtime. Regards, SImon