Hi Simon,
2017-11-28 2:13 GMT+09:00 Simon Glass <s...@chromium.org>:
> (Tom - any thoughts about a more expansive cc list on this?)
>
> Hi Masahiro,
>
> On 26 November 2017 at 07:16, Masahiro Yamada
> <yamada.masah...@socionext.com> wrote:
>> 2017-11-26 20:38 GMT+09:00 Simon Glass <s...@chromium.org>:
>>> Hi Philipp,
>>>
>>> On 25 November 2017 at 16:31, Dr. Philipp Tomsich
>>> <philipp.toms...@theobroma-systems.com> wrote:
>>>> Hi,
>>>>
>>>>> On 25 Nov 2017, at 23:34, Simon Glass <s...@chromium.org> wrote:
>>>>>
>>>>> +Tom, Masahiro, Philipp
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 22 November 2017 at 03:27, Wolfgang Denk <w...@denx.de> wrote:
>>>>>> Dear Kever Yang,
>>>>>>
>>>>>> In message <fd0bb500-80c4-f317-cc18-f7aaf1344...@rock-chips.com> you
>>>>>> wrote:
>>>>>>>
>>>>>>> I can understand this feature: we always do dram_init_banks() first,
>>>>>>> then we relocate to a 'known' area, so there is no risk in accessing memory.
>>>>>>> I believe there must be some historical reason for some kind of device,
>>>>>>> and the relocate feature is a wonderful idea for it.
>>>>>>
>>>>>> This is actually not so much a feature needed to support some
>>>>>> specific device (in this case much simpler approaches would be
>>>>>> possible), but to support a whole set of features. Unfortunately
>>>>>> these appear to get forgotten / ignored over time.
>>>>>>
>>>>>>> Many other SoCs should be similar.
>>>>>>> - Without relocation we can save many steps; some of our customers really
>>>>>>> care about the boot time.
>>>>>>> * no need to relocate everything
>>>>>>> * no need to copy all the code
>>>>>>> * no need to init the drivers more than once
>>>>>>
>>>>>> Please have a look at the README, section "Memory Management".
>>>>>> The relocation is not done to any _fixed_ address, but the address
>>>>>> is actually computed at runtime, depending on a number of features
>>>>>> enabled (at least this is how it used to be - apparently little of
>>>>>> this is tested on a regular basis, so I would not be surprised if
>>>>>> things are broken today).
>>>>>>
>>>>>> The basic idea was to reserve areas of memory at the top of RAM
>>>>>> that would not be initialized / modified by U-Boot and Linux, not
>>>>>> even across a reset / warm boot.
>>>>>>
>>>>>> This was used, for example, for:
>>>>>>
>>>>>> - pRAM (Protected RAM) which could be used to store all kinds of data
>>>>>> (for example, using a pramfs [Protected and Persistent RAM
>>>>>> Filesystem]) that could be kept across reboots of the OS.
>>>>>>
>>>>>> - shared frame buffer / video memory. U-Boot and Linux would be able
>>>>>> to initialize the video memory just once (in U-Boot) and then
>>>>>> share it, maybe even across reboots. In particular, this would allow
>>>>>> for a very early splash screen that gets passed (flicker free) to
>>>>>> Linux until some Linux GUI takes over (much more difficult today).
>>>>>>
>>>>>> - shared log buffer: U-Boot and Linux used to use the same syslog
>>>>>> buffer mechanism, so you could share it between U-Boot and Linux.
>>>>>> This allows you, for example, to
>>>>>> * read the Linux kernel panic messages after reset in U-Boot; this
>>>>>> is very useful when you bring up a new system and Linux crashes
>>>>>> before it can display the log buffer on the console
>>>>>> * pass U-Boot POST results on to Linux, so the application code
>>>>>> can read and process these
>>>>>> * process the system log of the previous run (especially after a
>>>>>> panic) in Linux after it has rebooted.
>>>>>>
>>>>>> etc.
>>>>>>
>>>>>> There are a number of such features which require reserving room at
>>>>>> the top of RAM, the size of which is calculated at runtime, often
>>>>>> depending on user-settable environment data.
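The runtime placement described above can be pictured as carving regions off the top of RAM one after another and putting U-Boot just below them. The C sketch below is illustrative only: the `top_reserve` struct, the `align_down()` and `compute_reloc_addr()` helpers, the fixed 4 KiB alignment, and the region order are all assumptions made for this example, not U-Boot's actual board_init_f() reservation code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one region kept at the top of RAM. */
struct top_reserve {
	const char *name;  /* e.g. "pram", "logbuf", "video" */
	uint64_t size;     /* bytes; may be derived from an env variable */
};

/* Round x down to a multiple of align, which must be a power of two. */
uint64_t align_down(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/*
 * Carve each region off RAM from the top downwards, page-aligning each
 * start address, and return where the relocated image would begin.
 * base[i] receives the start address chosen for regions[i].
 * No underflow checking: this is a sketch, not production code.
 */
uint64_t compute_reloc_addr(uint64_t ram_top,
			    const struct top_reserve *regions, size_t n,
			    uint64_t image_size, uint64_t *base)
{
	const uint64_t page = 4096;
	uint64_t addr = ram_top;

	for (size_t i = 0; i < n; i++) {
		addr = align_down(addr - regions[i].size, page);
		base[i] = addr;
	}
	/* The image sits directly below the lowest reserved region. */
	return align_down(addr - image_size, page);
}
```

Because sizes such as the frame buffer or pRAM can depend on environment settings, the result varies from board to board and even from boot to boot, which is why the target cannot be a fixed link address. For example, with RAM ending at 0xA0000000, 64 KiB of pRAM, a 16 KiB log buffer, a 1024x768x2-byte frame buffer and a 0x7F123-byte image, this sketch places the image at 0x9FDEC000.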
>>>>>>
>>>>>> All this cannot be done without relocation to a (dynamically computed) target address.
>>>>>>
>>>>>> Yes, the code could be simpler and faster without that - but then,
>>>>>> you cut off a number of features.
>>>>>
>>>>> I would be interested in seeing benchmarks showing the cost of
>>>>> relocation in terms of boot time. The last time I did this was on Exynos 5
>>>>> and it was some years ago. The time was pretty small provided the
>>>>> cache was on for the memory copies associated with relocation itself.
>>>>> Something like 10-20 ms, but I don't have the numbers handy.
>>>>>
>>>>> I think it is useful to be able to allocate memory in board_init_f()
>>>>> for use by U-Boot for things like the display and the malloc() region.
>>>>>
>>>>> Options we might consider:
>>>>>
>>>>> 1. Don't relocate the code and data. Thus we could avoid the copy and
>>>>> relocation cost. This is already supported with the GD_FLG_SKIP_RELOC
>>>>> flag, used when U-Boot runs as an EFI app.
>>>>>
>>>>> 2. Rather than throwing away the old malloc() region, keep it around
>>>>> so existing allocated blocks work. Then the new malloc() region would be
>>>>> used for future allocations. We could perhaps ignore free() calls in
>>>>> that region.
>>>>>
>>>>> 2a. This would allow us to avoid re-init of driver model in most cases,
>>>>> I think. E.g. we could init serial and timer before relocation and
>>>>> leave them inited after relocation. We could just init the
>>>>> 'additional' devices not done before relocation.
>>>>>
>>>>> 2b. I suppose we could even extend this to SPL if we wanted to. I
>>>>> suspect it would just be a pain though, since SPL might use memory
>>>>> that U-Boot wants.
>>>>>
>>>>> 3. We could turn on the cache earlier. This removes most of the
>>>>> boot-time penalty. Ideally this should be turned on in SPL and perhaps
>>>>> redone in U-Boot, which has more memory available. If SPL is not used,
>>>>> we could turn on the cache before relocation.
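To make the "copy and relocation cost" in option 1 concrete, here is a simplified C model of the two steps a relocation performs: copying the image, then patching every absolute address recorded in a RELATIVE-style relocation table. It only loosely mirrors what the architecture-specific relocation code does; `rel_entry` and `relocate_image()` are hypothetical names, addresses are modelled as plain integers, and real code patches live memory rather than a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for an ELF RELA entry of a
 * "RELATIVE" type: r_offset is the link-time address of a 64-bit word
 * to patch, r_addend is the link-time address it should point to.
 */
struct rel_entry {
	uint64_t r_offset;
	uint64_t r_addend;
};

/*
 * dst models memory at run-time address run_base; the image in src was
 * linked to run at link_base (the role CONFIG_SYS_TEXT_BASE plays).
 */
void relocate_image(uint8_t *dst, const uint8_t *src, size_t len,
		    uint64_t link_base, uint64_t run_base,
		    const struct rel_entry *rel, size_t nrel)
{
	/* Step 1: copy the whole image to its new home. */
	memcpy(dst, src, len);

	/* Step 2: rebase every absolute address listed in the table. */
	for (size_t i = 0; i < nrel; i++) {
		uint64_t off = rel[i].r_offset - link_base;
		uint64_t val = rel[i].r_addend - link_base + run_base;

		assert(off + sizeof(val) <= len);
		memcpy(dst + off, &val, sizeof(val)); /* alignment-safe store */
	}
}
```

Skipping relocation entirely (the GD_FLG_SKIP_RELOC case in option 1) avoids both loops. If a loader has already placed the image at its final address, only the second loop would remain, and only when run_base differs from link_base.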
>>>>
>>>> Both turning on the cache and initialising the clocking could be of benefit
>>>> to boot time.
>>>>
>>>> However, the biggest possible gain will come from utilising Falcon mode
>>>> to skip the full U-Boot stage and directly boot into the OS from SPL. This
>>>> assumes that the drivers involved are fully optimised, so loading up the
>>>> OS image does not take longer than necessary.
>>>
>>> I'd like to see numbers on that. From my experience, loading and
>>> running U-Boot does not take very long...
>>>
>>>>
>>>>> 4. Rather than reserving memory in board_init_f(), we could have it
>>>>> call malloc() from the expanded region. We could then perhaps
>>>>> move this reserve/allocate code into particular drivers or
>>>>> subsystems, and drop a good chunk of the init sequence. We would need
>>>>> to have a larger malloc() region than is currently the case.
>>>>>
>>>>> There are still some arch-specific bits in board_init_f() which make
>>>>> these sorts of changes a bit tricky to support generically. IMO it
>>>>> would be best to move to 'generic relocation' written in C, where all
>>>>> archs work basically the same way, before attempting any of the above.
>>>>>
>>>>> Still, I can see some benefits and even some simplifications.
>>>>>
>>>>> Regards,
>>>>> Simon
>>>>
>>
>> This discussion should have happened.
>> The U-Boot boot sequence is crazily inefficient.
>>
>> When we talk about "relocation", two things are happening.
>>
>> [1] U-Boot proper copies itself to the very end of DRAM
>> [2] Fix up the global symbols
>>
>> In my opinion, only [2] is useful.
>>
>> SPL initializes the DRAM, so it knows the base and size of DRAM.
>> SPL should be able to load the U-Boot proper to the final destination.
>> So, [1] is unnecessary.
>>
>> [2] is necessary because SPL may load the U-Boot proper
>> to a different place than CONFIG_SYS_TEXT_BASE.
>> This feature is useful for platforms
>> whose DRAM base/size is only known at run-time.
>> (Of course, it should be user-configurable by CONFIG_RELOCATE
>> or something.)
>>
>> Moreover, board_init_f() is unneeded -
>> everything in board_init_f() is already done by SPL.
>> Initializing DM multiple times is really inefficient and ugly.
>>
>> The following is how the ideal boot loader would work.
>>
>> Requirement for U-Boot proper:
>> U-Boot never changes its location by itself.
>> So, SPL or a vendor loader must load U-Boot proper
>> to the final destination directly.
>> (You can load it to the very end of DRAM if you like,
>> but the actual place does not matter here.)
>>
>> Boot sequence of U-Boot proper:
>> If CONFIG_RELOCATE (or something) is enabled,
>> it fixes the global symbols at the very beginning
>> of the boot.
>> (In this case, CONFIG_SYS_TEXT_BASE can be arbitrary.)
>>
>> That's it. Proceed to the rest of the init code.
>> (= board_init_r)
>> board_init_f() is unnecessary.
>>
>> This should work for recent platforms.
>
> Yes, that sounds reasonable to me.
>
> We could do the symbol fixup/relocation in SPL after loading U-Boot,
> although that would probably push us to using ELF format for U-Boot
> which is a bit limited.
>
> Still, I think the biggest performance improvement comes from turning
> on the cache in SPL. So the above is a simplification, not really a
> speed-up.

Right.
I am more interested in simplification than in speed-up.
The boot speed is not a significant problem, at least for my boards.

>>
>> We should think about old platforms that boot from a NOR flash or something.
>> There are two solutions:
>> - execute-in-place: run the code in the flash directly
>> - use SPL (common/spl/spl_nor.c) if you want to run
>> it from RAM
>
> This seems like a big regression in functionality. For example, for x86
> 32-bit we currently don't have an SPL (we do for 64-bit). So I think
> this means that everything would be forced to have an SPL?

After a grace period for migration, yes.
XIP or SPL.
No relocation in U-Boot proper.
This assumption will allow us to shed a lot of burden:
 - Remove relocation
 - Remove board_init_f()
 - Remove pre-reloc DM init
 - Perhaps remove struct global_data
 etc.

> I am wondering who else we should cc on this discussion?
>
> Regards,
> Simon
> _______________________________________________
> U-Boot mailing list
> U-Boot@lists.denx.de
> https://lists.denx.de/listinfo/u-boot

-- 
Best Regards
Masahiro Yamada