(Tom - any thoughts about a more expansive cc list on this?)

Hi Masahiro,

On 26 November 2017 at 07:16, Masahiro Yamada
<yamada.masah...@socionext.com> wrote:
> 2017-11-26 20:38 GMT+09:00 Simon Glass <s...@chromium.org>:
>> Hi Philipp,
>>
>> On 25 November 2017 at 16:31, Dr. Philipp Tomsich
>> <philipp.toms...@theobroma-systems.com> wrote:
>>> Hi,
>>>
>>>> On 25 Nov 2017, at 23:34, Simon Glass <s...@chromium.org> wrote:
>>>>
>>>> +Tom, Masahiro, Philipp
>>>>
>>>> Hi,
>>>>
>>>> On 22 November 2017 at 03:27, Wolfgang Denk <w...@denx.de> wrote:
>>>>> Dear Kever Yang,
>>>>>
>>>>> In message <fd0bb500-80c4-f317-cc18-f7aaf1344...@rock-chips.com> you
>>>>> wrote:
>>>>>>
>>>>>> I can understand this feature, we always do dram_init_banks() first,
>>>>>> then we relocate to a 'known' area, then there will be no risk in
>>>>>> accessing memory.
>>>>>> I believe there must be some historical reason for some kind of device,
>>>>>> the relocate feature is a wonderful idea for it.
>>>>>
>>>>> This is actually not so much a feature needed to support some
>>>>> specific device (in this case much simpler approaches would be
>>>>> possible), but to support a whole set of features. Unfortunately
>>>>> these appear to get forgotten / ignored over time.
>>>>>
>>>>>> many other SoCs should be similar.
>>>>>> - Without relocate we can save many steps, some of our customers really
>>>>>> care much about the boot time duration.
>>>>>> * no need to relocate everything
>>>>>> * no need to copy all the code
>>>>>> * no need to init the driver more than once
>>>>>
>>>>> Please have a look at the README, section "Memory Management".
>>>>> The relocation is not done to any _fixed_ address, but the address
>>>>> is actually computed at runtime, depending on a number of features
>>>>> enabled (at least this is how it used to be - apparently little of
>>>>> this is tested on a regular basis, so I would not be surprised if
>>>>> things are broken today).
>>>>>
>>>>> The basic idea was to reserve areas of memory at the top of RAM,
>>>>> that would not be initialized / modified by U-Boot and Linux, not
>>>>> even across a reset / warm boot.
>>>>>
>>>>> This was used for example for:
>>>>>
>>>>> - pRAM (Protected RAM) which could be used to store all kinds of data
>>>>>   (for example, using a pramfs [Protected and Persistent RAM
>>>>>   Filesystem]) that could be kept across reboots of the OS.
>>>>>
>>>>> - shared frame buffer / video memory. U-Boot and Linux would be able
>>>>>   to initialize the video memory just once (in U-Boot) and then
>>>>>   share it, maybe even across reboots. Especially, this would allow
>>>>>   for a very early splash screen that gets passed (flicker free) to
>>>>>   Linux until some Linux GUI takes over (much more difficult today).
>>>>>
>>>>> - shared log buffer: U-Boot and Linux used to use the same syslog
>>>>>   buffer mechanism, so you could share it between U-Boot and Linux.
>>>>>   This allows for example to
>>>>>   * read the Linux kernel panic messages after reset in U-Boot; this
>>>>>     is very useful when you bring up a new system and Linux crashes
>>>>>     before it can display the log buffer on the console
>>>>>   * pass U-Boot POST results on to Linux, so the application code
>>>>>     can read and process these
>>>>>   * process the system log of the previous run (especially after a
>>>>>     panic) in Linux after it rebooted.
>>>>>
>>>>> etc.
>>>>>
>>>>> There are a number of such features which require reserving room at
>>>>> the top of RAM, the size of which is calculated at runtime, often
>>>>> depending on user settable environment data.
>>>>>
>>>>> All this cannot be done without relocation to a (dynamically
>>>>> computed) target address.
>>>>>
>>>>>
>>>>> Yes, the code could be simpler and faster without that - but then,
>>>>> you cut off a number of features.
>>>>
>>>> I would be interested in seeing benchmarks showing the cost of
>>>> relocation in terms of boot time.
>>>> Last time I did this was on Exynos 5
>>>> and it was some years ago. The time was pretty small provided the
>>>> cache was on for the memory copies associated with relocation itself.
>>>> Something like 10-20ms but I don't have the numbers handy.
>>>>
>>>> I think it is useful to be able to allocate memory in board_init_f()
>>>> for use by U-Boot for things like the display and the malloc() region.
>>>>
>>>> Options we might consider:
>>>>
>>>> 1. Don't relocate the code and data. Thus we could avoid the copy and
>>>> relocation cost. This is already supported with the GD_FLG_SKIP_RELOC
>>>> flag, used when U-Boot runs as an EFI app.
>>>>
>>>> 2. Rather than throwing away the old malloc() region, keep it around
>>>> so existing allocated blocks work. Then the new malloc() region would be
>>>> used for future allocations. We could perhaps ignore free() calls in
>>>> the old region.
>>>>
>>>> 2a. This would allow us to avoid re-init of driver model in most cases
>>>> I think. E.g. we could init serial and timer before relocation and
>>>> leave them inited after relocation. We could just init the
>>>> 'additional' devices not done before relocation.
>>>>
>>>> 2b. I suppose we could even extend this to SPL if we wanted to. I
>>>> suspect it would just be a pain though, since SPL might use memory
>>>> that U-Boot wants.
>>>>
>>>> 3. We could turn on the cache earlier. This removes most of the
>>>> boot-time penalty. Ideally this should be turned on in SPL and perhaps
>>>> redone in U-Boot which has more memory available. If SPL is not used,
>>>> we could turn on the cache before relocation.
>>>
>>> Both turning on the cache and initialising the clocking could be of benefit
>>> to boot-time.
>>>
>>> However, the biggest possible gain will come from utilising Falcon mode
>>> to skip the full U-Boot stage and directly boot into the OS from SPL. This
>>> assumes that the drivers involved are fully optimised, so loading up the
>>> OS image does not take longer than necessary.
>>
>> I'd like to see numbers on that. From my experience, loading and
>> running U-Boot does not take very long...
>>
>>>
>>>> 4. Rather than reserving memory in board_init_f() we could have it
>>>> call malloc() from the expanded region. We could then perhaps
>>>> move this reserve/allocate code into particular drivers or
>>>> subsystems, and drop a good chunk of the init sequence. We would need
>>>> to have a larger malloc() region than is currently the case.
>>>>
>>>> There are still some arch-specific bits in board_init_f() which make
>>>> these sorts of changes a bit tricky to support generically. IMO it
>>>> would be best to move to 'generic relocation' written in C, where all
>>>> archs work basically the same way, before attempting any of the above.
>>>>
>>>> Still, I can see some benefits and even some simplifications.
>>>>
>>>> Regards,
>>>> Simon
>>>
>
>
>
> This discussion should have happened.
> U-Boot boot sequence is crazily inefficient.
>
>
>
> When we talk about "relocation", two things are happening.
>
> [1] U-Boot proper copies itself to the very end of DRAM
> [2] Fix-up the global symbols
>
> In my opinion, only [2] is useful.
>
>
> SPL initializes the DRAM, so it knows the base and size of DRAM.
> SPL should be able to load the U-Boot proper to the final destination.
> So, [1] is unnecessary.
>
>
> [2] is necessary because SPL may load the U-Boot proper
> to a different place than CONFIG_SYS_TEXT_BASE.
> This feature is useful for platforms
> whose DRAM base/size is only known at run-time.
> (Of course, it should be user-configurable by CONFIG_RELOCATE
> or something.)
>
> Moreover, board_init_f() is unneeded -
> everything in board_init_f() is already done by SPL.
> Multiple-time DM initialization is really inefficient and ugly.
>
>
> The following is how the ideal boot loader would work.
>
>
> Requirement for U-Boot proper:
> U-Boot never changes the location by itself.
> So, SPL or a vendor loader must load U-Boot proper
> to the final destination directly.
> (You can load it to the very end of DRAM if you like,
> but the actual place does not matter here.)
>
>
> Boot sequence of U-Boot proper:
> If CONFIG_RELOCATE (or something) is enabled,
> it fixes the global symbols at the very beginning
> of the boot.
> (In this case, CONFIG_SYS_TEXT_BASE can be arbitrary)
>
> That's it. Proceed to the rest of init code.
> (= board_init_r)
> board_init_f() is unnecessary.
>
> This should work for recent platforms.

Yes that sounds reasonable to me.

We could do the symbol fixup/relocation in SPL after loading U-Boot,
although that would probably push us to using ELF format for U-Boot
which is a bit limited.

Still I think the biggest performance improvement comes from turning
on the cache in SPL. So the above is a simplification, not really a
speed-up.

>
>
>
> We should think about old platforms that boot from a NOR flash or something.
> There are two solutions:
> - execute-in-place: run the code in the flash directly
> - use SPL (common/spl/spl-nor.c) if you want to run
>   it from RAM

This seems like a big regression in functionality. For example for x86
32-bit we currently don't have an SPL (we do for 64-bit). So I think
this means that everything would be forced to have an SPL?

I am wondering who else we should cc on this discussion?

Regards,
Simon
_______________________________________________
U-Boot mailing list
U-Boot@lists.denx.de
https://lists.denx.de/listinfo/u-boot
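
For anyone following the thread, the difference between [1] (copying the image) and [2] (fixing up global symbols) may be easier to see in code. Below is a minimal sketch of [2] only, not U-Boot's actual relocation code: `struct rel_entry`, `fixup_image()` and the flat offset table are hypothetical names invented for illustration, and real ports process arch-specific relocation records instead. It assumes the image was linked for `link_base` (think CONFIG_SYS_TEXT_BASE) but was loaded by SPL at `load_base`, and it patches the stored addresses in place without copying anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical relocation record (an assumption for this sketch): the
 * offset, from the start of the image, of a pointer-sized slot that holds
 * an absolute address computed at link time.
 */
struct rel_entry {
	uintptr_t offset;
};

/*
 * Patch every listed slot by the difference between where the image was
 * linked (link_base) and where it actually sits (load_base). The image is
 * modified in place and never copied, which is step [2] without step [1].
 * Returns the number of slots patched; entries that would fall outside
 * the image are skipped.
 */
size_t fixup_image(unsigned char *image, size_t image_size,
		   uintptr_t link_base, uintptr_t load_base,
		   const struct rel_entry *table, size_t count)
{
	/* Unsigned subtraction wraps, so a lower load address also works. */
	uintptr_t delta = load_base - link_base;
	size_t patched = 0;

	assert(count == 0 || table != NULL);

	for (size_t i = 0; i < count; i++) {
		uintptr_t off = table[i].offset;
		uintptr_t value;

		if (off > image_size || image_size - off < sizeof(value))
			continue;	/* slot does not fit inside the image */

		/* memcpy avoids alignment assumptions about the slot. */
		memcpy(&value, image + off, sizeof(value));
		value += delta;
		memcpy(image + off, &value, sizeof(value));
		patched++;
	}

	return patched;
}
```

With `link_base == load_base` the delta is zero and the loop leaves every slot unchanged, which corresponds to the case where the loader put the image exactly at its link address.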