Hi Joakim,

On 29 November 2017 at 03:48, Joakim Tjernlund
<joakim.tjernl...@infinera.com> wrote:
> On Wed, 2017-11-29 at 19:11 +0900, Masahiro Yamada wrote:
>> Hi Simon,
>>
>>
>> 2017-11-28 2:13 GMT+09:00 Simon Glass <s...@chromium.org>:
>> > (Tom - any thoughts about a more expansive cc list on this?)
>> >
>> > Hi Masahiro,
>> >
>> > On 26 November 2017 at 07:16, Masahiro Yamada
>> > <yamada.masah...@socionext.com> wrote:
>> > > 2017-11-26 20:38 GMT+09:00 Simon Glass <s...@chromium.org>:
>> > > > Hi Philipp,
>> > > >
>> > > > On 25 November 2017 at 16:31, Dr. Philipp Tomsich
>> > > > <philipp.toms...@theobroma-systems.com> wrote:
>> > > > > Hi,
>> > > > >
>> > > > > > On 25 Nov 2017, at 23:34, Simon Glass <s...@chromium.org> wrote:
>> > > > > >
>> > > > > > +Tom, Masahiro, Philipp
>> > > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > On 22 November 2017 at 03:27, Wolfgang Denk <w...@denx.de> wrote:
>> > > > > > > Dear Kever Yang,
>> > > > > > >
>> > > > > > > In message <fd0bb500-80c4-f317-cc18-f7aaf1344...@rock-chips.com>
>> > > > > > > you wrote:
>> > > > > > > >
>> > > > > > > > I can understand this feature, we always do dram_init_banks() first,
>> > > > > > > > then we relocate to 'known' area, then there will be no risk to
>> > > > > > > > access memory.
>> > > > > > > > I believe there must be some historical reason for some kind of device,
>> > > > > > > > the relocate feature is a wonderful idea for it.
>> > > > > > >
>> > > > > > > This is actually not so much a feature needed to support some
>> > > > > > > specific device (in this case much simpler approaches would be
>> > > > > > > possible), but to support a whole set of features. Unfortunately
>> > > > > > > these appear to get forgotten / ignored over time.
>> > > > > > >
>> > > > > > > > many other SoCs should be similar.
>> > > > > > > > - Without relocate we can save many steps, some of our customers really
>> > > > > > > >   care much about the boot time duration.
>> > > > > > > >   * no need to relocate everything
>> > > > > > > >   * no need to copy all the code
>> > > > > > > >   * no need to init the driver more than once
>> > > > > > >
>> > > > > > > Please have a look at the README, section "Memory Management".
>> > > > > > > The relocation is not done to any _fixed_ address, but the address
>> > > > > > > is actually computed at runtime, depending on a number of features
>> > > > > > > enabled (at least this is how it used to be - apparently little of
>> > > > > > > this is tested on a regular basis, so I would not be surprised if
>> > > > > > > things are broken today).
>> > > > > > >
>> > > > > > > The basic idea was to reserve areas of memory at the top of RAM,
>> > > > > > > that would not be initialized / modified by U-Boot and Linux, not
>> > > > > > > even across a reset / warm boot.
>> > > > > > >
>> > > > > > > This was used for example for:
>> > > > > > >
>> > > > > > > - pRAM (Protected RAM) which could be used to store all kinds of data
>> > > > > > >   (for example, using a pramfs [Protected and Persistent RAM
>> > > > > > >   Filesystem]) that could be kept across reboots of the OS.
>> > > > > > >
>> > > > > > > - shared frame buffer / video memory. U-Boot and Linux would be able
>> > > > > > >   to initialize the video memory just once (in U-Boot) and then
>> > > > > > >   share it, maybe even across reboots. Especially, this would allow
>> > > > > > >   for a very early splash screen that gets passed (flicker free) to
>> > > > > > >   Linux until some Linux GUI takes over (much more difficult
>> > > > > > >   today).
>> > > > > > >
>> > > > > > > - shared log buffer: U-Boot and Linux used to use the same syslog
>> > > > > > >   buffer mechanism, so you could share it between U-Boot and Linux.
>> > > > > > >   This allows, for example, to
>> > > > > > >   * read the Linux kernel panic messages after reset in U-Boot; this
>> > > > > > >     is very useful when you bring up a new system and Linux crashes
>> > > > > > >     before it can display the log buffer on the console
>> > > > > > >   * pass U-Boot POST results on to Linux, so the application code
>> > > > > > >     can read and process these
>> > > > > > >   * process the system log of the previous run (especially after a
>> > > > > > >     panic) in Linux after it rebooted.
>> > > > > > >
>> > > > > > > etc.
>> > > > > > >
>> > > > > > > There are a number of such features which require reserving room at
>> > > > > > > the top of RAM, the size of which is calculated at runtime, often
>> > > > > > > depending on user-settable environment data.
>> > > > > > >
>> > > > > > > All this cannot be done without relocation to a (dynamically
>> > > > > > > computed) target address.
>> > > > > > >
>> > > > > > >
>> > > > > > > Yes, the code could be simpler and faster without that - but then,
>> > > > > > > you cut off a number of features.
>> > > > > >
>> > > > > > I would be interested in seeing benchmarks showing the cost of
>> > > > > > relocation in terms of boot time. Last time I did this was on Exynos 5
>> > > > > > and it was some years ago. The time was pretty small provided the
>> > > > > > cache was on for the memory copies associated with relocation itself.
>> > > > > > Something like 10-20ms but I don't have the numbers handy.
>> > > > > >
>> > > > > > I think it is useful to be able to allocate memory in board_init_f()
>> > > > > > for use by U-Boot for things like the display and the malloc() region.
>> > > > > >
>> > > > > > Options we might consider:
>> > > > > >
>> > > > > > 1. Don't relocate the code and data. Thus we could avoid the copy and
>> > > > > > relocation cost. This is already supported with the GD_FLG_SKIP_RELOC
>> > > > > > flag, used when U-Boot runs as an EFI app
>> > > > > >
>> > > > > > 2. Rather than throwing away the old malloc() region, keep it around
>> > > > > > so existing allocated blocks work. Then the new malloc() region would be
>> > > > > > used for future allocations. We could perhaps ignore free() calls in
>> > > > > > that region
>> > > > > >
>> > > > > > 2a. This would allow us to avoid re-init of driver model in most cases
>> > > > > > I think. E.g. we could init serial and timer before relocation and
>> > > > > > leave them inited after relocation. We could just init the
>> > > > > > 'additional' devices not done before relocation.
>> > > > > >
>> > > > > > 2b. I suppose we could even extend this to SPL if we wanted to. I
>> > > > > > suspect it would just be a pain though, since SPL might use memory
>> > > > > > that U-Boot wants.
>> > > > > >
>> > > > > > 3. We could turn on the cache earlier. This removes most of the
>> > > > > > boot-time penalty. Ideally this should be turned on in SPL and perhaps
>> > > > > > redone in U-Boot which has more memory available. If SPL is not used,
>> > > > > > we could turn on the cache before relocation.
>> > > > >
>> > > > > Both turning on the cache and initialising the clocking could be of benefit
>> > > > > to boot-time.
>> > > > >
>> > > > > However, the biggest possible gain will come from utilising Falcon mode
>> > > > > to skip the full U-Boot stage and directly boot into the OS from SPL. This
>> > > > > assumes that the drivers involved are fully optimised, so loading up the
>> > > > > OS image does not take longer than necessary.
>> > > >
>> > > > I'd like to see numbers on that. From my experience, loading and
>> > > > running U-Boot does not take very long...
>> > > >
>> > > > >
>> > > > > > 4. Rather than reserving memory in board_init_f() we could have it
>> > > > > > call malloc() from the expanded region. We could then perhaps
>> > > > > > move this reserve/allocate code into particular drivers or
>> > > > > > subsystems, and drop a good chunk of the init sequence. We would need
>> > > > > > to have a larger malloc() region than is currently the case.
>> > > > > >
>> > > > > > There are still some arch-specific bits in board_init_f() which make
>> > > > > > these sorts of changes a bit tricky to support generically. IMO it
>> > > > > > would be best to move to 'generic relocation' written in C, where all
>> > > > > > archs work basically the same way, before attempting any of the above.
>> > > > > >
>> > > > > > Still, I can see some benefits and even some simplifications.
>> > > > > >
>> > > > > > Regards,
>> > > > > > Simon
>> > >
>> > >
>> > > This discussion should have happened.
>> > > U-Boot boot sequence is crazily inefficient.
>> > >
>> > >
>> > > When we talk about "relocation", two things are happening.
>> > >
>> > > [1] U-Boot proper copies itself to the very end of DRAM
>> > > [2] Fix-up the global symbols
>> > >
>> > > In my opinion, only [2] is useful.
>> > >
>> > >
>> > > SPL initializes the DRAM, so it knows the base and size of DRAM.
>> > > SPL should be able to load the U-Boot proper to the final destination.
>> > > So, [1] is unnecessary.
>> > >
>> > >
>> > > [2] is necessary because SPL may load the U-Boot proper
>> > > to a different place than CONFIG_SYS_TEXT_BASE.
>> > > This feature is useful for platforms
>> > > whose DRAM base/size is only known at run-time.
>> > > (Of course, it should be user-configurable by CONFIG_RELOCATE
>> > > or something.)
>> > >
>> > > Moreover, board_init_f() is unneeded -
>> > > everything in board_init_f() is already done by SPL.
>> > > Multiple-time DM initialization is really inefficient and ugly.
>> > >
>> > >
>> > > The following is how the ideal boot loader would work.
>> > >
>> > >
>> > > Requirement for U-Boot proper:
>> > > U-Boot never changes the location by itself.
>> > > So, SPL or a vendor loader must load U-Boot proper
>> > > to the final destination directly.
>> > > (You can load it to the very end of DRAM if you like,
>> > > but the actual place does not matter here.)
>> > >
>> > >
>> > > Boot sequence of U-Boot proper:
>> > > If CONFIG_RELOCATE (or something) is enabled,
>> > > it fixes the global symbols at the very beginning
>> > > of the boot.
>> > > (In this case, CONFIG_SYS_TEXT_BASE can be arbitrary)
>> > >
>> > > That's it. Proceed to the rest of init code.
>> > > (= board_init_r)
>> > > board_init_f() is unnecessary.
>> > >
>> > > This should work for recent platforms.
>> >
>> > Yes that sounds reasonable to me.
>> >
>> > We could do the symbol fixup/relocation in SPL after loading U-Boot,
>> > although that would probably push us to using ELF format for U-Boot
>> > which is a bit limited.
>> >
>> > Still I think the biggest performance improvement comes from turning
>> > on the cache in SPL. So the above is a simplification, not really a
>> > speed-up.
>>
>>
>> Right.
>> I am more interested in simplification than in speed-up.
>> The boot speed is not a significant problem at least for my boards.
>>
>>
>> > >
>> > > We should think about old platforms that boot from a NOR flash or
>> > > something.
>> > > There are two solutions:
>> > > - execute-in-place: run the code in the flash directly
>> > > - use SPL (common/spl/spl-nor.c) if you want to run
>> > >   it from RAM
>> >
>> > This seems like a big regression in functionality. For example for x86
>> > 32-bit we currently don't have an SPL (we do for 64-bit).
>> > So I think this means that everything would be forced to have an SPL?
>>
>> After a grace period for migration, yes.
>> XIP or SPL.
>> No relocation in U-Boot proper.
>>
>> This assumption will allow us to dump a lot of burden.
>>
>> Remove relocation
>> Remove board_init_f()
>> Remove pre-reloc DM init
>> Perhaps, remove struct global_data
>> etc.
>
> I have not managed to keep up with this discussion but it seems you are
> suggesting some radical change for NOR-based boot boards?
>
> We use such boards (ppc) and also use pram etc. Would these still work?

I think they would have to switch to SPL.

I suppose another way is to adjust boards which DO use SPL to NOT use
board_init_f().

Regards,
Simon
_______________________________________________
U-Boot mailing list
U-Boot@lists.denx.de
https://lists.denx.de/listinfo/u-boot
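
For anyone following the pram question, here is a rough sketch of the top-of-RAM reservation scheme described in the quoted text (pRAM, shared log buffer, shared frame buffer). It is only an illustration under assumptions: the struct and function names, the 4 KiB alignment and the example sizes are all hypothetical and are not taken from U-Boot's board_init_f() code. The point it shows is why the relocation address cannot be a fixed constant: it falls out of whatever has been reserved above it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reservation sizes; a real board would derive them from
 * its configuration and from user-settable environment data. */
struct reserve_cfg {
	uint64_t pram_size;   /* protected RAM kept across reboots */
	uint64_t logbuf_size; /* log buffer shared with the OS */
	uint64_t fb_size;     /* shared frame buffer */
	uint64_t image_size;  /* boot loader code + data to relocate */
};

struct ram_layout {
	uint64_t pram_base;
	uint64_t logbuf_base;
	uint64_t fb_base;
	uint64_t reloc_addr;  /* where the boot loader image would end up */
};

/* Round addr down to a multiple of align (a power of two). */
static uint64_t align_down(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return addr & ~(align - 1);
}

/* Carve each region off the top of RAM, working downwards. */
static struct ram_layout compute_layout(uint64_t ram_base, uint64_t ram_size,
					const struct reserve_cfg *cfg)
{
	const uint64_t page = 4096; /* assumed alignment, not a U-Boot value */
	struct ram_layout l;
	uint64_t top = ram_base + ram_size;

	top = align_down(top - cfg->pram_size, page);
	l.pram_base = top;
	top = align_down(top - cfg->logbuf_size, page);
	l.logbuf_base = top;
	top = align_down(top - cfg->fb_size, page);
	l.fb_base = top;
	top = align_down(top - cfg->image_size, page);
	l.reloc_addr = top;

	return l;
}
```

With 256 MiB of RAM at 0x80000000 and reservations of 1 MiB, 16 KiB, 2 MiB and 512 KiB, this sketch places the image at 0x8FC7C000; setting the pram size to zero moves it up to 0x8FD7C000. A loader that puts U-Boot proper at its final destination directly would need to do an equivalent calculation itself.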