Hi Joakim,

On 29 November 2017 at 03:48, Joakim Tjernlund
<joakim.tjernl...@infinera.com> wrote:
> On Wed, 2017-11-29 at 19:11 +0900, Masahiro Yamada wrote:
>>
>>
>> Hi Simon,
>>
>>
>> 2017-11-28 2:13 GMT+09:00 Simon Glass <s...@chromium.org>:
>> > (Tom - any thoughts about a more expansive cc list on this?)
>> >
>> > Hi Masahiro,
>> >
>> > On 26 November 2017 at 07:16, Masahiro Yamada
>> > <yamada.masah...@socionext.com> wrote:
>> > > 2017-11-26 20:38 GMT+09:00 Simon Glass <s...@chromium.org>:
>> > > > Hi Philipp,
>> > > >
>> > > > On 25 November 2017 at 16:31, Dr. Philipp Tomsich
>> > > > <philipp.toms...@theobroma-systems.com> wrote:
>> > > > > Hi,
>> > > > >
>> > > > > > On 25 Nov 2017, at 23:34, Simon Glass <s...@chromium.org> wrote:
>> > > > > >
>> > > > > > +Tom, Masahiro, Philipp
>> > > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > On 22 November 2017 at 03:27, Wolfgang Denk <w...@denx.de> wrote:
>> > > > > > > Dear Kever Yang,
>> > > > > > >
>> > > > > > > In message <fd0bb500-80c4-f317-cc18-f7aaf1344...@rock-chips.com> 
>> > > > > > > you wrote:
>> > > > > > > >
>> > > > > > > > I can understand this feature, we always do dram_init_banks() first,
>> > > > > > > > then we relocate to 'known' area, then there will be no risk to
>> > > > > > > > access memory.
>> > > > > > > > I believe there must be some historical reason for some kind of device,
>> > > > > > > > the relocate feature is a wonderful idea for it.
>> > > > > > >
>> > > > > > > This is actually not so much a feature needed to support some
>> > > > > > > specific device (in this case much simpler approaches would be
>> > > > > > > possible), but to support a whole set of features.  Unfortunately
>> > > > > > > these appear to get forgotten / ignored over time.
>> > > > > > >
>> > > > > > > >     many other SoCs should be similar.
>> > > > > > > > - Without relocate we can save many steps, some of our customers
>> > > > > > > >     really care much about the boot time duration.
>> > > > > > > >     * no need to relocate everything
>> > > > > > > >     * no need to copy all the code
>> > > > > > > >     * no need to init the driver more than once
>> > > > > > >
>> > > > > > > Please have a look at the README, section "Memory Management".
>> > > > > > > The relocation is not done to any _fixed_ address, but the address
>> > > > > > > is actually computed at runtime, depending on a number of features
>> > > > > > > enabled (at least this is how it used to be - apparently little of
>> > > > > > > this is tested on a regular basis, so I would not be surprised if
>> > > > > > > things are broken today).
>> > > > > > >
>> > > > > > > The basic idea was to reserve areas of memory at the top of RAM,
>> > > > > > > that would not be initialized / modified by U-Boot and Linux, not
>> > > > > > > even across a reset / warm boot.
>> > > > > > >
>> > > > > > > This was used for example for:
>> > > > > > >
>> > > > > > > - pRAM (Protected RAM) which could be used to store all kind of 
>> > > > > > > data
>> > > > > > >  (for example, using a pramfs [Protected and Persistent RAM
>> > > > > > >  Filesystem]) that could be kept across reboots of the OS.
>> > > > > > >
>> > > > > > > - shared frame buffer / video memory. U-Boot and Linux would be 
>> > > > > > > able
>> > > > > > >  to initialize the video memory just once (in U-Boot) and then
>> > > > > > >  share it, maybe even across reboots.  especially, this would 
>> > > > > > > allow
>> > > > > > >  for a very early splash screen that gets passed (flicker free) 
>> > > > > > > to
>> > > > > > >  Linux until some Linux GUI takes over (much more difficult 
>> > > > > > > today).
>> > > > > > >
>> > > > > > > - shared log buffer: U-Boot and Linux used to use the same syslog
>> > > > > > >  buffer mechanism, so you could share it between U-Boot and 
>> > > > > > > Linux.
>> > > > > > >  this allows for example to
>> > > > > > >  * read the Linux kernel panic messages after reset in U-Boot; 
>> > > > > > > this
>> > > > > > >    is very useful when you bring up a new system and Linux 
>> > > > > > > crashes
>> > > > > > >    before it can display the log buffer on the console
>> > > > > > >  * pass U-Boot POST results on to Linux, so the application code
>> > > > > > >    can read and process these
>> > > > > > >  * process the system log of the previous run (especially after a
>> > > > > > >    panic) in Linux after it rebooted.
>> > > > > > >
>> > > > > > > etc.
>> > > > > > >
>> > > > > > > There are a number of such features which require reserving room at
>> > > > > > > the top of RAM, the size of which is calculated at runtime, often
>> > > > > > > depending on user settable environment data.
>> > > > > > >
>> > > > > > > All this cannot be done without relocation to a (dynamically
>> > > > > > > computed) target address.
>> > > > > > >
>> > > > > > >
>> > > > > > > Yes, the code could be simpler and faster without that - but 
>> > > > > > > then,
>> > > > > > > you cut off a number of features.
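
To make the point above concrete, here is a minimal sketch of how a relocation target could be derived at runtime from reservations at the top of RAM. It is not U-Boot's actual code: `struct top_reservation`, `align_down()` and `compute_reloc_addr()` are hypothetical names, and the final 4 KiB alignment is an assumption chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one region reserved at the top of RAM. */
struct top_reservation {
	const char *name;	/* label only, e.g. "pram" */
	uint64_t size;		/* bytes, possibly taken from the environment */
	uint64_t align;		/* power-of-two alignment of the region start */
};

/* Round addr down to a multiple of align (align must be a power of two). */
uint64_t align_down(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return addr & ~(align - 1);
}

/*
 * Walk down from the end of RAM, carving out each reservation in turn,
 * then place the image below the last one. The result depends on the
 * reservation sizes, so it cannot be a link-time constant.
 */
uint64_t compute_reloc_addr(uint64_t ram_base, uint64_t ram_size,
			    const struct top_reservation *res, int count,
			    uint64_t image_size)
{
	uint64_t top = ram_base + ram_size;

	for (int i = 0; i < count; i++)
		top = align_down(top - res[i].size, res[i].align);

	/* Assumed 4 KiB alignment for the image itself. */
	return align_down(top - image_size, 0x1000);
}
```

Changing the size of any one reservation (say, a pRAM size read from the environment) moves every address below it, including the image, which is why the target has to be computed at boot.
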
>> > > > > >
>> > > > > > I would be interested in seeing benchmarks showing the cost of
>> > > > > > relocation in terms of boot time. Last time I did this was on 
>> > > > > > Exynos 5
>> > > > > > and it was some years ago. The time was pretty small provided the
>> > > > > > cache was on for the memory copies associated with relocation 
>> > > > > > itself.
>> > > > > > Something like 10-20ms but I don't have the numbers handy.
>> > > > > >
>> > > > > > I think it is useful to be able to allocate memory in 
>> > > > > > board_init_f()
>> > > > > > for use by U-Boot for things like the display and the malloc() 
>> > > > > > region.
>> > > > > >
>> > > > > > Options we might consider:
>> > > > > >
>> > > > > > 1. Don't relocate the code and data. Thus we could avoid the copy 
>> > > > > > and
>> > > > > > relocation cost. This is already supported with the 
>> > > > > > GD_FLG_SKIP_RELOC
>> > > > > > used when U-Boot runs as an EFI app
>> > > > > >
>> > > > > > 2. Rather than throwing away the old malloc() region, keep it 
>> > > > > > around
>> > > > > > so existing allocated blocks work. Then new malloc() region would 
>> > > > > > be
>> > > > > > used for future allocations. We could perhaps ignore free() calls 
>> > > > > > in
>> > > > > > that region
>> > > > > >
>> > > > > > 2a. This would allow us to avoid re-init of driver model in most 
>> > > > > > cases
>> > > > > > I think. E.g. we could init serial and timer before relocation and
>> > > > > > leave them inited after relocation. We could just init the
>> > > > > > 'additional' devices not done before relocation.
>> > > > > >
>> > > > > > 2b. I suppose we could even extend this to SPL if we wanted to. I
>> > > > > > suspect it would just be a pain though, since SPL might use memory
>> > > > > > that U-Boot wants.
>> > > > > >
>> > > > > > 3. We could turn on the cache earlier. This removes most of the
>> > > > > > boot-time penalty. Ideally this should be turned on in SPL and 
>> > > > > > perhaps
>> > > > > > redone in U-Boot which has more memory available. If SPL is not 
>> > > > > > used,
>> > > > > > we could turn on the cache before relocation.
>> > > > >
>> > > > > Both turning on the cache and initialising the clocking could be of 
>> > > > > benefit
>> > > > > to boot-time.
>> > > > >
>> > > > > However, the biggest possible gain will come from utilising Falcon 
>> > > > > mode
>> > > > > to skip the full U-Boot stage and directly boot into the OS from 
>> > > > > SPL.  This
>> > > > > assumes that the drivers involved are fully optimised, so loading up 
>> > > > > the
>> > > > > OS image does not take longer than necessary.
>> > > >
>> > > > I'd like to see numbers on that. From my experience, loading and
>> > > > running U-Boot does not take very long...
>> > > >
>> > > > >
>> > > > > > 4. Rather than reserving memory in board_init_f() we could have it
>> > > > > > call malloc() from the expanded region. We could then perhaps
>> > > > > > move this reserve/allocate code into particular drivers or
>> > > > > > subsystems, and drop a good chunk of the init sequence. We would need
>> > > > > > to have a larger malloc() region than is currently the case.
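
Option 2 above (keep the pre-relocation malloc() region alive and ignore free() calls into it) could look roughly like the sketch below. This is an illustration only, not U-Boot's allocator: it uses a trivial bump allocator, and `struct arena`, `struct heap_state`, `sketch_alloc()` and `sketch_free()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct arena {
	uint8_t *base;
	size_t size;
	size_t used;
};

struct heap_state {
	struct arena early;	/* pre-relocation region, kept alive */
	struct arena late;	/* larger post-relocation region */
	int relocated;		/* set once the late region is available */
};

/* Compare as integers so unrelated objects can be tested safely. */
static int in_arena(const struct arena *a, const void *p)
{
	uintptr_t addr = (uintptr_t)p;
	uintptr_t start = (uintptr_t)a->base;

	return addr >= start && addr - start < a->size;
}

/* Allocate from whichever region is current; early blocks stay valid. */
void *sketch_alloc(struct heap_state *h, size_t n)
{
	struct arena *a = h->relocated ? &h->late : &h->early;
	size_t need = (n + 15) & ~(size_t)15;	/* 16-byte granularity */
	void *p;

	if (need < n || need > a->size - a->used)
		return NULL;
	p = a->base + a->used;
	a->used += need;
	return p;
}

/* Returns 1 if the block belongs to the late region, 0 if ignored. */
int sketch_free(struct heap_state *h, void *p)
{
	if (p == NULL || in_arena(&h->early, p))
		return 0;	/* early blocks are never recycled */
	assert(in_arena(&h->late, p));
	/* A real allocator would put the block back on a free list here. */
	return 1;
}
```

The cost of this approach is that memory allocated before relocation is never reclaimed, which is presumably acceptable only if the early region stays small.
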
>> > > > > >
>> > > > > > There are still some arch-specific bits in board_init_f() which 
>> > > > > > make
>> > > > > > these sorts of changes a bit tricky to support generically. IMO it
>> > > > > > would be best to move to 'generic relocation' written in C, where 
>> > > > > > all
>> > > > > > archs work basically the same way, before attempting any of the 
>> > > > > > above.
>> > > > > >
>> > > > > > Still, I can see some benefits and even some simplifications.
>> > > > > >
>> > > > > > Regards,
>> > > > > > Simon
>> > >
>> > >
>> > >
>> > > This discussion should have happened.
>> > > U-Boot boot sequence is crazily inefficient.
>> > >
>> > >
>> > >
>> > > When we talk about "relocation", two things are happening.
>> > >
>> > >  [1] U-Boot proper copies itself to the very end of DRAM
>> > >  [2] Fix-up the global symbols
>> > >
>> > > In my opinion, only [2] is useful.
>> > >
>> > >
>> > > SPL initializes the DRAM, so it knows the base and size of DRAM.
>> > > SPL should be able to load the U-Boot proper to the final destination.
>> > > So, [1] is unnecessary.
>> > >
>> > >
>> > > [2] is necessary because SPL may load the U-Boot proper
>> > > to a different place than CONFIG_SYS_TEXT_BASE.
>> > > This feature is useful for platforms
>> > > whose DRAM base/size is only known at run-time.
>> > > (Of course, it should be user-configurable by CONFIG_RELOCATE
>> > > or something.)
>> > >
>> > > Moreover, board_init_f() is unneeded -
>> > > everything in board_init_f() is already done by SPL.
>> > > Multiple-time DM initialization is really inefficient and ugly.
>> > >
>> > >
>> > > The following is how the ideal boot loader would work.
>> > >
>> > >
>> > > Requirement for U-Boot proper:
>> > > U-Boot never changes the location by itself.
>> > > So, SPL or a vendor loader must load U-Boot proper
>> > > to the final destination directly.
>> > > (You can load it to the very end of DRAM if you like,
>> > > but the actual place does not matter here.)
>> > >
>> > >
>> > > Boot sequence of U-Boot proper:
>> > > If CONFIG_RELOCATE (or something) is enabled,
>> > > it fixes the global symbols at the very beginning
>> > > of the boot.
>> > > (In this case, CONFIG_SYS_TEXT_BASE can be arbitrary)
>> > >
>> > > That's it.  Proceed to the rest of init code.
>> > > (= board_init_r)
>> > > board_init_f() is unnecessary.
>> > >
>> > > This should work for recent platforms.
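
For reference, step [2] on its own (fixing up symbols in an image that is already at its final address, with no copy) can be sketched as below. The record format is a deliberately simplified assumption: `struct rel_entry` and `fixup_image()` are hypothetical, and real relocation formats such as ELF `Elf64_Rela` entries of an `R_*_RELATIVE` type carry more fields and are arch-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, simplified relocation record: the offset (from the start
 * of the image) of a 64-bit word that holds a link-time absolute address.
 */
struct rel_entry {
	uint64_t offset;
};

/*
 * Patch every recorded word by the difference between where the image
 * was linked (link_base) and where it was actually loaded (load_base).
 * The image is modified in place; nothing is copied elsewhere.
 */
void fixup_image(uint8_t *image, size_t image_size,
		 uint64_t link_base, uint64_t load_base,
		 const struct rel_entry *rel, size_t count)
{
	/* Unsigned arithmetic wraps, so a lower load address also works. */
	uint64_t delta = load_base - link_base;

	for (size_t i = 0; i < count; i++) {
		uint64_t word;

		assert(image_size >= sizeof(word));
		assert(rel[i].offset <= image_size - sizeof(word));
		/* memcpy avoids unaligned and aliasing-unsafe accesses. */
		memcpy(&word, image + rel[i].offset, sizeof(word));
		word += delta;
		memcpy(image + rel[i].offset, &word, sizeof(word));
	}
}
```

With this split, the link address (CONFIG_SYS_TEXT_BASE in the proposal above) only matters as the value subtracted to get `delta`, so the loader is free to choose the destination.
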
>> >
>> > Yes that sounds reasonable to me.
>> >
>> > We could do the symbol fixup/relocation in SPL after loading U-Boot,
>> > although that would probably push us to using ELF format for U-Boot
>> > which is a bit limited.
>> >
>> > Still I think the biggest performance improvement comes from turning
>> > on the cache in SPL. So the above is a simplification, not really a
>> > speed-up.
>>
>>
>> Right.
>> I am more interested in simplification than in speed-up.
>> The boot speed is not a significant problem at least for my boards.
>>
>>
>> > >
>> > >
>> > >
>> > > We should think about old platforms that boot from a NOR flash or 
>> > > something.
>> > > There are two solutions:
>> > >  - execute-in-place: run the code in the flash directly
>> > >  - use SPL (common/spl/spl-nor.c) if you want to run
>> > >    it from RAM
>> >
>> > This seems like a big regression in functionality. For example for x86
>> > 32-bit we currently don't have an SPL (we do for 64-bit). So I think
>> > this means that everything would be forced to have an SPL?
>>
>> After a grace period for migration, yes.
>> XIP or SPL.
>> No relocation in U-Boot proper.
>>
>> This assumption will allow us to dump a lot of burden.
>>
>> Remove relocation
>> Remove board_init_f()
>> Remove pre-reloc DM init
>> Perhaps, remove struct global_data
>> etc.
>
> I have not managed to keep up with this discussion but it seems you are
> suggesting some radical change for NOR based boot boards?
>
> We use such boards (ppc) and also use pram etc. Would these still
> work?

I think they would have to switch to SPL.

I suppose another way is to adjust boards which DO use SPL to NOT use
board_init_f().

Regards,
Simon
_______________________________________________
U-Boot mailing list
U-Boot@lists.denx.de
https://lists.denx.de/listinfo/u-boot
