On 3/25/20 4:57 PM, Patrick DELAUNAY wrote:
> Hi,

Hi,

>> From: Marek Vasut <ma...@denx.de>
>> Sent: mercredi 25 mars 2020 00:39
>>
>> Hi,
>>
>> I was looking at the STM32MP1 boot time and I noticed it takes about 2
>> seconds to get to U-Boot.
> 
> Thanks for the feedback.
> 
> To be clear, the SPL is not the ST priority, as we have many limitations
> (mainly on power management) for the SPL boot chain
> (stm32mp15_basic_defconfig):
> ROM code => SPL => U-Boot
>
> The recommended boot chain for STM32MP1 is ROM code => TF-A => U-Boot
> (stm32mp15_trusted_defconfig).

I don't want to use TF-A because it's problematic at best.

However, these issues I listed here are present also in U-Boot, so this
comment is irrelevant anyway.

>> One problem is the insane I2C timing calculation in the stm32f7 i2c driver,
>> which is almost a mallocator and CPU stress test and takes about 1 second
>> to complete in SPL -- we need some simpler replacement for that, possibly
>> the one in the DWC I2C driver might do?
> 
> Our first idea to manage these I2C settings (prescaler/timing settings) was
> to set the values in the device tree, but this binding was refused, so the
> function stm32_i2c_choose_solution() computes the best settings for any
> input clock and I2C frequency (it is called on each probe).
>
> But it is a brute-force and non-optimal solution: it tries all the
> candidates to find the best one.
> And the performance problem of this loop (code shared between the Linux /
> U-Boot / TF-A drivers) had already been seen and checked on the ST side in
> the TF-A context.
> 
> We tried to improve the solution, without success, but finally the
> performance issue was solved by activating the dcache in TF-A before
> executing this loop.

That's not a solution but a workaround.

> But as the data cache is not activated in SPL, this loop has terrible
> performance.
>
> We need to dig into this topic again from the U-Boot point of view
> (SPL and also U-Boot, before and after relocation).
>
> And I have shared this issue with the ST owner of this code.
> 
> For information, I added some traces and I get the following for the same
> code execution on the DK2 board:
> - 440ms in SPL (dcache OFF)
> - 36ms in U-Boot (dcache ON)

Still, this is a workaround.

The calculation should be simplified. And why do you even need all those
memory allocations in there ?
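
As a rough illustration of what a simpler, allocation-free calculation could
look like, here is a minimal sketch. It is not the stm32f7 driver's algorithm.
It assumes a simplified model where
f_scl = clk / ((PRESC + 1) * ((SCLL + 1) + (SCLH + 1))), and it ignores
rise/fall times, the analog/digital filters, SCLDEL/SDADEL and the minimum
tLOW/tHIGH limits from the I2C specification. The names struct i2c_timing and
i2c_pick_timing() are made up for the example, and the field widths are
assumptions about the timing register layout.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch, not the driver's struct. */
struct i2c_timing {
	uint8_t presc;		/* prescaler, assumed 4 bits: 0..15 */
	uint8_t scll;		/* SCL low period, assumed 8 bits: 0..255 */
	uint8_t sclh;		/* SCL high period, assumed 8 bits: 0..255 */
	uint32_t rate_hz;	/* resulting SCL rate in the simplified model */
};

/*
 * Pick a prescaler and SCL low/high counts for a target bus rate.
 * One pass over the 16 prescaler values, only the best candidate is
 * kept, so there is no list of solutions and no malloc().
 * Returns 0 on success, -1 if no setting fits.
 */
static int i2c_pick_timing(uint32_t clk_hz, uint32_t target_hz,
			   struct i2c_timing *best)
{
	uint32_t best_err = UINT32_MAX;
	unsigned int presc;

	if (!clk_hz || !target_hz)
		return -1;

	for (presc = 0; presc < 16; presc++) {
		uint32_t div = (presc + 1) * target_hz;
		/* Round up so the bus never runs faster than requested. */
		uint32_t ticks = (clk_hz + div - 1) / div;
		uint32_t low, high, rate, err;

		/* (SCLL + 1) + (SCLH + 1) must fit in 2..512 ticks. */
		if (ticks < 2 || ticks > 512)
			continue;

		low = (ticks + 1) / 2;	/* odd tick goes to the low phase */
		high = ticks - low;
		rate = clk_hz / ((presc + 1) * ticks);
		err = target_hz - rate;	/* rate <= target_hz by construction */

		if (err < best_err) {
			best_err = err;
			best->presc = presc;
			best->scll = low - 1;
			best->sclh = high - 1;
			best->rate_hz = rate;
		}
	}

	return best_err == UINT32_MAX ? -1 : 0;
}
```

In this model, a 64 MHz input clock and a 400 kHz target give PRESC = 0 and
SCLL = SCLH = 79. A real replacement would still have to honour the
constraints ignored above, but the point is that the search only needs one
candidate on the stack.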

>> Another item I found is that, in U-Boot, initf_dm() takes about half a
>> second and so does serial_init(). I didn't dig into it to find out why,
>> but I suspect it has to do with the massive amount of UCLASSes the DM has
>> to traverse OR with the CPU being slow at that point, as the clock driver
>> didn't get probed just yet.
>>
>> Thoughts ?
> 
> Yes, it is the first parsing of the device tree, and it is really slow...
> It is directly linked to the device tree size and to libfdt.
>
> And it is slow because it is done before relocation (before dcache enable).
>
> Measurement on DK2 = 649ms
>
> It is another topic in my TODO list.
> 
> I want to explore livetree activation to reduce the DT parsing time.
>
> And also activate the dcache in the pre-relocation stage
> (and potentially also in SPL, as was done in
> http://patchwork.ozlabs.org/patch/699899/)
> 
> Another solution (workaround ?) is to reduce the U-Boot device tree (remove
> all the nodes not used in U-Boot from the SoC file stm32mp157.dtsi, or use
> /omit-if-no-ref/ for the pin control nodes).
>
> See the bootstage report on DK2, where we have dm_f = 648ms
> 
> STM32MP> bootstage report
> Timer summary in microseconds (12 records):
>        Mark    Elapsed  Stage
>           0          0  reset
>     195,613    195,613  SPL
>     837,867    642,254  end SPL
>     840,117      2,250  board_init_f
>   2,739,639  1,899,522  board_init_r
>   3,066,815    327,176  id=64
>   3,103,377     36,562  id=65
>   3,104,078        701  main_loop
>   3,142,171     38,093  id=175
> 
> Accumulated time:
>                 38,124  dm_spl
>                 41,956  dm_r
>                648,861  dm_f
> 
> For information, the time is spent in
>       dm_extended_scan_fdt
>       => dm_scan_fdt(blob, pre_reloc_only);
>
> This time is reduced (by a few milliseconds)
> with http://patchwork.ozlabs.org/patch/1240117/
>
> But only data cache activation before relocation should really improve
> this part.

For this one, I think we have no better options than the Dcache indeed.
Thanks
