On 3/25/20 4:57 PM, Patrick DELAUNAY wrote:
> Hi,

Hi,

>> From: Marek Vasut <[email protected]>
>> Sent: Wednesday, 25 March 2020 00:39
>>
>> Hi,
>>
>> I was looking at the STM32MP1 boot time and I noticed it takes about 2
>> seconds to get to U-Boot.
>
> Thanks for the feedback.
>
> To be clear, the SPL is not the ST priority as we have many limitations
> (mainly on power management) for the SPL boot chain
> (stm32mp15_basic_defconfig):
> Rom code => SPL => U-Boot
>
> The recommended boot chain for STM32MP1 is Rom code => TF-A => U-Boot
> (stm32mp15_trusted_defconfig).

I don't want to use TF-A because it's problematic at best. However, these
issues I listed here are present also in U-Boot, so this comment is
irrelevant anyway.

>> One problem is the insane I2C timing calculation in stm32f7 i2c driver,
>> which is almost a mallocator and CPU stress test and takes about 1 second
>> to complete in SPL -- we need some simpler replacement for that, possibly
>> the one in DWC I2C driver might do?
>
> Our first idea to manage these I2C settings (prescaler/timings setting) was
> to set these values in the device tree, but this binding was refused, so
> the function stm32_i2c_choose_solution() provides the best settings for
> any input clock and I2C frequency (called for each probe).
>
> But it is a brutal and suboptimal solution: try all the solutions to find
> the best one. And the performance problem of this loop (shared code between
> Linux / U-Boot / TF-A drivers) had already been seen/checked on the ST side
> in the TF-A context.
>
> We tried to improve the solution, without success, but finally the
> performance issue was solved by dcache activation in TF-A before executing
> this loop.

That's not a solution but a workaround.

> But as the data cache is not activated in SPL, this loop has terrible
> performance.
>
> We need to dig again into this topic from the U-Boot point of view
> (SPL & also in U-Boot, before relocation and after relocation).
>
> And I have shared this issue with the ST owner of this code.
>
> For information, I added some traces and I get, for the same code execution
> on the DK2 board:
> - 440ms in SPL (dcache OFF)
> - 36ms in U-Boot (dcache ON)

Still, this is a workaround. The calculation should be simplified. And why
do you even need all those memory allocations in there?

>> Another item I found is that, in U-Boot, initf_dm() takes about half a
>> second and so does serial_init(). I didn't dig into it to find out why,
>> but I suspect it has to do with the massive amount of UCLASSes the DM has
>> to traverse OR with the CPU being slow at that point, as the clock driver
>> didn't get probed just yet.
>>
>> Thoughts ?
>
> Yes, it is the first parsing of the device tree, and it is really slow...
> directly linked to device tree size and libfdt.
>
> And because it is done before relocation (before dcache enable).
>
> Measurement on DK2 = 649ms
>
> It is another topic in my TODO list.
>
> I want to explore livetree activation to reduce the DT parsing time.
>
> And also activate dcache in the pre-relocation stage
> (and potentially also in SPL as it was done in
> http://patchwork.ozlabs.org/patch/699899/)
>
> Another solution (workaround ?) is to reduce the U-Boot device tree (remove
> all the nodes not used in U-Boot in the SoC file stm32mp157.dtsi or use
> /omit-if-no-ref/ for pincontrol nodes).
>
> See the bootstage report on DK2, we have dm_f = 648ms
>
> STM32MP> bootstage report
> Timer summary in microseconds (12 records):
>        Mark    Elapsed  Stage
>           0          0  reset
>     195,613    195,613  SPL
>     837,867    642,254  end SPL
>     840,117      2,250  board_init_f
>   2,739,639  1,899,522  board_init_r
>   3,066,815    327,176  id=64
>   3,103,377     36,562  id=65
>   3,104,078        701  main_loop
>   3,142,171     38,093  id=175
>
> Accumulated time:
>                 38,124  dm_spl
>                 41,956  dm_r
>                648,861  dm_f
>
> For information, the time is spent in
> dm_extended_scan_fdt
> => dm_scan_fdt(blob, pre_reloc_only);
>
> This time is reduced (by a few milliseconds)
> with http://patchwork.ozlabs.org/patch/1240117/
>
> But only data cache activation before relocation should improve this part.

For this one, I think we have no better options than the Dcache indeed.

Thanks
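
To illustrate the kind of "simpler replacement" for the I2C timing
calculation asked for above, here is a minimal sketch of a direct
computation that needs no malloc and at most 16 loop iterations. It is not
the stm32f7 driver code and not the DWC I2C algorithm. The function name
i2c_simple_timing() and struct i2c_timing are hypothetical. The sketch
assumes the SCL period is roughly (SCLL + 1 + SCLH + 1) * (PRESC + 1)
kernel clock ticks, and it ignores rise/fall times, the analog and digital
filters, synchronisation delays, and the SCLDEL/SDADEL fields. The 1:1
low/high split up to 100 kHz and 2:1 above that is an assumption chosen to
stay near the I2C specification minimums, so real hardware would still
need validation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for three timing fields; not a U-Boot type. */
struct i2c_timing {
	uint8_t presc;	/* 4-bit prescaler field, 0..15 */
	uint8_t scll;	/* SCL low period is (scll + 1) prescaled ticks */
	uint8_t sclh;	/* SCL high period is (sclh + 1) prescaled ticks */
};

/*
 * Pick the smallest prescaler for which both SCL half-periods fit in their
 * 8-bit fields. Returns false if no setting fits or the inputs are invalid.
 */
static bool i2c_simple_timing(uint32_t clk_hz, uint32_t bus_hz,
			      struct i2c_timing *t)
{
	uint64_t ticks, units, low, high;
	uint32_t presc;

	if (!t || !clk_hz || !bus_hz)
		return false;

	/* Kernel clock ticks per SCL period, rounded up so that the bus
	 * never runs faster than requested. */
	ticks = ((uint64_t)clk_hz + bus_hz - 1) / bus_hz;
	if (ticks < 2)
		return false;

	for (presc = 0; presc < 16; presc++) {
		/* Prescaled ticks per SCL period, rounded up. */
		units = (ticks + presc) / (presc + 1);
		if (units < 2)
			break;

		/* Assumed split: 1:1 up to 100 kHz, 2:1 low:high above. */
		low = bus_hz <= 100000 ? (units + 1) / 2 : (2 * units) / 3;
		high = units - low;

		if (low <= 256 && high >= 1 && high <= 256) {
			t->presc = (uint8_t)presc;
			t->scll = (uint8_t)(low - 1);
			t->sclh = (uint8_t)(high - 1);
			return true;
		}
	}

	return false;
}
```

With an assumed 48 MHz kernel clock and a 400 kHz bus, this gives presc = 0,
scll = 79 and sclh = 39, i.e. 120 ticks per SCL period. The trade-off
against the exhaustive search is that the result is only "good enough"
rather than the closest match to every timing constraint.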

