Hi Bin, On 31 August 2015 at 08:04, Bin Meng <[email protected]> wrote: > Hi Simon, > > On Mon, Aug 31, 2015 at 9:54 PM, Simon Glass <[email protected]> wrote: >> Hi Bin, >> >> On 31 August 2015 at 07:43, Bin Meng <[email protected]> wrote: >>> Hi Simon, >>> >>> On Mon, Aug 31, 2015 at 9:20 PM, Simon Glass <[email protected]> wrote: >>>> Hi Bin, >>>> >>>> On 31 August 2015 at 03:52, Bin Meng <[email protected]> wrote: >>>>> Boot time performance degradation is observed with the conversion >>>>> to use dm pci. Intel Quark SoC has a low end x86 processor with >>>>> only 400MHz frequency and the most time consuming part is with MRC. >>>>> Each MRC register programming requires indirect access via pci bus. >>>>> With dm pci, accessing pci configuration space has some overhead. >>>>> Unfortunately this single access overhead gets accumulated in the >>>>> whole MRC process, and finally leads to twice boot time (25 seconds) >>>>> than before (12 seconds). >>>>> >>>>> To speed up the boot, create an optimized version of pci config >>>>> read/write routines without bothering to go through driver model. >>>>> Now it only takes about 3 seconds to finish MRC, which is really >>>>> fast (8 times faster than dm pci, or 4 times faster than before). >>>>> >>>>> Signed-off-by: Bin Meng <[email protected]> >>>>> --- >>>>> >>>>> arch/x86/cpu/quark/msg_port.c | 59 >>>>> +++++++++++++++++++++++++++---------------- >>>>> 1 file changed, 37 insertions(+), 22 deletions(-) >>>> >>>> Before I delve into the patch - with driver model we are using the I/O >>>> method - see pci_x86_read_config(). Is that the source of the slowdown >>>> or is it just general driver model overhead. >>> >>> The MRC calls APIs in arch/x86/cpu/quark/msg_port.c to program DDR >>> controller. Inside msg_port.c, pci_write_config_dword() and >>> pci_read_config_dword() are called. >>> >>> With driver model, the overhead is: >>> >>> pci_write_config_dword() -> pci_write_config32() -> pci_write_config() >>> -> uclass_get_device_by_seq() then pci_bus_write_config() will finally >>> call pci_x86_read_config(). >>> >>> Without driver model, there is still some overhead (so previously the >>> MRC time was about 12 seconds) >>> >>> pci_write_config_dword() -> pci_hose_write_config_dword() -> >>> TYPE1_PCI_OP(write, dword, u32, outl, 0) >>> >>> With my optimized version, pci_write_config_dword() directly calls a >>> hardcoded dword size pci config access, without the need to consider >>> offset and mask, and dereferencing hose->cfg_addr/cfg->data. >> >> What about if we use dm_pci_read_config32()? We should try to move PCI >> access to driver model to avoid the uclass_get_device_by_seq() >> everywhere. > > I don't think that helps. dm_pci_read_config32() requires a dm driver. > MRC is just something that program a bunch of registers with pci > config rw call.
My question is really what takes the time? It's not clear whether it is the driver model overhead or something else. The code you add in qrk_pci_write_config_dword() looks very similar to pci_x86_read_config(). > >> >>> >>>> >>>> If the former then perhaps we should change this. If the latter then >>>> we have work to do... >>>> > > Also for this patch, I just realized that not only it helps to reduce > the boot time, but also it helps to support PCIe root port in future > patches. > > By checking the Quark SoC manual, I found something that needs to be > done to get the two PCIe root ports on Quark SoC to work. In order to > get PCIe root ports show up in the PCI configuration space, we need > program some registers in the message bus (using APIs in > arch/x86/cpu/quark/msg_port.c). With driver model, if we still call > pci_write_config_dword() in the msg_port.c, we end up triggering PCI > enumeration process first before we actually write something to the > configuration space, however the enumeration process will hang when > scanning to the PCIe root port if we don't properly initialize the > message bus registers. This is a chicken-egg problem. Sure, although I see that as a separate problem. We can't have a driver model implementation that is really slow, so I'd like to clear that up first. Of course your patch makes sense for other reasons, but I don't want to play whack-a-mole here. Also we should start to move things away from the non-driver-model pci functions. Regards, Simon _______________________________________________ U-Boot mailing list [email protected] http://lists.denx.de/mailman/listinfo/u-boot

