Hi Bin, On 31 August 2015 at 19:40, Bin Meng <[email protected]> wrote: > Hi Simon, > > On Tue, Sep 1, 2015 at 7:12 AM, Simon Glass <[email protected]> wrote: >> Hi Bin, >> >> On 31 August 2015 at 08:04, Bin Meng <[email protected]> wrote: >>> Hi Simon, >>> >>> On Mon, Aug 31, 2015 at 9:54 PM, Simon Glass <[email protected]> wrote: >>>> Hi Bin, >>>> >>>> On 31 August 2015 at 07:43, Bin Meng <[email protected]> wrote: >>>>> Hi Simon, >>>>> >>>>> On Mon, Aug 31, 2015 at 9:20 PM, Simon Glass <[email protected]> wrote: >>>>>> Hi Bin, >>>>>> >>>>>> On 31 August 2015 at 03:52, Bin Meng <[email protected]> wrote: >>>>>>> Boot time performance degradation is observed with the conversion >>>>>>> to use dm pci. Intel Quark SoC has a low end x86 processor with >>>>>>> only 400MHz frequency and the most time consuming part is with MRC. >>>>>>> Each MRC register programming requires indirect access via pci bus. >>>>>>> With dm pci, accessing pci configuration space has some overhead. >>>>>>> Unfortunately this single access overhead gets accumulated in the >>>>>>> whole MRC process, and finally leads to twice boot time (25 seconds) >>>>>>> than before (12 seconds). >>>>>>> >>>>>>> To speed up the boot, create an optimized version of pci config >>>>>>> read/write routines without bothering to go through driver model. >>>>>>> Now it only takes about 3 seconds to finish MRC, which is really >>>>>>> fast (8 times faster than dm pci, or 4 times faster than before). >>>>>>> >>>>>>> Signed-off-by: Bin Meng <[email protected]> >>>>>>> --- >>>>>>> >>>>>>> arch/x86/cpu/quark/msg_port.c | 59 >>>>>>> +++++++++++++++++++++++++++---------------- >>>>>>> 1 file changed, 37 insertions(+), 22 deletions(-) >>>>>> >>>>>> Before I delve into the patch - with driver model we are using the I/O >>>>>> method - see pci_x86_read_config(). Is that the source of the slowdown >>>>>> or is it just general driver model overhead. >>>>> >>>>> The MRC calls APIs in arch/x86/cpu/quark/msg_port.c to program DDR >>>>> controller. Inside msg_port.c, pci_write_config_dword() and >>>>> pci_read_config_dword() are called. >>>>> >>>>> With driver model, the overhead is: >>>>> >>>>> pci_write_config_dword() -> pci_write_config32() -> pci_write_config() >>>>> -> uclass_get_device_by_seq() then pci_bus_write_config() will finally >>>>> call pci_x86_read_config(). >>>>> >>>>> Without driver model, there is still some overhead (so previously the >>>>> MRC time was about 12 seconds) >>>>> >>>>> pci_write_config_dword() -> pci_hose_write_config_dword() -> >>>>> TYPE1_PCI_OP(write, dword, u32, outl, 0) >>>>> >>>>> With my optimized version, pci_write_config_dword() directly calls a >>>>> hardcoded dword size pci config access, without the need to consider >>>>> offset and mask, and dereferencing hose->cfg_addr/cfg->data. >>>> >>>> What about if we use dm_pci_read_config32()? We should try to move PCI >>>> access to driver model to avoid the uclass_get_device_by_seq() >>>> everywhere. >>> >>> I don't think that helps. dm_pci_read_config32() requires a dm driver. >>> MRC is just something that program a bunch of registers with pci >>> config rw call. >> >> My question is really what takes the time? It's not clear whether it >> is the driver model overhead or something else. The code you add in >> qrk_pci_write_config_dword() looks very similar to >> pci_x86_read_config(). >> > > It is the driver model overhead. In order to get to > pci_x86_read_config(), we need go through a bunch of function calls > (see above). Yes, my version is very similar to pci_x86_read_config(), > but my version is more simpler as it only needs to deal with dword > size thus no need to do offset mask and switch/case. If you look at > the Quark MRC codes, there are thousands of calls to msg_port_read() > and msg_port_write(). > >>> >>>> >>>>> >>>>>> >>>>>> If the former then perhaps we should change this. If the latter then >>>>>> we have work to do... >>>>>> >>> >>> Also for this patch, I just realized that not only it helps to reduce >>> the boot time, but also it helps to support PCIe root port in future >>> patches. >>> >>> By checking the Quark SoC manual, I found something that needs to be >>> done to get the two PCIe root ports on Quark SoC to work. In order to >>> get PCIe root ports show up in the PCI configuration space, we need >>> program some registers in the message bus (using APIs in >>> arch/x86/cpu/quark/msg_port.c). With driver model, if we still call >>> pci_write_config_dword() in the msg_port.c, we end up triggering PCI >>> enumeration process first before we actually write something to the >>> configuration space, however the enumeration process will hang when >>> scanning to the PCIe root port if we don't properly initialize the >>> message bus registers. This is a chicken-egg problem. >> >> Sure, although I see that as a separate problem. >> > > Yes, it is a separate problem. It came to me when I was reading the > manual after I submitted the patch. > >> We can't have a driver model implementation that is really slow, so >> I'd like to clear that up first. Of course your patch makes sense for >> other reasons, but I don't want to play whack-a-mole here. >> > > Agreed. So far the driver model PCI is used on x86 boards and sandbox. > The performance issue was not obvious on these targets, but it is > quite noticeable on Intel Quark. These PCI config read/write routines > will go through lots of function calls before we actually touch the > I/O ports 0xcf8 and 0xcfc, especially when the device is not on the > root bus (it needs to go through its parent followed by its parent's > parent). > > But anyway I think this optimization for Quark is needed. I doubt we > can optimize driver model pci to such an extent.
Can we use this as an opportunity to try a few things? If we use the dm_pci functions that should cut out some overhead. Can you try an experiment to see how much difference it makes? dm_pci_read_config32() pci_get_bdf() pci_read_config() for loop which probably terminates immediately pci_bus_read_config() read_config(), which is pci_x86_read_config() So it's not great but it doesn't look too bad. Also is this an Intel Gallileo gen 2 development board? I'm thinking of getting one. > >> Also we should start to move things away from the non-driver-model pci >> functions. >> > > Regards, > Bin Regards, Simon _______________________________________________ U-Boot mailing list [email protected] http://lists.denx.de/mailman/listinfo/u-boot

