Re: How to reduce PCI initialization from 5 s (1.5 s adding them to IOMMU groups)

2021-11-09 Thread Krzysztof Wilczyński
Hi Paul,

> Thank you for your reply.

Thank you for getting back to us with such good insight.

[...]
> > I am curious - why is this a problem?  Are you power-cycling your servers
> > so often that the cumulative time spent enumerating PCI devices and adding
> > them to IOMMU groups becomes a problem?
> > 
> > I am simply wondering why you decided to single out the PCI enumeration as
> > slow in particular, especially given that large server hardware tends to
> > have (most of the time, as per my experience) a rather long initialisation
> > time, either from being powered off or after being power cycled.  It can
> > take a while before the actual operating system itself starts.
> 
> It’s not a problem per se; it’s more a pet peeve of mine. Systems get faster
> and faster, and boot times get slower and slower. On desktop systems it
> matters much more, with firmware like coreboot taking less than one second
> to initialize the hardware and pass control to the payload/operating system.
> If we are lucky, we are going to have servers with FLOSS firmware.
> 
> But already now, using kexec to reboot a system avoids the problems you
> pointed out on servers, and being able to reboot a system as quickly as
> possible lowers the bar for people to reboot systems more often so that,
> for example, updates take effect.

A very good point about the kexec usage.

This is often invaluable for getting security updates out of the door
quickly, updating the kernel version, or switching operating systems
quickly (a trick that companies like Equinix Metal use when offering their
bare metal as a service).

> > We talked about this briefly with Bjorn, and there might be an option to
> > perhaps add some caching, as we suspect that the culprit here is the PCI
> > configuration space reads done for each device, which can be slow on some
> > platforms.
> > 
> > However, we would need to profile this to get some quantitative data to see
> > whether doing anything would even be worthwhile.  It would definitely help
> > us understand better where the bottlenecks really are and of what magnitude.
> > 
> > I personally don't have access to such large hardware as the one you have
> > access to, thus I was wondering whether you would have some time, and be
> > willing, to profile this for us on the hardware you have.
> > 
> > Let me know what you think.
> 
> Sounds good. I’d be willing to help. Note that I won’t have time before
> Wednesday next week, though.

Not a problem!  I am very grateful you are willing to devote some of your
time to help with this.

I only have access to a few systems - commodity hardware such as a desktop
PC, notebooks, and some assorted SoCs.  These are sadly not even close to
proper server platforms, and trying to measure anything on them does not
really yield any useful data, as the delays related to PCI enumeration on
startup are quite insignificant in comparison - there is just not enough
hardware there, so to speak.

I am really looking forward to the data you can gather for us and what
insight it might provide us with.

Thank you again!

Krzysztof

Re: How to reduce PCI initialization from 5 s (1.5 s adding them to IOMMU groups)

2021-11-08 Thread Krzysztof Wilczyński
Hi Paul,

> On a PowerEdge T440/021KCD, BIOS 2.11.2 04/22/2021, Linux 5.10.70 takes
> almost five seconds to initialize PCI. According to the timestamps, 1.5 s
> are from assigning the PCI devices to the 142 IOMMU groups.
[...]
> Is there anything that could be done to reduce the time?

I am curious - why is this a problem?  Are you power-cycling your servers
so often that the cumulative time spent enumerating PCI devices and adding
them to IOMMU groups becomes a problem?

I am simply wondering why you decided to single out the PCI enumeration as
slow in particular, especially given that large server hardware tends to
have (most of the time, as per my experience) a rather long initialisation
time, either from being powered off or after being power cycled.  It can
take a while before the actual operating system itself starts.

We talked about this briefly with Bjorn, and there might be an option to
perhaps add some caching, as we suspect that the culprit here is the PCI
configuration space reads done for each device, which can be slow on some
platforms.
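
Just to illustrate the general idea, below is a purely hypothetical sketch
(not the actual proposal - all of the names are made up): a tiny module
that memoises the vendor/device ID dword per device in an XArray, so that
repeated identification reads would not have to touch the potentially slow
configuration space again.

  #include <linux/module.h>
  #include <linux/pci.h>
  #include <linux/xarray.h>

  /* Purely illustrative: cache the vendor/device ID dword per device so
   * that repeated identification reads do not have to go out to the
   * (potentially slow) configuration space again.  Assumes a 64-bit
   * kernel so that the dword always fits into an XArray value.
   */
  static DEFINE_XARRAY(pci_id_cache);

  static u32 cached_read_ids(struct pci_dev *dev)
  {
          unsigned long key = ((unsigned long)pci_domain_nr(dev->bus) << 16) |
                              (dev->bus->number << 8) | dev->devfn;
          void *entry = xa_load(&pci_id_cache, key);
          u32 ids;

          if (entry)
                  return xa_to_value(entry);      /* cache hit - no config read */

          pci_read_config_dword(dev, PCI_VENDOR_ID, &ids);
          xa_store(&pci_id_cache, key, xa_mk_value(ids), GFP_KERNEL);
          return ids;
  }

  static int __init id_cache_demo_init(void)
  {
          struct pci_dev *dev = NULL;

          for_each_pci_dev(dev)
                  pr_info("%s: IDs %08x\n", pci_name(dev), cached_read_ids(dev));

          return 0;
  }

  static void __exit id_cache_demo_exit(void)
  {
          xa_destroy(&pci_id_cache);
  }

  module_init(id_cache_demo_init);
  module_exit(id_cache_demo_exit);
  MODULE_LICENSE("GPL");

Whether anything like this would be worth doing in the PCI core itself is
exactly what the profiling below should tell us.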

However, we would need to profile this to get some quantitative data to see
whether doing anything would even be worthwhile.  It would definitely help
us understand better where the bottlenecks really are and of what magnitude.
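
If it helps as a starting point, below is roughly the kind of quick
measurement I have in mind, done entirely from userspace (run it as root;
it times a 256-byte read of the config space of every device under the
standard /sys/bus/pci/devices path - the program itself is just a made-up
example and only a rough proxy for what the kernel does during
enumeration).

  #include <dirent.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <time.h>
  #include <unistd.h>

  /* Time a 256-byte configuration space read for every PCI device
   * exposed under /sys/bus/pci/devices.
   */
  int main(void)
  {
          const char *base = "/sys/bus/pci/devices";
          DIR *dir = opendir(base);
          struct dirent *de;
          struct timespec t0, t1;
          unsigned char buf[256];
          double total_ms = 0.0;
          int devices = 0;

          if (!dir) {
                  perror("opendir");
                  return 1;
          }

          while ((de = readdir(dir)) != NULL) {
                  char path[512];
                  double ms;
                  int fd;

                  if (de->d_name[0] == '.')
                          continue;

                  snprintf(path, sizeof(path), "%s/%s/config", base, de->d_name);
                  fd = open(path, O_RDONLY);
                  if (fd < 0)
                          continue;

                  clock_gettime(CLOCK_MONOTONIC, &t0);
                  (void)read(fd, buf, sizeof(buf));
                  clock_gettime(CLOCK_MONOTONIC, &t1);
                  close(fd);

                  ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                       (t1.tv_nsec - t0.tv_nsec) / 1e6;
                  printf("%-13s %8.3f ms\n", de->d_name, ms);
                  total_ms += ms;
                  devices++;
          }
          closedir(dir);

          printf("%d devices, %.3f ms in config space reads\n", devices, total_ms);
          return 0;
  }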

I personally don't have access to such large hardware as the one you have
access to, thus I was wondering whether you would have some time, and be
willing, to profile this for us on the hardware you have.

Let me know what you think.

Krzysztof


Re: [PATCH v3 2/2] PCI: vmd: Disable MSI-X remapping when possible

2021-02-08 Thread Krzysztof Wilczyński
Hi Jon,

Thank you for all the work here!

Just a number of suggestions, mainly nitpicks, so feel free to ignore
these, of course.

[...]
> +#define VMCFG_MSI_RMP_DIS		0x2
[...]

What about calling this VMCONFIG_MSI_REMAP so that it is more
self-explanatory (it would also share some similarity with the
PCI_REG_VMCONFIG definition)?
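
That is, something along these lines (keeping the current value from the
patch, of course):

  #define VMCONFIG_MSI_REMAP		0x2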

[...]
> + VMD_FEAT_BYPASS_MSI_REMAP   = (1 << 4),
[...]

Following the naming that includes "HAS" to indicate a feature (or
support thereof), perhaps we could name this as, for example:

VMD_FEAT_CAN_BYPASS_MSI_REMAP

What do you think?

[...] 
> +static void vmd_enable_msi_remapping(struct vmd_dev *vmd, bool enable)
> +{
> + u16 reg;
> +
> + pci_read_config_word(vmd->dev, PCI_REG_VMCONFIG, &reg);
> + reg = enable ? (reg & ~VMCFG_MSI_RMP_DIS) : (reg | VMCFG_MSI_RMP_DIS);
> + pci_write_config_word(vmd->dev, PCI_REG_VMCONFIG, reg);
> +}

I wonder if calling this function vmd_set_msi_remapping() would be more
aligned with what it does, since it turns the MSI remapping support on
and off, so to speak, as needed.  Do you think this would be OK to do?
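
Just to make the above concrete, with both renames applied the function
could then read as follows (assuming the VMCONFIG_MSI_REMAP name suggested
earlier; this is only a sketch, of course):

  static void vmd_set_msi_remapping(struct vmd_dev *vmd, bool enable)
  {
          u16 reg;

          pci_read_config_word(vmd->dev, PCI_REG_VMCONFIG, &reg);
          reg = enable ? (reg & ~VMCONFIG_MSI_REMAP) :
                         (reg | VMCONFIG_MSI_REMAP);
          pci_write_config_word(vmd->dev, PCI_REG_VMCONFIG, reg);
  }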

[...]
> + /*
> +  * Override the irq domain bus token so the domain can be
> +  * distinguished from a regular PCI/MSI domain.
> +  */

It would be "IRQ" here.

Reviewed-by: Krzysztof Wilczyński 

Krzysztof