Re: [PATCH 1/3] xen/pci: do not register devices outside of PCI segment scope

2025-01-13 Thread Bjorn Helgaas
On Mon, Jan 13, 2025 at 11:18:57AM +0100, Roger Pau Monné wrote:
> On Fri, Jan 10, 2025 at 04:21:29PM -0600, Bjorn Helgaas wrote:
> > On Fri, Jan 10, 2025 at 03:01:48PM +0100, Roger Pau Monne wrote:
> > > The PCI segment value is limited to 16 bits, however there are buses like 
> > > VMD
> > > that fake being part of the PCI topology by adding segment with a number
> > > outside the scope of the PCI firmware specification range (>= 0x1). 
> > > The
> > > MCFG ACPI Table "PCI Segment Group Number" field is defined as having a 
> > > 16 bit
> > > width.
> > >
> > > Attempting to register or manage those devices with Xen would result in 
> > > errors
> > > at best, or overlaps with existing devices living on the truncated 
> > > equivalent
> > > segment values.
> > 
> > The ACPI _SEG method (ACPI r6.5, sec 6.5.6) and the corresponding
> > value in the MCFG table (PCI Firmware r3.3, sec 4.1.2) are clearly
> > 16-bit values.
> > 
> > But otherwise, the segment value is pretty much an arbitrary software
> > value, and the kernel works fine with the larger domain values from
> > vmd_find_free_domain(), so this isn't quite enough to explain what the
> > issue with Xen is.
> > 
> > Does Xen truncate the domain to 16 bits or use it to look up something
> > in ACPI?
> 
> In the interface between Xen and Linux the segment field is 16 bit
> width, so with the current interface is not possible to reference
> devices that are past the 0x segment.

I think this specific reason (and maybe even struct
physdev_pci_device_add) would be more useful than the ACPI _SEG and
MCFG things, which are not as directly connected here.

Bjorn



Re: [PATCH 1/3] xen/pci: do not register devices outside of PCI segment scope

2025-01-13 Thread Roger Pau Monné
On Fri, Jan 10, 2025 at 04:21:29PM -0600, Bjorn Helgaas wrote:
> On Fri, Jan 10, 2025 at 03:01:48PM +0100, Roger Pau Monne wrote:
> > The PCI segment value is limited to 16 bits, however there are buses like 
> > VMD
> > that fake being part of the PCI topology by adding segment with a number
> > outside the scope of the PCI firmware specification range (>= 0x1). The
> > MCFG ACPI Table "PCI Segment Group Number" field is defined as having a 16 
> > bit
> > width.
> >
> > Attempting to register or manage those devices with Xen would result in 
> > errors
> > at best, or overlaps with existing devices living on the truncated 
> > equivalent
> > segment values.
> 
> The ACPI _SEG method (ACPI r6.5, sec 6.5.6) and the corresponding
> value in the MCFG table (PCI Firmware r3.3, sec 4.1.2) are clearly
> 16-bit values.
> 
> But otherwise, the segment value is pretty much an arbitrary software
> value, and the kernel works fine with the larger domain values from
> vmd_find_free_domain(), so this isn't quite enough to explain what the
> issue with Xen is.
> 
> Does Xen truncate the domain to 16 bits or use it to look up something
> in ACPI?

In the interface between Xen and Linux the segment field is 16 bit
width, so with the current interface is not possible to reference
devices that are past the 0x segment.

I also wonder whether Xen and Linux (or guest OSes in general) would
agree on how to reference such devices.  AFAICT VMD segment numbers
are purely a software construct, but not something enforced by the
specification.  Could for example FreeBSD assign a different segment
to VMD devices?

If so we would need some kind of specification about how VMD segment
values are assigned so that OSes have a coherent way of referencing
VMD devices without ambiguity.

Thanks, Roger.



Re: [PATCH 1/3] xen/pci: do not register devices outside of PCI segment scope

2025-01-13 Thread Roger Pau Monné
On Mon, Jan 13, 2025 at 08:17:23AM +0100, Jan Beulich wrote:
> On 10.01.2025 15:01, Roger Pau Monne wrote:
> > The PCI segment value is limited to 16 bits, however there are buses like 
> > VMD
> > that fake being part of the PCI topology by adding segment with a number
> > outside the scope of the PCI firmware specification range (>= 0x1). The
> > MCFG ACPI Table "PCI Segment Group Number" field is defined as having a 16 
> > bit
> > width.
> > 
> > Attempting to register or manage those devices with Xen would result in 
> > errors
> > at best, or overlaps with existing devices living on the truncated 
> > equivalent
> > segment values.
> > 
> > Skip notifying Xen about those devices.
> 
> Hmm, is simply omitting the notification really all it takes? How would Xen
> manage MSI / MSI-X, for example, without knowing of the device? As per the
> BoF on the summit in Prague(?), I continue to think we need partial driver
> logic in Xen for VMD ...

The basic mode of operation of devices behind a VMD bridge is to
reference the interrupts of the bridge device in its MSI(-X) entries,
so the VMD bridge acts as a de-multiplexer and forwards the interrupts
to the device behind the VMD bridge.  See vmd_alloc_irqs() (and
calling context) in the VMD driver for a reference about how this is
setup and operates.  Note also that devices behind a VMD bridge
operate using a different MSI domain, that uses a custom
irq_compose_msi_msg hook.

Thanks, Roger.



Re: [PATCH 1/3] xen/pci: do not register devices outside of PCI segment scope

2025-01-12 Thread Jan Beulich
On 10.01.2025 23:21, Bjorn Helgaas wrote:
> On Fri, Jan 10, 2025 at 03:01:48PM +0100, Roger Pau Monne wrote:
>> The PCI segment value is limited to 16 bits, however there are buses like VMD
>> that fake being part of the PCI topology by adding segment with a number
>> outside the scope of the PCI firmware specification range (>= 0x1). The
>> MCFG ACPI Table "PCI Segment Group Number" field is defined as having a 16 
>> bit
>> width.
>>
>> Attempting to register or manage those devices with Xen would result in 
>> errors
>> at best, or overlaps with existing devices living on the truncated equivalent
>> segment values.
> 
> The ACPI _SEG method (ACPI r6.5, sec 6.5.6) and the corresponding
> value in the MCFG table (PCI Firmware r3.3, sec 4.1.2) are clearly
> 16-bit values.
> 
> But otherwise, the segment value is pretty much an arbitrary software
> value, and the kernel works fine with the larger domain values from
> vmd_find_free_domain(), so this isn't quite enough to explain what the
> issue with Xen is.
> 
> Does Xen truncate the domain to 16 bits or use it to look up something
> in ACPI?

One of the involved public interface structs starts like this:

struct physdev_pci_device_add {
/* IN */
uint16_t seg;
uint8_t bus;
uint8_t devfn;
...

So yes, wider segment values would be truncated. Plus, even if they weren't,
there would need to be coordination between Dom0 and Xen on which devices
gets which segment number, since - as you say - the assignment in Linux is
pretty much arbitrary.

Jan



Re: [PATCH 1/3] xen/pci: do not register devices outside of PCI segment scope

2025-01-12 Thread Jan Beulich
On 10.01.2025 15:01, Roger Pau Monne wrote:
> The PCI segment value is limited to 16 bits, however there are buses like VMD
> that fake being part of the PCI topology by adding segment with a number
> outside the scope of the PCI firmware specification range (>= 0x1). The
> MCFG ACPI Table "PCI Segment Group Number" field is defined as having a 16 bit
> width.
> 
> Attempting to register or manage those devices with Xen would result in errors
> at best, or overlaps with existing devices living on the truncated equivalent
> segment values.
> 
> Skip notifying Xen about those devices.

Hmm, is simply omitting the notification really all it takes? How would Xen
manage MSI / MSI-X, for example, without knowing of the device? As per the
BoF on the summit in Prague(?), I continue to think we need partial driver
logic in Xen for VMD ...

Jan



Re: [PATCH 1/3] xen/pci: do not register devices outside of PCI segment scope

2025-01-10 Thread Bjorn Helgaas
On Fri, Jan 10, 2025 at 03:01:48PM +0100, Roger Pau Monne wrote:
> The PCI segment value is limited to 16 bits, however there are buses like VMD
> that fake being part of the PCI topology by adding segment with a number
> outside the scope of the PCI firmware specification range (>= 0x1). The
> MCFG ACPI Table "PCI Segment Group Number" field is defined as having a 16 bit
> width.
>
> Attempting to register or manage those devices with Xen would result in errors
> at best, or overlaps with existing devices living on the truncated equivalent
> segment values.

The ACPI _SEG method (ACPI r6.5, sec 6.5.6) and the corresponding
value in the MCFG table (PCI Firmware r3.3, sec 4.1.2) are clearly
16-bit values.

But otherwise, the segment value is pretty much an arbitrary software
value, and the kernel works fine with the larger domain values from
vmd_find_free_domain(), so this isn't quite enough to explain what the
issue with Xen is.

Does Xen truncate the domain to 16 bits or use it to look up something
in ACPI?

> Skip notifying Xen about those devices.
> 
> Signed-off-by: Roger Pau Monné 
> ---
>  drivers/xen/pci.c | 19 +++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
> index 416f231809cb..08e82fd1263e 100644
> --- a/drivers/xen/pci.c
> +++ b/drivers/xen/pci.c
> @@ -43,6 +43,13 @@ static int xen_add_device(struct device *dev)
>   pci_mcfg_reserved = true;
>   }
>  #endif
> +
> + if (pci_domain_nr(pci_dev->bus) >> 16) {
> + dev_info(dev,
> +  "not registering with Xen: invalid PCI segment\n");
> + return 0;
> + }
> +
>   if (pci_seg_supported) {
>   DEFINE_RAW_FLEX(struct physdev_pci_device_add, add, optarr, 1);
>  
> @@ -149,6 +156,12 @@ static int xen_remove_device(struct device *dev)
>   int r;
>   struct pci_dev *pci_dev = to_pci_dev(dev);
>  
> + if (pci_domain_nr(pci_dev->bus) >> 16) {
> + dev_info(dev,
> +  "not unregistering with Xen: invalid PCI segment\n");
> + return 0;
> + }
> +
>   if (pci_seg_supported) {
>   struct physdev_pci_device device = {
>   .seg = pci_domain_nr(pci_dev->bus),
> @@ -182,6 +195,12 @@ int xen_reset_device(const struct pci_dev *dev)
>   .flags = PCI_DEVICE_RESET_FLR,
>   };
>  
> + if (pci_domain_nr(dev->bus) >> 16) {
> + dev_info(&dev->dev,
> +  "unable to notify Xen of device reset: invalid PCI 
> segment\n");
> + return 0;
> + }
> +
>   return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_reset, &device);
>  }
>  EXPORT_SYMBOL_GPL(xen_reset_device);
> -- 
> 2.46.0
>