On 30.09.21 11:51, Jan Beulich wrote:
> On 30.09.2021 09:52, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com>
>>
>> Assign SBDF to the PCI devices being passed through with bus 0.
> This reads a little odd: If bus is already known (and I think you imply
> segment to also be known), it's only DF which get assigned.
But at the end of the day we set all the parts of that SBDF.
Otherwise I should write "Assign DF as we know that bus and segment
are 0"
>
>> The resulting topology is where PCIe devices reside on the bus 0 of the
>> root complex itself (embedded endpoints).
>> This implementation is limited to 32 devices which are allowed on
>> a single PCI bus.
> Or up to 256 when there are multi-function ones. Imo you at least want
> to spell out how that case is intended to be handled (even if maybe
> the code doesn't cover that case yet, in which case a respective code
> comment would also want leaving).
We are not supporting multi-function yet, so I'll add a comment.
>
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -831,6 +831,66 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>> return ret;
>> }
>>
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> May I ask why the code enclosed by this conditional has been put here
> rather than under drivers/vpci/?
Indeed this can be moved to xen/drivers/vpci/vpci.c.
I'll move and update function names accordingly.
>
>> +static struct vpci_dev *pci_find_virtual_device(const struct domain *d,
>> + const struct pci_dev *pdev)
>> +{
>> + struct vpci_dev *vdev;
>> +
>> + list_for_each_entry ( vdev, &d->vdev_list, list )
>> + if ( vdev->pdev == pdev )
>> + return vdev;
>> + return NULL;
>> +}
> No locking here or ...
>
>> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev)
>> +{
>> + struct vpci_dev *vdev;
>> +
>> + ASSERT(!pci_find_virtual_device(d, pdev));
> ... in this first caller that I've managed to spot? See also below as
> to up the call tree the pcidevs-lock being held (which at the very
> least you would then want to ASSERT() for here).
I will move the code to vpci and make sure proper locking there
>
>> + /* Each PCI bus supports 32 devices/slots at max. */
>> + if ( d->vpci_dev_next > 31 )
>> + return -ENOSPC;
> Please avoid open-coding literals when they can be suitably expressed.
I failed to find a suitable constant for that. Could you please point
me to the one I can use here?
>
>> + vdev = xzalloc(struct vpci_dev);
>> + if ( !vdev )
>> + return -ENOMEM;
>> +
>> + /* We emulate a single host bridge for the guest, so segment is always
>> 0. */
>> + vdev->seg = 0;
>> +
>> + /*
>> + * The bus number is set to 0, so virtual devices are seen
>> + * as embedded endpoints behind the root complex.
>> + */
>> + vdev->bus = 0;
> Strictly speaking both of these assignments are redundant with you
> using xzalloc(). I'd prefer if there was just a comment, as the compiler
> has no way recognizing this in order to eliminate these stores.
Yes, I just put the assignments to be explicitly seen here as being 0.
I will remove those and put a comment.
>
>> + vdev->devfn = PCI_DEVFN(d->vpci_dev_next++, 0);
>> +
>> + vdev->pdev = pdev;
>> + vdev->domain = d;
>> +
>> + pcidevs_lock();
>> + list_add_tail(&vdev->list, &d->vdev_list);
>> + pcidevs_unlock();
> I don't support a global lock getting (ab)used for per-domain list
> management.
>
> Apart from that I don't understand why you acquire the lock here. Does
> that mean the functions further were truly left without any locking,
> by you not having noticed that this lock is already being held by the
> sole caller?
I'll re-work locking with respect to the new location for this, e.g. vpci
>
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -91,20 +91,32 @@ int __hwdom_init vpci_add_handlers(struct pci_dev *pdev)
>> /* Notify vPCI that device is assigned to guest. */
>> int vpci_assign_device(struct domain *d, const struct pci_dev *dev)
>> {
>> + int rc;
>> +
>> /* It only makes sense to assign for hwdom or guest domain. */
>> if ( is_system_domain(d) || !has_vpci(d) )
>> return 0;
>>
>> - return vpci_bar_add_handlers(d, dev);
>> + rc = vpci_bar_add_handlers(d, dev);
>> + if ( rc )
>> + return rc;
>> +
>> + return pci_add_virtual_device(d, dev);
>> }
> I've peeked at the earlier patch, and both there and here I'm struggling to
> see how undoing partially completed steps or fully completed earlier steps
> is intended to work. I'm not convinced it is legitimate to leave handlers
> in place until the tool stack manages to roll back the failed device
> assignment.
I'll see what and how we can roll back in case of error
>
>> /* Notify vPCI that device is de-assigned from guest. */
>> int vpci_deassign_device(struct domain *d, const struct pci_dev *dev)
>> {
>> + int rc;
>> +
>> /* It only makes sense to de-assign from hwdom or guest domain. */
>> if ( is_system_domain(d) || !has_vpci(d) )
>> return 0;
>>
>> + rc = pci_remove_virtual_device(d, dev);
>> + if ( rc )
>> + return rc;
>> +
>> return vpci_bar_remove_handlers(d, dev);
>> }
> So what's the ultimate effect of a partially de-assigned device, where
> one of the later steps failed? In a number of places we do best-effort
> full cleanup, by recording errors but nevertheless continuing with
> subsequent cleanup steps. I wonder whether that's a model to use here
> as well.
>
>> --- a/xen/include/xen/pci.h
>> +++ b/xen/include/xen/pci.h
>> @@ -137,6 +137,24 @@ struct pci_dev {
>> struct vpci *vpci;
>> };
>>
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +struct vpci_dev {
>> + struct list_head list;
>> + /* Physical PCI device this virtual device is connected to. */
>> + const struct pci_dev *pdev;
>> + /* Virtual SBDF of the device. */
>> + union {
>> + struct {
>> + uint8_t devfn;
>> + uint8_t bus;
>> + uint16_t seg;
>> + };
>> + pci_sbdf_t sbdf;
> Could you explain to me why pci_sbdf_t (a typedef of a union) isn't
> providing all you need? By putting it in a union with a custom
> struct you set yourself up for things going out of sync if anyone
> chose to alter pci_sbdf_t's layout.
Sure, pci_sbdf_t should be enough
>
>> @@ -167,6 +185,10 @@ const unsigned long *pci_get_ro_map(u16 seg);
>> int pci_add_device(u16 seg, u8 bus, u8 devfn,
>> const struct pci_dev_info *, nodeid_t node);
>> int pci_remove_device(u16 seg, u8 bus, u8 devfn);
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +int pci_add_virtual_device(struct domain *d, const struct pci_dev *pdev);
>> +int pci_remove_virtual_device(struct domain *d, const struct pci_dev *pdev);
>> +#endif
> Like for their definitions I question the placement of these
> declarations.
Will move to vpci.h
>
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -444,6 +444,14 @@ struct domain
>>
>> #ifdef CONFIG_HAS_PCI
>> struct list_head pdev_list;
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> + struct list_head vdev_list;
>> + /*
>> + * Current device number used by the virtual PCI bus topology
>> + * to assign a unique SBDF to a passed through virtual PCI device.
>> + */
>> + int vpci_dev_next;
> In how far can the number stored here be negative? If it can't be,
> please use unsigned int.
Will use unsigned
>
> As to the comment - "current" is ambiguous: Is it the number that
> was used last, or the next one to be used?
I will update the comment to remove ambiguity
> Jan
>
Thank you,
Oleksandr