On 2025/5/19 21:21, Roger Pau Monné wrote:
> On Mon, May 19, 2025 at 03:10:17PM +0200, Jan Beulich wrote:
>> On 19.05.2025 09:13, Chen, Jiqian wrote:
>>> On 2025/5/19 14:56, Jan Beulich wrote:
>>>> On 19.05.2025 08:43, Chen, Jiqian wrote:
>>>>> On 2025/5/18 22:20, Jan Beulich wrote:
>>>>>> On 09.05.2025 11:05, Jiqian Chen wrote:
>>>>>>> @@ -827,6 +827,34 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>>>>>>>                                    PCI_STATUS_RSVDZ_MASK);
>>>>>>>  }
>>>>>>>
>>>>>>> +static int vpci_init_ext_capability_list(struct pci_dev *pdev)
>>>>>>> +{
>>>>>>> +    unsigned int pos = PCI_CFG_SPACE_SIZE, ttl = 480;
>>>>>>
>>>>>> The ttl value exists (in the function you took it from) to make sure
>>>>>> the loop below eventually ends. That is, to be able to kind of
>>>>>> gracefully deal with loops in the linked list. Such loops, however,
>>>>>> would ...
>>>>>>
>>>>>>> +    if ( !is_hardware_domain(pdev->domain) )
>>>>>>> +        /* Extended capabilities read as zero, write ignore for guest */
>>>>>>> +        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>>>>>>> +                                 pos, 4, (void *)0);
>>>>>>> +
>>>>>>> +    while ( pos >= PCI_CFG_SPACE_SIZE && ttl-- )
>>>>>>> +    {
>>>>>>> +        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
>>>>>>> +        int rc;
>>>>>>> +
>>>>>>> +        if ( !header )
>>>>>>> +            return 0;
>>>>>>> +
>>>>>>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
>>>>>>> +                               pos, 4, (void *)(uintptr_t)header);
>>>>>>
>>>>>> ... mean we may invoke this twice for the same capability. Such
>>>>>> a secondary invocation would fail with -EEXIST, causing device init
>>>>>> to fail altogether. Which is kind of against our aim of exposing
>>>>>> (in a controlled manner) as much of the PCI hardware as possible.
>>>>> May I know what situation that can make this twice for one capability
>>>>> when initialization?
>>>>> Does hardware capability list have a cycle?
>>>>
>>>> Any of this is to work around flawed hardware, I suppose.
>>>>
>>>>>> Imo we ought to be using a bitmap to detect the situation earlier
>>>>>> and hence to be able to avoid redundant register addition. Thoughts?
>>>>> Can we just let it go forward and continue to add register for next
>>>>> capability when rc == -EEXIST, instead of returning an error?
>>>>
>>>> Possible, but feels wrong.
>>> How about when -EEXIST, setting the next bits of previous extended capability
>>> to be zero and return 0? Then we break the cycle.
>>
>> Hmm. Again an option, yet again I'm not certain. But that's perhaps just
>> me, and Roger may be fine with it. IOW we might as well start out this way,
>> and adjust if (ever) an issue with a real device is found.
>
> Returning -EEXIST might be fine, but at that point there's no further
> capability to process. There's a loop in the linked capability list,
> and we should just exit. There needs to be a warning in this case,
> and since this is for the hardware domain only it shouldn't be fatal.
>
If I understand correctly, I need to add the change below in the next version?

         rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
                                pos, 4, (void *)(uintptr_t)header);
+
+        if ( rc == -EEXIST )
+        {
+            printk(XENLOG_WARNING
+                   "%pd %pp: there is a loop in the linked capability list\n",
+                   pdev->domain, &pdev->sbdf);
+            return 0;
+        }
+
         if ( rc )
             return rc;

> If it was for domUs we would possibly need to discuss whether
> assigning the device should fail if a capability linked list loop is
> found.
>
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.
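
For comparison, here is a minimal standalone sketch of the bitmap idea raised earlier in the thread. It is an illustration only, not Xen code: fake_cfg, reset_cfg(), set_cap(), fake_read32() and walk_ext_caps() are hypothetical names, and the walk runs over an in-memory array instead of real config space. Each dword offset in the extended config space (0x100 to 0xffc) gets one bit. Reaching an offset whose bit is already set means the list loops, so the walk stops there and no ttl counter is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CFG_SPACE_SIZE      0x100u   /* first byte of extended config space */
#define CFG_SPACE_EXP_SIZE  0x1000u  /* size of the whole config space */
#define EXT_CAP_NEXT(h)     (((h) >> 20) & 0xffcu)  /* next offset, bits 31:20 */
#define NR_SLOTS            ((CFG_SPACE_EXP_SIZE - CFG_SPACE_SIZE) / 4)

/* Hypothetical fake config space, one entry per dword. */
static uint32_t fake_cfg[CFG_SPACE_EXP_SIZE / 4];

static void reset_cfg(void)
{
    memset(fake_cfg, 0, sizeof(fake_cfg));
}

/* Write an extended capability header (version 1) at offset pos. */
static void set_cap(unsigned int pos, uint16_t id, unsigned int next)
{
    assert(pos >= CFG_SPACE_SIZE && pos < CFG_SPACE_EXP_SIZE && !(pos & 3));
    fake_cfg[pos / 4] = id | (1u << 16) | ((uint32_t)next << 20);
}

/* Stand-in for a real config space read such as pci_conf_read32(). */
static uint32_t fake_read32(unsigned int pos)
{
    return fake_cfg[pos / 4];
}

/*
 * Walk the list, marking each visited offset in a bitmap.
 * Returns the number of distinct capabilities seen and sets *loop
 * when the list points back at an offset that was already visited.
 */
static unsigned int walk_ext_caps(bool *loop)
{
    uint8_t seen[(NR_SLOTS + 7) / 8] = { 0 };
    unsigned int pos = CFG_SPACE_SIZE, count = 0;

    *loop = false;

    while ( pos >= CFG_SPACE_SIZE )
    {
        unsigned int slot = (pos - CFG_SPACE_SIZE) / 4;
        uint32_t header = fake_read32(pos);

        if ( !header )
            break;                      /* end of list */

        if ( seen[slot / 8] & (1u << (slot % 8)) )
        {
            *loop = true;               /* offset already handled: cycle */
            break;
        }

        seen[slot / 8] |= 1u << (slot % 8);
        count++;                        /* a register would be added here */
        pos = EXT_CAP_NEXT(header);
    }

    return count;
}
```

For example, after set_cap(0x100, 0x0001, 0x140) and set_cap(0x140, 0x0002, 0x100), walk_ext_caps() counts two capabilities and reports a loop. The cycle is caught before a second register would be added for offset 0x100, whereas the diff above catches it only once vpci_add_register() has returned -EEXIST.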