On 2025/5/21 14:25, Jan Beulich wrote: > On 21.05.2025 08:08, Chen, Jiqian wrote: >> On 2025/5/19 21:21, Roger Pau Monné wrote: >>> On Mon, May 19, 2025 at 03:10:17PM +0200, Jan Beulich wrote: >>>> On 19.05.2025 09:13, Chen, Jiqian wrote: >>>>> On 2025/5/19 14:56, Jan Beulich wrote: >>>>>> On 19.05.2025 08:43, Chen, Jiqian wrote: >>>>>>> On 2025/5/18 22:20, Jan Beulich wrote: >>>>>>>> On 09.05.2025 11:05, Jiqian Chen wrote: >>>>>>>>> @@ -827,6 +827,34 @@ static int vpci_init_capability_list(struct >>>>>>>>> pci_dev *pdev) >>>>>>>>> >>>>>>>>> PCI_STATUS_RSVDZ_MASK); >>>>>>>>> } >>>>>>>>> >>>>>>>>> +static int vpci_init_ext_capability_list(struct pci_dev *pdev) >>>>>>>>> +{ >>>>>>>>> + unsigned int pos = PCI_CFG_SPACE_SIZE, ttl = 480; >>>>>>>> >>>>>>>> The ttl value exists (in the function you took it from) to make sure >>>>>>>> the loop below eventually ends. That is, to be able to kind of >>>>>>>> gracefully deal with loops in the linked list. Such loops, however, >>>>>>>> would ... >>>>>>>> >>>>>>>>> + if ( !is_hardware_domain(pdev->domain) ) >>>>>>>>> + /* Extended capabilities read as zero, write ignore for >>>>>>>>> guest */ >>>>>>>>> + return vpci_add_register(pdev->vpci, vpci_read_val, NULL, >>>>>>>>> + pos, 4, (void *)0); >>>>>>>>> + >>>>>>>>> + while ( pos >= PCI_CFG_SPACE_SIZE && ttl-- ) >>>>>>>>> + { >>>>>>>>> + uint32_t header = pci_conf_read32(pdev->sbdf, pos); >>>>>>>>> + int rc; >>>>>>>>> + >>>>>>>>> + if ( !header ) >>>>>>>>> + return 0; >>>>>>>>> + >>>>>>>>> + rc = vpci_add_register(pdev->vpci, vpci_read_val, >>>>>>>>> vpci_hw_write32, >>>>>>>>> + pos, 4, (void *)(uintptr_t)header); >>>>>>>> >>>>>>>> ... mean we may invoke this twice for the same capability. Such >>>>>>>> a secondary invocation would fail with -EEXIST, causing device init >>>>>>>> to fail altogether. Which is kind of against our aim of exposing >>>>>>>> (in a controlled manner) as much of the PCI hardware as possible. >>>>>>> May I know what situation that can make this twice for one capability >>>>>>> when initialization? >>>>>>> Does hardware capability list have a cycle? >>>>>> >>>>>> Any of this is to work around flawed hardware, I suppose. >>>>>> >>>>>>>> Imo we ought to be using a bitmap to detect the situation earlier >>>>>>>> and hence to be able to avoid redundant register addition. Thoughts? >>>>>>> Can we just let it go forward and continue to add register for next >>>>>>> capability when rc == -EXIST, instead of returning error ? >>>>>> >>>>>> Possible, but feels wrong. >>>>> How about when EXIST, setting the next bits of previous extended >>>>> capability to be zero and return 0? Then we break the cycle. >>>> >>>> Hmm. Again an option, yet again I'm not certain. But that's perhaps just >>>> me, and Roger may be fine with it. IOW we might as well start out this way, >>>> and adjust if (ever) an issue with a real device is found. >>> >>> Returning -EEXIST might be fine, but at that point there's no further >>> capability to process. There's a loop in the linked capability list, >>> and we should just exit. There needs to be a warning in this case, >>> and since this is for the hardware domain only it shouldn't be fatal. >>> >> If I understand correctly, I need to add below in next version? >> >> rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32, >> pos, 4, (void *)(uintptr_t)header); >> + >> + if ( rc == -EEXIST ) >> + { >> + printk(XENLOG_WARNING >> + "%pd %pp: there is a loop in the linked capability >> list\n", > > I think we shouldn't say "loop" unless we firmly know that's what the > issue is. Maybe use "overlap" instead? And then also log the offending > register range? (As a nit: "there is" and "linked" are not adding any > value to the log message; to keep them short [without losing > information], please try to avoid such.) OK, below may be more in line with your opinion.
rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32, pos, 4, (void *)(uintptr_t)header); + + if ( rc == -EEXIST ) + { + printk(XENLOG_WARNING + "%pd %pp: overlap in extended cap list, offset %#x\n", + pdev->domain, &pdev->sbdf, pos); + return 0; + } + if ( rc ) return rc; > > Jan -- Best regards, Jiqian Chen.