On 2025/5/21 14:25, Jan Beulich wrote:
> On 21.05.2025 08:08, Chen, Jiqian wrote:
>> On 2025/5/19 21:21, Roger Pau Monné wrote:
>>> On Mon, May 19, 2025 at 03:10:17PM +0200, Jan Beulich wrote:
>>>> On 19.05.2025 09:13, Chen, Jiqian wrote:
>>>>> On 2025/5/19 14:56, Jan Beulich wrote:
>>>>>> On 19.05.2025 08:43, Chen, Jiqian wrote:
>>>>>>> On 2025/5/18 22:20, Jan Beulich wrote:
>>>>>>>> On 09.05.2025 11:05, Jiqian Chen wrote:
>>>>>>>>> @@ -827,6 +827,34 @@ static int vpci_init_capability_list(struct pci_dev *pdev)
>>>>>>>>>                                                   PCI_STATUS_RSVDZ_MASK);
>>>>>>>>>  }
>>>>>>>>>  
>>>>>>>>> +static int vpci_init_ext_capability_list(struct pci_dev *pdev)
>>>>>>>>> +{
>>>>>>>>> +    unsigned int pos = PCI_CFG_SPACE_SIZE, ttl = 480;
>>>>>>>>
>>>>>>>> The ttl value exists (in the function you took it from) to make sure
>>>>>>>> the loop below eventually ends. That is, to be able to kind of
>>>>>>>> gracefully deal with loops in the linked list. Such loops, however,
>>>>>>>> would ...
>>>>>>>>
>>>>>>>>> +    if ( !is_hardware_domain(pdev->domain) )
>>>>>>>>> +        /* Extended capabilities read as zero, write ignore for guest */
>>>>>>>>> +        return vpci_add_register(pdev->vpci, vpci_read_val, NULL,
>>>>>>>>> +                                 pos, 4, (void *)0);
>>>>>>>>> +
>>>>>>>>> +    while ( pos >= PCI_CFG_SPACE_SIZE && ttl-- )
>>>>>>>>> +    {
>>>>>>>>> +        uint32_t header = pci_conf_read32(pdev->sbdf, pos);
>>>>>>>>> +        int rc;
>>>>>>>>> +
>>>>>>>>> +        if ( !header )
>>>>>>>>> +            return 0;
>>>>>>>>> +
>>>>>>>>> +        rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
>>>>>>>>> +                               pos, 4, (void *)(uintptr_t)header);
>>>>>>>>
>>>>>>>> ... mean we may invoke this twice for the same capability. Such
>>>>>>>> a secondary invocation would fail with -EEXIST, causing device init
>>>>>>>> to fail altogether. Which is kind of against our aim of exposing
>>>>>>>> (in a controlled manner) as much of the PCI hardware as possible.
>>>>>>> May I know what situation could cause this to be invoked twice for
>>>>>>> one capability during initialization?
>>>>>>> Can the hardware capability list contain a cycle?
>>>>>>
>>>>>> Any of this is to work around flawed hardware, I suppose.
>>>>>>
>>>>>>>> Imo we ought to be using a bitmap to detect the situation earlier
>>>>>>>> and hence to be able to avoid redundant register addition. Thoughts?
>>>>>>> Can we just carry on and continue adding registers for the next
>>>>>>> capability when rc == -EEXIST, instead of returning an error?
>>>>>>
>>>>>> Possible, but feels wrong.
>>>>> How about, on -EEXIST, setting the next-pointer bits of the previous
>>>>> extended capability to zero and returning 0? Then we break the cycle.
>>>>
>>>> Hmm. Again an option, yet again I'm not certain. But that's perhaps just
>>>> me, and Roger may be fine with it. IOW we might as well start out this way,
>>>> and adjust if (ever) an issue with a real device is found.
>>>
>>> Returning -EEXIST might be fine, but at that point there's no further
>>> capability to process.  There's a loop in the linked capability list,
>>> and we should just exit.  There needs to be a warning in this case,
>>> and since this is for the hardware domain only it shouldn't be fatal.
>>>
>> If I understand correctly, I need to add the below in the next version?
>>
>>          rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
>>                                 pos, 4, (void *)(uintptr_t)header);
>> +
>> +        if ( rc == -EEXIST )
>> +        {
>> +            printk(XENLOG_WARNING
>> +                   "%pd %pp: there is a loop in the linked capability list\n",
> 
> I think we shouldn't say "loop" unless we firmly know that's what the
> issue is. Maybe use "overlap" instead? And then also log the offending
> register range? (As a nit: "there is" and "linked" are not adding any
> value to the log message; to keep them short [without losing
> information], please try to avoid such.)
OK, the below may be more in line with your suggestion.

         rc = vpci_add_register(pdev->vpci, vpci_read_val, vpci_hw_write32,
                                pos, 4, (void *)(uintptr_t)header);
+
+        if ( rc == -EEXIST )
+        {
+            printk(XENLOG_WARNING
+                   "%pd %pp: overlap in extended cap list, offset %#x\n",
+                   pdev->domain, &pdev->sbdf, pos);
+            return 0;
+        }
+
         if ( rc )
             return rc;
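
For comparison, the bitmap idea raised earlier in the thread could look roughly like the standalone sketch below. It is not Xen code: the config space is a plain in-memory dword array instead of pci_conf_read32(), and the names walk_ext_caps(), ext_cap_header(), CFG_SPACE_SIZE, CFG_SPACE_EXP_SIZE and EXT_CAP_NEXT() are made up for illustration. It only assumes the usual extended capability header layout (ID in bits 15:0, version in 19:16, next offset in 31:20).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants, modelled on the usual PCIe layout. */
#define CFG_SPACE_SIZE      0x100u   /* first extended capability offset */
#define CFG_SPACE_EXP_SIZE  0x1000u  /* size of extended config space */
#define NR_EXT_DWORDS       ((CFG_SPACE_EXP_SIZE - CFG_SPACE_SIZE) / 4)
#define EXT_CAP_NEXT(h)     (((h) >> 20) & 0xffcu) /* dword-aligned next */

/* Build a version 1 extended capability header (illustration only). */
static uint32_t ext_cap_header(uint16_t id, unsigned int next)
{
    return ((uint32_t)next << 20) | (1u << 16) | id;
}

/*
 * Walk the extended capability list in cfg, a 4 KiB config space image
 * indexed by dword. Returns the number of distinct headers visited.
 * *overlap_pos is set to the offset reached a second time, or to 0 if
 * the list terminated without revisiting any offset.
 */
static unsigned int walk_ext_caps(const uint32_t *cfg,
                                  unsigned int *overlap_pos)
{
    uint8_t seen[NR_EXT_DWORDS / 8] = { 0 }; /* one bit per dword */
    unsigned int pos = CFG_SPACE_SIZE, count = 0;

    assert(cfg != NULL && overlap_pos != NULL);
    *overlap_pos = 0;

    while ( pos >= CFG_SPACE_SIZE && pos < CFG_SPACE_EXP_SIZE )
    {
        unsigned int idx = (pos - CFG_SPACE_SIZE) / 4;
        uint32_t header = cfg[pos / 4];

        if ( !header )
            break;

        if ( seen[idx / 8] & (1u << (idx % 8)) )
        {
            /* Already handled this offset: stop before re-adding it. */
            *overlap_pos = pos;
            break;
        }

        seen[idx / 8] |= 1u << (idx % 8);
        count++;

        pos = EXT_CAP_NEXT(header);
    }

    return count;
}
```

Since each offset can be marked only once, the bitmap itself bounds the walk and no separate ttl counter is needed in this sketch. Whether that is preferable to relying on -EEXIST from vpci_add_register() is a design choice left open here.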

> 
> Jan

-- 
Best regards,
Jiqian Chen.
